Thank you, John,

I have checked the SHIP clause, but it's a bit confusing to me; all the 
examples I found were related to streaming data through an external command.
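
For example, what I kept running into was along these lines (the paths and 
names here are just placeholders):

    DEFINE stream_cmd `python my_script.py` SHIP('/local/path/my_script.py', '/local/path/test.txt');
    raw    = LOAD 'input_data' AS (line:chararray);
    result = STREAM raw THROUGH stream_cmd AS (line:chararray);
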
In the end I figured out a solution by putting the file on HDFS and specifying 
the following options when running Pig: 

pig -Dmapred.cache.files=hdfs://host:port/path/to/file#link_name 
-Dmapred.create.symlink=yes some.pig

Then I can just use `f = open('link_name')` in my Jython UDF. Since the file is 
small and only loaded once, it works well so far.
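
For reference, the UDF side now looks roughly like the sketch below (the 
schema, script name, and function name are just placeholders):

    # resources.py -- registered with:  REGISTER 'resources.py' USING jython AS res;
    # 'link_name' is the symlink created by -Dmapred.create.symlink=yes in the
    # task's working directory, pointing at the cached HDFS file.

    def _read_once():
        f = open('link_name')
        try:
            return f.read()
        finally:
            f.close()

    # Read once when Pig loads the script on each node, not on every call.
    _DATA = _read_once()

    @outputSchema('text:chararray')
    def resource_text():
        return _DATA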

On Dec 9, 2012, at 2:18 PM, John Gordon <john.gor...@microsoft.com> wrote:

> If you ship the file explicitly, you can use this syntax from there.  It will 
> pack it with the job jar and make sure it is in the working directory 
> wherever the job runs.  Be careful of shipping very large files; it is 
> probably better to refactor your logic into multiple top-level Pig statements 
> on data loaded from HDFS if you find yourself shipping fixed, very large 
> files.
> ________________________________
> From: Young Ng
> Sent: 12/9/2012 12:53 PM
> To: user@pig.apache.org
> Subject: How can I load external files within jython UDF?
> 
> Hi,
> 
> I am trying to load some external resources within my Jython UDF functions, 
> e.g.:
> 
> @outputSchema(....)
> def test():
>    f = open('test.txt')
>    text = f.read()
>    f.close()
>    return text
> 
> I have placed 'test.txt' in both the working folder and on HDFS, and I got the 
> following error:
>   IOError: (2, 'No such file or directory', 'test.txt')
> 
> I have also tried to print out the working path of Jython with os.getcwd(); 
> below is what I got:
>  
> /home/hduser/tmp/mapred/local/taskTracker/hduser/jobcache/job_201212080111_0007/attempt_201212080111_0007_m_000000_0/work
>  ....
> 
> I suspect that I can use an absolute path within the UDF, but how can I 
> transfer the external resources to the other Hadoop datanodes?
> 
> 
> Thanks,
> Young Wu
