Thank you John. I have checked the SHIP function, but it's a bit confusing to me; all the examples I found were related to streaming. I figured out a solution by putting the file on HDFS and passing the following options when executing Pig:

pig -Dmapred.cache.files=hdfs://host:port/path/to/file#link_name -Dmapred.create.symlink=yes some.pig

Then I can just use `f = open('link_name')` in my jython UDF. Since the file is small and is loaded only once, it has worked well so far.
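In case it is useful to anyone else, the UDF side ends up looking roughly like the sketch below. The script and function names (udfs.py, read_resource) are just placeholders; the only real requirement is that the name passed to open() matches the fragment after '#' in mapred.cache.files:

# udfs.py -- registered from the Pig script with:
#   REGISTER 'udfs.py' USING jython AS myudfs;
# (the outputSchema decorator is provided by Pig's Jython engine,
#  so no import is needed when the script is registered this way)

@outputSchema('text:chararray')
def read_resource():
    # 'link_name' is the symlink the framework creates in the task's
    # working directory because of -Dmapred.create.symlink=yes and the
    # '#link_name' fragment on -Dmapred.cache.files.
    f = open('link_name')
    try:
        return f.read()
    finally:
        f.close()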
On Dec 9, 2012, at 2:18 PM, John Gordon <john.gor...@microsoft.com> wrote:

> If you ship the file explicitly, you can use this syntax from there. It will
> pack it with the job jar and make sure it is in the working directory
> wherever the job runs. Be careful of shipping very large files; it is
> probably better to refactor your logic into multiple top-level Pig statements
> on data loaded from HDFS if you find yourself shipping fixed, very large
> files.
> ________________________________
> From: Young Ng
> Sent: 12/9/2012 12:53 PM
> To: user@pig.apache.org
> Subject: How can I load external files within jython UDF?
>
> Hi,
>
> I am trying to load some external resources within my jython UDF functions,
> e.g.:
>
> @outputSchema(....)
> def test():
>     f = open('test.txt')
>     text = f.read()
>     f.close()
>     return text
>
> I have placed 'test.txt' in both the working folder and on HDFS, and I got the
> following error:
> IOError: (2, 'No such file or directory', 'test.txt')
>
> I have also tried to print out the working path of jython with os.getcwd();
> below is what I got:
>
> /home/hduser/tmp/mapred/local/taskTracker/hduser/jobcache/job_201212080111_0007/attempt_201212080111_0007_m_000000_0/work
> ....
>
> I suspect that I can use an absolute path within the UDF, but how can I transfer
> the external resources to the other hadoop datanodes?
>
> Thanks,
> Young Wu
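P.S. For completeness, the streaming-style SHIP examples I kept finding look roughly like the sketch below (script names and paths are made up); the shipped files are copied into the task's current working directory on each node, which is why a relative open() works from there:

DEFINE stream_cmd `process.py` SHIP('/local/path/process.py', '/local/path/lookup.txt');
raw = LOAD 'input' AS (line:chararray);
processed = STREAM raw THROUGH stream_cmd;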