pig-user  

Re: GeoIP UDF

Kevin Weil
Thu, 18 Mar 2010 21:22:38 -0700

Your UDF is getting executed on an arbitrary datanode, and the Java process
is trying to load the *local* file ./GeoIP.dat.  You could use
FileSystem.open to get an InputStream to the HDFS version you have, but then
all datanodes will be trying to access that one file (or three copies, with
replication), which may not be efficient.  The way we handle this is to have
our automated deploy/machine setup put GeoIP.dat in a specified location on
all datanodes.  That is, don't put it in HDFS; put it in a specified location
on the local filesystem, and then your code will work once it opens that path.
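
As a rough sketch of the local-file approach (not our actual code), the UDF
could resolve the database through a small helper like the one below.  The
class name, the default path /usr/local/share/GeoIP/GeoIP.dat, and the
geoip.dat.path system property are all made up for illustration; use
whatever location your deploy tooling really installs to.

```java
import java.io.File;
import java.io.FileNotFoundException;

// Hypothetical helper: finds GeoIP.dat on the node's local filesystem.
final class GeoIpDatabaseLocator {
    // Assumed install location; replace with the path your deploy uses.
    static final String DEFAULT_PATH = "/usr/local/share/GeoIP/GeoIP.dat";
    // Assumed system property that overrides the default (handy for tests).
    static final String PATH_PROPERTY = "geoip.dat.path";

    private GeoIpDatabaseLocator() {}

    // Returns the local database file, or fails with a clear message if
    // this node was not provisioned with it.
    static File locate() throws FileNotFoundException {
        String path = System.getProperty(PATH_PROPERTY, DEFAULT_PATH);
        File db = new File(path).getAbsoluteFile();
        if (!db.isFile() || !db.canRead()) {
            throw new FileNotFoundException(
                "GeoIP database not readable on this node: " + db);
        }
        return db;
    }
}
```

The wrapper class would then pass GeoIpDatabaseLocator.locate() to the GeoIP
API instead of the relative path './GeoIP.dat'.  Failing early with the full
path in the message should make it easier to spot a node that missed the
deploy.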

Kevin

On Thu, Mar 18, 2010 at 11:58 AM, Johannes Rußek <
johannes.rus...@io-consulting.net> wrote:

> Hello everybody,
> I've written a wrapper class for the GeoIP API, but now I'm trying to
> access the GeoIP.dat file, which I've added to HDFS via hadoop dfs -put
> GeoIP.dat GeoIP.dat and added to the cache in pig.properties via
> mapred.cache.files=hdfs://localhost:8020/user/root/GeoIP.dat
> However, it seems the GeoIP API is unable to open the file with
> './GeoIP.dat' as the path. What should I use for this?
> Regards,
> Johannes
>