pig-user  

Re: GeoIP UDF

Dmitriy Ryaboy
Wed, 17 Mar 2010 10:47:09 -0700

The recipe on the Cloudera blog works. It is a little more complex than it
needs to be, for educational purposes. In practice you should just put the
GeoIP db on all the nodes so that you don't have to package it up and ship
it with every job.

To be more efficient, you need to wrap the Java API in a UDF and preload
the GeoIP database into memory. Make the lookup service a static field, so
it is loaded once per JVM rather than once per record, and you can take
advantage of JVM reuse across tasks, too!
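
A minimal sketch of the "static service" idea, assuming nothing beyond the
JDK. The names GeoIpCountry, GeoDb, loadDatabase and LOAD_COUNT are
hypothetical, and the lookup table is made-up demo data standing in for the
real GeoIP database. In an actual UDF, exec would live in a subclass of
Pig's EvalFunc and loadDatabase would open the database file through the
MaxMind Java API. Both are left out here so the sketch compiles on its own.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a lookup "service" held in a static field so the
// database is loaded once per JVM, not once per record or per UDF instance.
final class GeoIpCountry {

    // Stand-in for the real GeoIP reader (an assumption, not the MaxMind API).
    interface GeoDb {
        String countryCode(String ip);
    }

    // Counts how many times the database was loaded, for demonstration only.
    static final AtomicInteger LOAD_COUNT = new AtomicInteger();

    // Initialization-on-demand holder: the JVM runs loadDatabase() exactly
    // once, on first use, and guarantees thread safety.
    private static final class Holder {
        static final GeoDb DB = loadDatabase();
    }

    // In a real UDF this would read the GeoIP file from local disk on the node.
    static GeoDb loadDatabase() {
        LOAD_COUNT.incrementAndGet();
        Map<String, String> table = new HashMap<>();
        table.put("192.0.2.1", "US");    // made-up demo data
        table.put("198.51.100.7", "DE"); // made-up demo data
        return ip -> table.getOrDefault(ip, "--");
    }

    // Mirrors what a UDF's exec method would do for one input value.
    static String exec(String ip) {
        if (ip == null) {
            return null; // pass nulls through rather than failing the job
        }
        return Holder.DB.countryCode(ip);
    }

    public static void main(String[] args) {
        System.out.println(exec("192.0.2.1"));
        System.out.println(exec("198.51.100.7"));
        System.out.println("loads: " + LOAD_COUNT.get());
    }
}
```

Because the field is static, every call in the same JVM shares one loaded
copy. With JVM reuse turned on, later tasks that land in that JVM skip the
load as well.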

-D

On Wed, Mar 17, 2010 at 10:34 AM, Johannes Rußek <
johannes.rus...@io-consulting.net> wrote:

> Hello Everybody,
> I've been searching the web trying to find a nice way to use GeoIP from
> Pig / Hadoop.
> So far the only things I've been able to find are a Perl script to use with
> streaming and some suggestions to use the MaxMind GeoIP Java API from within
> Hadoop (by packaging the geoip.dat file in the jar). I'd like to avoid
> streaming though.
> The question is, how can I access that from within Pig? Do I have to write a
> UDF "wrapper" to use it? Or can I use any function that is available in
> Hadoop from Pig?
> Thanks,
> Johannes
>