pig-user  

Re: Using external jar in UDF

Dmitriy Ryaboy
Mon, 15 Mar 2010 14:08:56 -0700

Your UDF will reference the classes the regular way: just use imports. The
trick is to make sure the jars are on the machine and on the classpath. There
are two ways to do this: pre-load them on the cluster and have them configured
to be on the default classpath, or use Pig's "REGISTER" keyword to register
both your UDF jar and its dependencies (one REGISTER statement per jar). What
Alan is saying is that there is no way to create a UDF that would somehow tell
Pig that it needs to package up and send a jar file located somewhere on the
client machine; you have to do that in the Pig script yourself.
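
For the REGISTER route, a script would look roughly like the sketch below.
The jar paths, the class name com.example.pig.ToJson, and the input/output
locations are all made up for illustration; substitute your own.

```pig
-- One REGISTER per jar: the UDF jar itself plus each jar it depends on.
-- All paths and names here are hypothetical placeholders.
REGISTER /path/to/myudfs.jar;
REGISTER /path/to/json-lib.jar;

-- Give the (hypothetical) UDF class a short alias.
DEFINE ToJson com.example.pig.ToJson();

records = LOAD 'input' AS (id:chararray, payload:chararray);
encoded = FOREACH records GENERATE ToJson(id, payload);
STORE encoded INTO 'output';
```

Pig ships the registered jars with the job, so the UDF's imports resolve on
the task nodes without anything being pre-installed on the cluster.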

Additionally, thanks to Thejas, you can register jars on the command line if
you are on Pig 0.7 (trunk): https://issues.apache.org/jira/browse/PIG-1226


On Mon, Mar 15, 2010 at 1:52 PM, Corbin Hoenes <cor...@tynt.com> wrote:

> Okay what do you mean by "package and send along"?  What is the pig way to
> include additional jars?  e.g. we want to use a 3rd party library to encode
> json and how can our UDF reference that jar?
>
> On Mar 15, 2010, at 12:49 PM, Alan Gates wrote:
>
> > The UDF interface does not currently include the ability for a UDF to
> > indicate additional jars it would like to have packaged and sent along.
> >
> > Alan.
> >
> > On Mar 10, 2010, at 2:21 AM, Tamir Kamara wrote:
> >
> >> Hi,
> >>
> >> Register is working fine but it means that the user needs to know when
> >> it's needed to register the additional jar. What about my question
> >> regarding the M/R way of doing this?
> >>
> >> Thanks,
> >> Tamir
> >>
> >> On Wed, Mar 10, 2010 at 11:21 AM, Jeff Zhang <zjf...@gmail.com> wrote:
> >>
> >>> Using *REGISTER myfunc.jar;*
> >>>
> >>> refer here:
> >>>
> >>> http://hadoop.apache.org/pig/docs/r0.5.0/piglatin_reference.html#REGISTER
> >>>
> >>>
> >>> On Wed, Mar 10, 2010 at 4:52 PM, Tamir Kamara <tamirkam...@gmail.com>
> >>> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I have a function (eval) that needs to use an external jar.
> >>>> In M/R world this can be accomplished by uploading the jar to the dfs
> >>>> and using DistributedCache.addFileToClassPath.
> >>>> How do I do the same (have the jar available for the udf) in pig?
> >>>>
> >>>> Thanks,
> >>>> Tamir
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Best Regards
> >>>
> >>> Jeff Zhang
> >>>
> >
>
>