Hmmm... I am not trying to serialize or deserialize custom content, but
simply take an input String (Text), run some Java, and return a new Text by
calling a function.

Looking at public class UDFYear extends UDF {, the annotation at the top
suggests that extending UDF and adding the annotation might be enough.
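
Something along these lines is what I have in mind (a rough, untested
sketch; the package, class name, and Description values below are just
placeholders for my real parser):

  package my.pack;  // placeholder package

  import org.apache.hadoop.hive.ql.exec.Description;
  import org.apache.hadoop.hive.ql.exec.UDF;
  import org.apache.hadoop.io.Text;

  // Hypothetical UDF: take a Text in, run some Java, return a new Text.
  @Description(name = "parse_name",
      value = "_FUNC_(str) - returns the parsed form of str")
  public class UDFParseName extends UDF {
    public Text evaluate(Text source) {
      if (source == null) {
        return null;
      }
      // Placeholder logic; the real canonical-name parsing would go here.
      return new Text(source.toString().trim());
    }
  }

and then register it with CREATE TEMPORARY FUNCTION as Edward shows below.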

I'll try it anyway...
Tim

On Tue, Apr 27, 2010 at 7:37 PM, Adam O'Donnell <[email protected]> wrote:

> It sounds like what you want is a custom SerDe.  I have tried to write
> one but ran into some difficulty.
>
> On Tue, Apr 27, 2010 at 10:13 AM, Tim Robertson
> <[email protected]> wrote:
> > Thanks Edward,
> > You are indeed correct - I am confused!
> > So I checked out the source and poked around.  If I were to extend UDF
> > and implement public Text evaluate(Text source) {
> > would I be heading along the correct lines to use what you say above?
> > Thanks,
> > Tim
> >
> >
> > On Tue, Apr 27, 2010 at 5:11 PM, Edward Capriolo <[email protected]>
> > wrote:
> >>
> >>
> >> On Tue, Apr 27, 2010 at 10:22 AM, Tim Robertson
> >> <[email protected]> wrote:
> >>>
> >>> Hi,
> >>> I currently run a MapReduce job to rewrite a tab-delimited file, and
> >>> then I use Hive for everything after that stage.
> >>> Am I correct in thinking that I can create a Jar with my own method,
> >>> which can then be called in SQL?
> >>> Would the syntax be:
> >>>   hive> ADD JAR /tmp/parse.jar;
> >>>   hive> INSERT OVERWRITE TABLE target SELECT s.id,
> >>> s.canonical, parsedName FROM source s MAP s.canonical using 'parse' as
> >>> parsedName;
> >>> and parse be an MR job?  If so, what are the input and output formats
> >>> for parse, please?  Or is it perhaps a class implementing an
> >>> interface, with Hive taking care of the rest?
> >>> Thanks for any pointers,
> >>> Tim
> >>>
> >>
> >> Tim,
> >>
> >> A UDF is an SQL function, like toString() or max().
> >> An InputFormat teaches Hive to read data from key/value files.
> >> A SerDe tells Hive how to parse input data into columns.
> >> Finally, the map(), reduce(), and transform() keywords you described
> >> are a way to pipe data to an external process and read the results
> >> back in, almost like a non-native Hive UDF.
> >>
> >> So you have munged up four concepts together :) Do not feel bad,
> >> however; I struggled through an InputFormat for the last month.
> >>
> >> It sounds most like you want a UDF that takes a string and returns a
> >> canonical representation.
> >>
> >>
> >>   hive> ADD JAR /tmp/parse.jar;
> >>   hive> CREATE TEMPORARY FUNCTION canonical AS 'my.package.canonical';
> >>   hive> SELECT canonical(my_column) FROM source;
> >>
> >> Regards,
> >>
> >>
> >>
> >
> >
>
>
>
> --
> Adam J. O'Donnell, Ph.D.
> Immunet Corporation
> Cell: +1 (267) 251-0070
>
