On Tue, Apr 27, 2010 at 1:48 PM, Tim Robertson <[email protected]> wrote:
> Hmmm... I am not trying to serialize or deserialize custom content, but
> simply to take an input String (Text), run some Java, and return a new Text
> by calling a function.
>
> Looking at "public class UDFYear extends UDF {", the annotation at the top
> suggests that extending UDF and adding the annotation might be enough.
>
> I'll try it anyways...
> Tim
>
> On Tue, Apr 27, 2010 at 7:37 PM, Adam O'Donnell <[email protected]> wrote:
>
>> It sounds like what you want is a custom SerDe. I have tried to write
>> one but ran into some difficulty.
>>
>> On Tue, Apr 27, 2010 at 10:13 AM, Tim Robertson
>> <[email protected]> wrote:
>> > Thanks Edward,
>> > You are indeed correct - I am confused!
>> > So I checked out the source and poked around. If I were to extend UDF
>> > and implement public Text evaluate(Text source) {
>> > would I be heading along the correct lines to use what you say above?
>> > Thanks,
>> > Tim
>> >
>> >
>> > On Tue, Apr 27, 2010 at 5:11 PM, Edward Capriolo <[email protected]>
>> > wrote:
>> >>
>> >>
>> >> On Tue, Apr 27, 2010 at 10:22 AM, Tim Robertson
>> >> <[email protected]> wrote:
>> >>>
>> >>> Hi,
>> >>> I currently run a MapReduce job to rewrite a tab-delimited file, and
>> >>> then I use Hive for everything after that stage.
>> >>> Am I correct in thinking that I can create a JAR with my own method
>> >>> which can then be called in SQL?
>> >>> Would the syntax be:
>> >>> hive> ADD JAR /tmp/parse.jar;
>> >>> hive> INSERT OVERWRITE TABLE target SELECT s.id,
>> >>> s.canonical, parsedName FROM source s MAP s.canonical using 'parse' as
>> >>> parsedName;
>> >>> And would parse be an MR job? If so, what are the input and output
>> >>> formats for parse, please? Or is it perhaps a class implementing an
>> >>> interface, with Hive taking care of the rest?
>> >>> Thanks for any pointers,
>> >>> Tim
>> >>>
>> >>
>> >> Tim,
>> >>
>> >> A UDF is an SQL function, like toString() or max().
>> >> An InputFormat teaches Hive how to read data from key/value files.
>> >> A SerDe tells Hive how to parse input data into columns.
>> >> Finally, the MAP, REDUCE, and TRANSFORM keywords you described are a way
>> >> to pipe data to an external process and read the results back in, almost
>> >> like a UDF that is not native to Hive.
>> >>
>> >> So you have munged 4 concepts together :) Do not feel bad, however; I
>> >> struggled through an input format for the last month.
>> >>
>> >> It sounds most like you want a UDF that takes a string and returns a
>> >> canonical representation.
>> >>
>> >>
>> >> hive> ADD JAR /tmp/parse.jar;
>> >> hive> create temporary function canonical as 'my.pkg.Canonical';
>> >> hive> select canonical(my_column) from source;
>> >>
>> >> Regards,
>> >>
>> >>
>> >>
>> >
>> >
>>
>>
>>
>> --
>> Adam J. O'Donnell, Ph.D.
>> Immunet Corporation
>> Cell: +1 (267) 251-0070
>>
>
>
Tim,
I think you are on the right track with the UDF approach.
You could accomplish something similar with a SerDe, except that from the
client's perspective it would be more "transparent".
A UDF is a bit more reusable than a SerDe. You can only choose a SerDe once,
when the table is created, but your UDF is applied to the result set.
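For illustration, here is a minimal sketch of the shape such a class might take. Everything in it is hypothetical: the class name CanonicalName and the normalization (trim, collapse whitespace, lower-case) are placeholders, not a real name parser. It uses String instead of Text and leaves out "extends org.apache.hadoop.hive.ql.exec.UDF" so that it compiles without the Hive and Hadoop jars. In Hive you would add that extends clause (or switch to Text, converting with toString() and new Text(...)), package the class in a jar, and register it with ADD JAR and CREATE TEMPORARY FUNCTION.

```java
import java.util.Locale;

/**
 * Hypothetical example shaped like a Hive "simple" UDF.
 * A real Hive UDF would also declare "extends org.apache.hadoop.hive.ql.exec.UDF";
 * that is omitted here so the sketch compiles without hive-exec on the classpath.
 */
public class CanonicalName {

    /**
     * Hive locates a public method named evaluate by reflection.
     * The normalization below is placeholder logic for illustration only.
     */
    public String evaluate(String source) {
        // A SQL NULL arrives as a Java null; pass it through unchanged.
        if (source == null) {
            return null;
        }
        // Trim the ends, then collapse any run of whitespace to a single space.
        String collapsed = source.trim().replaceAll("\\s+", " ");
        // Lower-case with a fixed locale so results do not depend on the JVM default.
        return collapsed.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        CanonicalName udf = new CanonicalName();
        System.out.println(udf.evaluate("  Puma   CONCOLOR "));
    }
}
```

Running main prints "puma concolor". Because the function is applied per row of the result set, the same jar can be reused against any table with a string column, which is the reusability advantage over a SerDe mentioned above.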
Edward