Thanks Edward,

I get where you are coming from now with that explanation.

Cheers,
Tim
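For reference, the UDF approach discussed in the thread below can be sketched roughly like this. The class name, the canonicalisation logic, and the sample input are all illustrative; a real UDF would extend org.apache.hadoop.hive.ql.exec.UDF (which needs hive-exec on the classpath), omitted here so the sketch stays self-contained:

```java
// Hypothetical sketch of a Hive UDF body. In a real jar this class
// would declare "extends UDF" (org.apache.hadoop.hive.ql.exec.UDF);
// Hive looks up a public evaluate() method by reflection, and plain
// String parameters are accepted alongside Hadoop's Text type.
public class Canonical {

    // Placeholder "parse" logic: trim and lower-case the input.
    public String evaluate(String source) {
        if (source == null) {
            return null; // Hive passes NULL columns as null
        }
        return source.trim().toLowerCase();
    }

    public static void main(String[] args) {
        Canonical c = new Canonical();
        System.out.println(c.evaluate("  Puma Concolor  "));
    }
}
```

Once packaged, it would be wired up the way Edward shows further down: ADD JAR, CREATE TEMPORARY FUNCTION pointing at the class, then called like any built-in function in the SELECT list.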


On Tue, Apr 27, 2010 at 7:53 PM, Edward Capriolo <[email protected]> wrote:

>
>
> On Tue, Apr 27, 2010 at 1:48 PM, Tim Robertson
> <[email protected]> wrote:
>
>> Hmmm... I am not trying to serialize or deserialize custom content, but
>> simply take an input String (Text), run some Java, and return a new Text by
>> calling a function.
>>
>> Looking at public class UDFYear extends UDF {, the annotation at the top
>> suggests that extending UDF and adding the annotation might be enough.
>>
>> I'll try it anyways...
>> Tim
>>
>> On Tue, Apr 27, 2010 at 7:37 PM, Adam O'Donnell <[email protected]> wrote:
>>
>>> It sounds like what you want is a custom SerDe.  I have tried to write
>>> one but ran into some difficulty.
>>>
>>> On Tue, Apr 27, 2010 at 10:13 AM, Tim Robertson
>>> <[email protected]> wrote:
>>> > Thanks Edward,
>>> > You are indeed correct - I am confused!
>>> > So I checked out the source and poked around.  If I were to extend UDF
>>> > and implement public Text evaluate(Text source), would I be heading
>>> > along the correct lines to use what you say above?
>>> > Thanks,
>>> > Tim
>>> >
>>> >
>>> > On Tue, Apr 27, 2010 at 5:11 PM, Edward Capriolo <
>>> [email protected]>
>>> > wrote:
>>> >>
>>> >>
>>> >> On Tue, Apr 27, 2010 at 10:22 AM, Tim Robertson
>>> >> <[email protected]> wrote:
>>> >>>
>>> >>> Hi,
>>> >>> I currently run a MapReduce job to rewrite a tab delimited file, and
>>> then
>>> >>> I use Hive for everything after that stage.
>>> >>> Am I correct in thinking that I can create a Jar with my own method
>>> which
>>> >>> can then be called in SQL?
>>> >>> Would the syntax be:
>>> >>>   hive> ADD JAR /tmp/parse.jar;
>>> >>>   hive> INSERT OVERWRITE TABLE target SELECT s.id,
>>> >>> s.canonical, parsedName FROM source s MAP s.canonical using 'parse'
>>> as
>>> >>> parsedName;
>>> >>> and would parse be an MR job?  If so, what are the input and output
>>> >>> formats for parse, please?  Or is it perhaps a class implementing an
>>> >>> interface, with Hive taking care of the rest?
>>> >>> Thanks for any pointers,
>>> >>> Tim
>>> >>>
>>> >>
>>> >> Tim,
>>> >>
>>> >> A UDF is an SQL function, like toString() or max().
>>> >> An InputFormat teaches Hive to read data from key/value files.
>>> >> A SerDe tells Hive how to parse input data into columns.
>>> >> Finally, the map(), reduce(), and transform() keywords you described are
>>> >> a way to pipe data to an external process and read the results back in.
>>> >> Almost like a UDF that is non-native to Hive.
>>> >>
>>> >> So you have munged up four concepts together :) Do not feel bad,
>>> >> however; I struggled through an InputFormat for the last month.
>>> >>
>>> >> It sounds most like you want a UDF that takes a string and returns a
>>> >> canonical representation.
>>> >>
>>> >>
>>> >>   hive> ADD JAR /tmp/parse.jar;
>>> >> create temporary function canonical as 'my.package.canonical';
>>> >> select canonical(my_column) from source;
>>> >>
>>> >> Regards,
>>> >>
>>> >>
>>> >>
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Adam J. O'Donnell, Ph.D.
>>> Immunet Corporation
>>> Cell: +1 (267) 251-0070
>>>
>>
>>
> Tim,
>
> I think you are on the right track with the UDF approach.
>
> You could accomplish something similar with a SerDe, and from the client's
> perspective it would be more "transparent".
>
> A UDF is a bit more reusable than a SerDe. You can only choose a SerDe once,
> when the table is created, but your UDF is applied on the result set.
>
> Edward
>
