It sounds like what you want is a custom SerDe.  I have tried to write
one but ran into some difficulty.
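
For comparison, here is a rough sketch of the UDF route Edward describes in
the quoted thread below. Everything specific in it is an assumption made for
illustration: the class name Canonical, the use of String rather than Text,
and the normalization rule (trim, collapse whitespace, lowercase), since the
thread never says what the parse step should do. In a real Hive build the
class would extend org.apache.hadoop.hive.ql.exec.UDF; that is left out here
so the sketch compiles without Hive on the classpath.

```java
import java.util.Locale;

// Hypothetical UDF body. In Hive this class would be declared as
// "public class Canonical extends UDF" (org.apache.hadoop.hive.ql.exec.UDF);
// the superclass is omitted so the sketch compiles without Hive jars.
public class Canonical {

    // Hive looks up a method named evaluate() by reflection.
    // The normalization rule below is a placeholder, not Tim's real parser.
    public String evaluate(String source) {
        if (source == null) {
            return null; // pass SQL NULL through unchanged
        }
        return source.trim()
                     .replaceAll("\\s+", " ")   // collapse runs of whitespace
                     .toLowerCase(Locale.ROOT); // locale-independent lowercasing
    }

    public static void main(String[] args) {
        Canonical udf = new Canonical();
        System.out.println(udf.evaluate("  Puma   CONCOLOR ")); // prints "puma concolor"
    }
}
```

Once packaged into a jar, a class like this would be registered with the
ADD JAR and CREATE TEMPORARY FUNCTION statements shown in Edward's message.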

On Tue, Apr 27, 2010 at 10:13 AM, Tim Robertson
<[email protected]> wrote:
> Thanks Edward,
> You are indeed correct - I am confused!
> So I checked out the source, and poked around.  If I were to extend UDF and
> implement  public Text evaluate(Text source) {
> would I be heading along the correct lines to use what you say above?
> Thanks,
> Tim
>
>
> On Tue, Apr 27, 2010 at 5:11 PM, Edward Capriolo <[email protected]>
> wrote:
>>
>>
>> On Tue, Apr 27, 2010 at 10:22 AM, Tim Robertson
>> <[email protected]> wrote:
>>>
>>> Hi,
>>> I currently run a MapReduce job to rewrite a tab delimited file, and then
>>> I use Hive for everything after that stage.
>>> Am I correct in thinking that I can create a Jar with my own method which
>>> can then be called in SQL?
>>> Would the syntax be:
>>>   hive> ADD JAR /tmp/parse.jar;
>>>   hive> INSERT OVERWRITE TABLE target SELECT s.id,
>>> s.canonical, parsedName FROM source s MAP s.canonical using 'parse' as
>>> parsedName;
>>> and would parse be an MR job?  If so, what are the input and output formats
>>> for parse, please?  Or is it perhaps a class implementing an interface,
>>> with Hive taking care of the rest?
>>> Thanks for any pointers,
>>> Tim
>>>
>>
>> Tim,
>>
>> A UDF is a SQL function, like toString() or max().
>> An InputFormat teaches Hive to read data from key-value files.
>> A SerDe tells Hive how to parse input data into columns.
>> Finally, the map(), reduce(), and transform() keywords you described are a
>> way to pipe data to an external process and read the results back in,
>> almost like a UDF that is not native to Hive.
>>
>> So you have munged 4 concepts together :) Do not feel bad, however; I
>> struggled through an input format for the last month.
>>
>> It sounds most like you want a udf that takes a string and returns a
>> canonical representation.
>>
>>
>>   hive> ADD JAR /tmp/parse.jar;
>>   hive> CREATE TEMPORARY FUNCTION canonical AS 'my.package.canonical';
>>   hive> SELECT canonical(my_column) FROM source;
>>
>> Regards,
>>
>>
>>
>
>



-- 
Adam J. O'Donnell, Ph.D.
Immunet Corporation
Cell: +1 (267) 251-0070
