Thanks Edward,

You are indeed correct - I am confused!

So I checked out the source and poked around.  If I were to extend UDF and
implement public Text evaluate(Text source), would I be on the right track
for what you describe?
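
As a rough illustration of that idea, here is a minimal sketch of the kind of logic such an evaluate method might wrap. The class name CanonicalName and the normalisation rules (trim, collapse whitespace, lower-case) are assumptions made for the example; the thread does not say what the canonical form should be. The Hive wrapper itself is shown only as a comment, since it needs the hive-exec and hadoop jars on the classpath.

```java
import java.util.Locale;

// Hypothetical helper: the class name and the normalisation rules are
// assumptions for illustration, not something defined in this thread.
public class CanonicalName {

    // Returns null for null input, so a SQL NULL would pass straight through.
    public static String canonicalize(String source) {
        if (source == null) {
            return null;
        }
        // Trim, collapse runs of whitespace to one space, then lower-case.
        return source.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    // In Hive this would be wrapped roughly as follows (left as a comment
    // because it needs hive-exec and hadoop on the classpath to compile):
    //
    //   public class Canonical extends org.apache.hadoop.hive.ql.exec.UDF {
    //       public org.apache.hadoop.io.Text evaluate(org.apache.hadoop.io.Text source) {
    //           if (source == null) return null;
    //           return new org.apache.hadoop.io.Text(
    //                   CanonicalName.canonicalize(source.toString()));
    //       }
    //   }

    public static void main(String[] args) {
        System.out.println(canonicalize("  Puma   concolor "));
    }
}
```

Keeping the string handling in a plain static method means it can be tried out without a Hive installation; only the thin wrapper in the comment depends on Hive.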

Thanks,
Tim



On Tue, Apr 27, 2010 at 5:11 PM, Edward Capriolo <[email protected]> wrote:

>
>
> On Tue, Apr 27, 2010 at 10:22 AM, Tim Robertson <[email protected]> wrote:
>
>> Hi,
>>
>> I currently run a MapReduce job to rewrite a tab delimited file, and then
>> I use Hive for everything after that stage.
>>
>> Am I correct in thinking that I can create a Jar with my own method which
>> can then be called in SQL?
>>
>> Would the syntax be:
>>
>>   hive> ADD JAR /tmp/parse.jar;
>>   hive> INSERT OVERWRITE TABLE target SELECT s.id,
>> s.canonical, parsedName FROM source s MAP s.canonical using 'parse' as
>> parsedName;
>>
>> with parse being an MR job?  If so, what are the input and output formats
>> for parse, please?  Or is it perhaps a class implementing an interface,
>> with Hive taking care of the rest?
>>
>> Thanks for any pointers,
>> Tim
>>
>>
>>
> Tim,
>
> A UDF is an SQL function, like toString() or max().
> An InputFormat teaches Hive to read data from key-value files.
> A SerDe tells Hive how to parse input data into columns.
> Finally, the map(), reduce(), and transform() keywords you described are a
> way to pipe data to an external process and read the results back in, almost
> like a UDF that is not native to Hive.
>
> So you have munged four concepts together :) Do not feel bad, however; I
> struggled through writing an input format for the last month.
>
> It sounds most like you want a UDF that takes a string and returns a
> canonical representation.
>
>
>
>   hive> ADD JAR /tmp/parse.jar;
>   hive> CREATE TEMPORARY FUNCTION canonical AS 'my.package.canonical';
>   hive> SELECT canonical(my_column) FROM source;
>
> Regards,
>
>
>
>
