On Tue, Apr 27, 2010 at 10:22 AM, Tim Robertson
<[email protected]> wrote:

> Hi,
>
> I currently run a MapReduce job to rewrite a tab delimited file, and then I
> use Hive for everything after that stage.
>
> Am I correct in thinking that I can create a Jar with my own method which
> can then be called in SQL?
>
> Would the syntax be:
>
>   hive> ADD JAR /tmp/parse.jar;
>   hive> INSERT OVERWRITE TABLE target SELECT s.id,
> s.canonical, parsedName FROM source s MAP s.canonical using 'parse' as
> parsedName;
>
> and parse be a MR job?  If so what are the input and output formats please
> for the parse?  Or is it a class implementing an interface perhaps and Hive
> take care of the rest?
>
> Thanks for any pointers,
> Tim
>
>
>
Tim,

A UDF is a SQL function, like toString() or max().
An InputFormat teaches Hive how to read data from key/value files.
A SerDe tells Hive how to parse input data into columns.
Finally, the map()/reduce()/transform() keywords you described are a way to
pipe data to an external process and read the results back in. Almost like a
UDF that is not native to Hive.

So you have munged four concepts together :) Do not feel bad, however; I
struggled through writing an InputFormat for the last month.

It sounds like what you most want is a UDF that takes a string and returns a
canonical representation:


  hive> ADD JAR /tmp/parse.jar;
  hive> CREATE TEMPORARY FUNCTION canonical AS 'my.package.canonical';
  hive> SELECT canonical(my_column) FROM source;
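For what it's worth, the parsing logic inside such a UDF might look roughly like
this. It's only a sketch: the class name and the canonicalization rule are made
up, and the real UDF would extend org.apache.hadoop.hive.ql.exec.UDF and expose
this logic through an evaluate() method (taking and returning
org.apache.hadoop.io.Text). I've written it as plain Java here so it runs
standalone:

```java
// Hypothetical sketch of the core logic for a name-parsing UDF.
// In Hive you would wrap this in a class extending
// org.apache.hadoop.hive.ql.exec.UDF with an evaluate() method;
// here it is plain Java so it compiles without the Hive jars.
public class Canonical {

    // Illustrative canonicalization only: trim, collapse internal
    // whitespace, and drop anything after the first comma
    // (e.g. a trailing authorship string).
    public static String canonical(String raw) {
        if (raw == null) {
            return null;            // Hive passes NULLs through
        }
        String s = raw.trim().replaceAll("\\s+", " ");
        int comma = s.indexOf(',');
        return comma >= 0 ? s.substring(0, comma).trim() : s;
    }

    public static void main(String[] args) {
        System.out.println(canonical("  Puma   concolor, Linnaeus "));
        // prints: Puma concolor
    }
}
```

Once that class is compiled into /tmp/parse.jar, the ADD JAR / CREATE
TEMPORARY FUNCTION statements above make it callable from any query.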

Regards,
