On Tue, Apr 27, 2010 at 10:22 AM, Tim Robertson <[email protected]> wrote:
> Hi,
>
> I currently run a MapReduce job to rewrite a tab delimited file, and then
> I use Hive for everything after that stage.
>
> Am I correct in thinking that I can create a Jar with my own method which
> can then be called in SQL?
>
> Would the syntax be:
>
> hive> ADD JAR /tmp/parse.jar;
> hive> INSERT OVERWRITE TABLE target SELECT s.id, s.canonical, parsedName
> FROM source s MAP s.canonical USING 'parse' AS parsedName;
>
> and would parse be a MR job? If so, what are the input and output formats
> for the parse, please? Or is it perhaps a class implementing an interface,
> and Hive takes care of the rest?
>
> Thanks for any pointers,
> Tim

Tim,

A UDF is an SQL function, like toString() or max(). An InputFormat teaches
Hive to read data from key/value files. A SerDe tells Hive how to parse
input data into columns. Finally, the MAP(), REDUCE(), and TRANSFORM()
keywords you described are a way to pipe data out to an external process
and read the results back in, almost like a non-native Hive UDF.

So you have munged four concepts together :) Do not feel bad, however; I
struggled through an InputFormat for the last month.

It sounds most like you want a UDF that takes a string and returns its
canonical representation:

hive> ADD JAR /tmp/parse.jar;
hive> create temporary function canonical as 'my.package.canonical';
hive> select canonical(my_column) from source;
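If it helps, that kind of UDF is just a Java class that extends Hive's UDF
base class and exposes an evaluate() method; Hive binds your SQL arguments
to evaluate()'s signature by reflection. A minimal sketch, with a
placeholder package/class name and placeholder parsing logic (swap in your
real parser):

  package com.example.hive;  // placeholder; the fully qualified name is
                             // whatever you register with CREATE TEMPORARY FUNCTION

  import org.apache.hadoop.hive.ql.exec.UDF;
  import org.apache.hadoop.io.Text;

  public final class Canonical extends UDF {

    // Hive finds this method by reflection: one string in, one string out.
    public Text evaluate(final Text name) {
      if (name == null) {
        return null;  // pass NULLs through, like the built-ins do
      }
      // Placeholder "parse": trim and collapse runs of whitespace.
      String canonical = name.toString().trim().replaceAll("\\s+", " ");
      return new Text(canonical);
    }
  }

Compile that into parse.jar and the three hive> lines above will run it once
per row, with no separate MR job to write.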
Regards,