It sounds like what you want is a custom SerDe. I have tried to write one but ran into some difficulty.
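For comparison, the UDF route discussed in the quoted thread below comes down to a class with an evaluate method. Here is a minimal sketch of that shape, not a tested Hive plugin. The class name CanonicalName and the canonicalization rule (trim, collapse whitespace, lowercase) are made up for illustration. It uses plain String so that it compiles without Hive on the classpath; a real UDF would extend org.apache.hadoop.hive.ql.exec.UDF and would normally take and return org.apache.hadoop.io.Text, as in Tim's signature.

```java
import java.util.Locale;

// Hypothetical stand-in for a Hive UDF. In Hive this class would extend
// org.apache.hadoop.hive.ql.exec.UDF; that dependency is left out here so
// the sketch compiles on its own.
class CanonicalName {

    // Hive finds the method to call by its name, "evaluate".
    // The rule below is an assumption chosen only for illustration.
    public String evaluate(String source) {
        if (source == null) {
            return null; // pass SQL NULL through unchanged
        }
        return source.trim()
                     .replaceAll("\\s+", " ")
                     .toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(new CanonicalName().evaluate("  Puma   Concolor "));
    }
}
```

Once packaged in a jar, a class like this would be registered with the ADD JAR and CREATE TEMPORARY FUNCTION statements that Edward shows in the thread.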

On Tue, Apr 27, 2010 at 10:13 AM, Tim Robertson <[email protected]> wrote:
> Thanks Edward,
> You are indeed correct - I am confused!
> So I checked out the source, and poked around. If I were to extend UDF and
> implement public Text evaluate(Text source) {
> would I be heading along the correct lines to use what you say above?
> Thanks,
> Tim
>
>
> On Tue, Apr 27, 2010 at 5:11 PM, Edward Capriolo <[email protected]>
> wrote:
>>
>>
>> On Tue, Apr 27, 2010 at 10:22 AM, Tim Robertson
>> <[email protected]> wrote:
>>>
>>> Hi,
>>> I currently run a MapReduce job to rewrite a tab delimited file, and then
>>> I use Hive for everything after that stage.
>>> Am I correct in thinking that I can create a Jar with my own method which
>>> can then be called in SQL?
>>> Would the syntax be:
>>>   hive> ADD JAR /tmp/parse.jar;
>>>   hive> INSERT OVERWRITE TABLE target SELECT s.id,
>>>         s.canonical, parsedName FROM source s MAP s.canonical using 'parse' as
>>>         parsedName;
>>> and parse be a MR job? If so what are the input and output formats
>>> please for the parse? Or is it a class implementing an interface perhaps
>>> and Hive takes care of the rest?
>>> Thanks for any pointers,
>>> Tim
>>>
>>
>> Tim,
>>
>> A UDF is an SQL function like toString() or max().
>> An InputFormat teaches Hive to read data from key-value files.
>> A SerDe tells Hive how to parse input data into columns.
>> Finally, the map(), reduce(), transform() keywords you described are a way to
>> pipe data to an external process and read the results back in. Almost like a
>> non-native-to-Hive UDF.
>>
>> So you have munged up 4 concepts together :) Do not feel bad however, I
>> struggled through an input format for the last month.
>>
>> It sounds most like you want a UDF that takes a string and returns a
>> canonical representation.
>>
>>
>>   hive> ADD JAR /tmp/parse.jar;
>>   create temporary function canonical as 'my.package.canonical';
>>   select canonical(my_column) from source;
>>
>> Regards,
>>
>>
>>
>
>

--
Adam J. O'Donnell, Ph.D.
Immunet Corporation
Cell: +1 (267) 251-0070
