Thanks Edward, I get where you are coming from now with that explanation.
Cheers,
Tim

On Tue, Apr 27, 2010 at 7:53 PM, Edward Capriolo <[email protected]> wrote:

> On Tue, Apr 27, 2010 at 1:48 PM, Tim Robertson <[email protected]> wrote:
>
>> Hmmm... I am not trying to serialize or deserialize custom content, but
>> simply take an input String (Text), run some Java, and return a new (Text)
>> by calling a function.
>>
>> Looking at public class UDFYear extends UDF {, the annotation at the top
>> suggests that extending UDF and adding the annotation might be enough.
>>
>> I'll try it anyway...
>> Tim
>>
>> On Tue, Apr 27, 2010 at 7:37 PM, Adam O'Donnell <[email protected]> wrote:
>>
>>> It sounds like what you want is a custom SerDe. I have tried to write
>>> one but ran into some difficulty.
>>>
>>> On Tue, Apr 27, 2010 at 10:13 AM, Tim Robertson <[email protected]> wrote:
>>> > Thanks Edward,
>>> > You are indeed correct - I am confused!
>>> > So I checked out the source and poked around. If I were to extend UDF
>>> > and implement public Text evaluate(Text source) {, would I be heading
>>> > along the correct lines to use what you say above?
>>> > Thanks,
>>> > Tim
>>> >
>>> > On Tue, Apr 27, 2010 at 5:11 PM, Edward Capriolo <[email protected]> wrote:
>>> >>
>>> >> On Tue, Apr 27, 2010 at 10:22 AM, Tim Robertson <[email protected]> wrote:
>>> >>>
>>> >>> Hi,
>>> >>> I currently run a MapReduce job to rewrite a tab-delimited file, and
>>> >>> then I use Hive for everything after that stage.
>>> >>> Am I correct in thinking that I can create a jar with my own method
>>> >>> which can then be called in SQL? Would the syntax be:
>>> >>>
>>> >>> hive> ADD JAR /tmp/parse.jar;
>>> >>> hive> INSERT OVERWRITE TABLE target SELECT s.id, s.canonical,
>>> >>> parsedName FROM source s MAP s.canonical using 'parse' as parsedName;
>>> >>>
>>> >>> and parse be an MR job? If so, what are the input and output formats
>>> >>> for parse, please? Or is it a class implementing an interface,
>>> >>> perhaps, and Hive takes care of the rest?
>>> >>> Thanks for any pointers,
>>> >>> Tim
>>> >>
>>> >> Tim,
>>> >>
>>> >> A UDF is an SQL function, like toString() or max().
>>> >> An InputFormat teaches Hive to read data from key/value files.
>>> >> A SerDe tells Hive how to parse input data into columns.
>>> >> Finally, the map(), reduce(), and transform() keywords you described
>>> >> are a way to pipe data to an external process and read the results
>>> >> back in. Almost like a UDF that is not native to Hive.
>>> >>
>>> >> So you have munged up 4 concepts together :) Do not feel bad, however;
>>> >> I struggled through an InputFormat for the last month.
>>> >>
>>> >> It sounds most like you want a UDF that takes a string and returns a
>>> >> canonical representation:
>>> >>
>>> >> hive> ADD JAR /tmp/parse.jar;
>>> >> hive> create temporary function canonical as 'my.package.canonical';
>>> >> hive> select canonical(my_column) from source;
>>> >>
>>> >> Regards,
>>>
>>> --
>>> Adam J. O'Donnell, Ph.D.
>>> Immunet Corporation
>>> Cell: +1 (267) 251-0070
>
> Tim,
>
> I think you are on the right track with the UDF approach.
>
> You could accomplish something similar with a SerDe, except that from the
> client's perspective it would be more "transparent".
>
> A UDF is a bit more reusable than a SerDe: you can only choose a SerDe
> once, when the table is created, but a UDF is applied on the result set.
>
> Edward
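
For reference, a UDF along the lines discussed above might look something
like the sketch below. This is untested; the package, the class name, and
the trim/lower-case "canonicalisation" logic are placeholders rather than
code from this thread, and it assumes the org.apache.hadoop.hive.ql.exec.UDF
API that UDFYear is built on.

package com.example.hive.udf;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hive finds evaluate() by reflection, so matching the UDFYear pattern
// (extend UDF, add the @Description annotation, take and return Text)
// is all the wiring a simple function needs.
@Description(name = "canonical",
    value = "_FUNC_(str) - returns a canonical form of str")
public class Canonical extends UDF {

  // Reuse one Text instance across rows, as the built-in UDFs do.
  private final Text result = new Text();

  public Text evaluate(Text source) {
    if (source == null) {
      return null; // preserve SQL NULL semantics
    }
    // Placeholder logic only: trim and lower-case the input.
    result.set(source.toString().trim().toLowerCase());
    return result;
  }
}

Packaged into a jar, this would be registered exactly as in Edward's example,
with 'com.example.hive.udf.Canonical' as the class name in the
create temporary function statement.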
