Hi Kevin,

Yes, I will work on a how-to tutorial on SerDe this week.
One important performance benefit of the Hive SerDe design is that a SerDe can reuse the same object to deserialize different rows, which means no new object needs to be created for each row.

Zheng

On Sun, Jul 12, 2009 at 10:15 PM, Kevin Weil wrote:
> +1 to Roberto's question... I'd love some more examples here too. I looked
> into writing a protocol buffer SerDe a little while ago (the company I was
> working for had data coming in as protobufs, and it seemed silly to convert
> every piece to Thrift first) and was underwhelmed by the
> documentation/explanations. FWIW, and maybe to generate a little friendly
> competition, I was able to write a Pig LoadFunc to load arbitrary protocol
> buffers into Pig tuples without much trouble...
> Kevin
>
> On Wed, Jul 8, 2009 at 4:26 PM, Roberto Congiu wrote:
>>
>> Hi,
>> I am writing a SerDe class so that we can query a proprietary format we
>> have from Hive.
>> The format is basically a sequence of records that are maps encoded in
>> binary, for which we have access libraries.
>> The file is also gzipped.
>> From what I understand, I need to:
>> 1 - write a FileInputFormat class to read the file and extract the single
>> records as Writables (but I am not clear on how to tell Hive to use this
>> file format, since all I can use is STORED AS SEQUENCEFILE/TEXTFILE. How do I
>> plug my format in there?)
>> 2 - write a SerDe (since I only need to read the data, I just need the
>> deserializer part) and an ObjectInspector to let Hive understand how to find
>> a column.
>> Is there any info around on these, or has somebody done something
>> similar?
>> Thanks in advance,
>> Roberto

--
Yours,
Zheng
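The object-reuse point at the top of this reply can be illustrated with a small Java sketch. It is a minimal example under stated assumptions: `MapRecordDeserializer`, `encode`, and the binary layout (an int entry count followed by UTF-encoded key/value pairs) are hypothetical, loosely modeled on the map-coded records Roberto describes, and none of it is Hive's actual Deserializer or ObjectInspector API. The deserializer allocates one map up front and refills it for every record instead of creating a new row object each time.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Sketch of row-object reuse in a read-only deserializer.
// MapRecordDeserializer and the record layout are HYPOTHETICAL; this is not
// Hive's Deserializer/ObjectInspector API, only the reuse pattern.
public class ReuseDemo {

    // Hypothetical layout: [int entryCount] then entryCount x ([UTF key][UTF value]).
    static final class MapRecordDeserializer {
        // Allocated once per deserializer and refilled for every record.
        private final Map<String, String> row = new HashMap<>();

        Map<String, String> deserialize(byte[] record) {
            row.clear(); // discard the previous record's values but keep the map object
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(record))) {
                int entries = in.readInt();
                for (int i = 0; i < entries; i++) {
                    String key = in.readUTF();
                    row.put(key, in.readUTF());
                }
            } catch (IOException e) {
                throw new UncheckedIOException("malformed record", e);
            }
            return row; // the same instance on every call
        }
    }

    // Test helper: encodes alternating key/value strings in the hypothetical layout.
    static byte[] encode(String... keyValues) {
        if (keyValues.length % 2 != 0) {
            throw new IllegalArgumentException("expected alternating keys and values");
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(keyValues.length / 2);
            for (String s : keyValues) {
                out.writeUTF(s);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        MapRecordDeserializer deser = new MapRecordDeserializer();
        Map<String, String> first = deser.deserialize(encode("user", "alice", "country", "US"));
        System.out.println(first.get("user"));    // alice
        Map<String, String> second = deser.deserialize(encode("user", "bob"));
        System.out.println(second.get("user"));   // bob
        System.out.println(first == second);      // true: one object for both rows
        System.out.println(first.get("country")); // null: previous row's values were cleared
    }
}
```

The trade-off of this design is that callers must consume or copy a row before asking for the next one, because the returned object is overwritten in place. This sketch reuses only the row container; the key and value strings are still allocated per record, so a fully allocation-free reader would also need to reuse its column buffers.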
