+1 to Roberto's question... I'd love some more examples here too. I looked into writing a protocol buffer SerDe a little while ago (the company I was working for had data coming in as protobufs, and it seemed silly to convert every piece to Thrift first) and was underwhelmed by the documentation/explanations.

FWIW, and maybe to generate a little friendly competition, I was able to write a Pig LoadFunc to load arbitrary protocol buffers into Pig tuples without much trouble...

Kevin

On Wed, Jul 8, 2009 at 4:26 PM, Roberto Congiu <[email protected]> wrote:
> Hi, I am writing a SerDe class to be able to query some proprietary format
> we have from hive.
> The format is basically a sequence of records that are maps coded in binary
> for which we have access libraries.
> The file is also gzipped.
>
> For what I understand, I need to
> 1 - write a FileInputFormat class to read the file and extract the single
> records as Writables (but I am not clear how I tell hive to use this
> fileformat since all I can use is STORED AS SEQUENCEFILE/TEXTFILE. How do
> I plug my format in there? )
> 2 - Write a SerDe (Since I just need to read it I need just the
> deserializer part) and an ObjectInspector to let hive understand how to find
> a column
>
> is there any info around for these or somebody who's done something similar
> ?
> Thanks in advance,
> Roberto
