Hi Kevin,

Yes, I will work on a how-to tutorial on SerDe this week.

One important performance benefit of the Hive SerDe is that it can reuse
the same object to deserialize different rows, which means no new object
needs to be created for each row.
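
To make the reuse idea concrete, here is a minimal sketch in plain Java.
It does not use Hive's SerDe interface; the class name ReusingDeserializer,
its constructor, and the record format (comma-separated non-negative
integers, a fixed number of columns) are all assumptions made for the
example.

```java
// Hypothetical illustration only; this is not Hive's SerDe interface.
class ReusingDeserializer {
    // The single row object: allocated once, refilled for every record.
    private final long[] row;

    ReusingDeserializer(int numColumns) {
        row = new long[numColumns];
    }

    // Parses a record such as "1,20,300" into the shared row array.
    // Assumes exactly row.length fields, each a non-negative integer.
    // Nothing is allocated here, so the per-row cost is parsing only.
    long[] deserialize(CharSequence record) {
        int col = 0;
        long value = 0;
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (c == ',') {
                row[col++] = value; // field finished, store it
                value = 0;
            } else {
                value = value * 10 + (c - '0');
            }
        }
        row[col] = value; // last field has no trailing comma
        return row;       // same array instance on every call
    }
}
```

The trade-off is that the returned row is only valid until the next call
to deserialize, so a caller that wants to keep a row has to copy it first.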

Zheng

On Sun, Jul 12, 2009 at 10:15 PM, Kevin Weil<[email protected]> wrote:
> +1 to Roberto's question... I'd love some more examples here too.  I looked
> into writing a protocol buffer Serde a little while ago (the company I was
> working for had data coming in as protobufs, and it seemed silly to convert
> every piece to thrift first) and was underwhelmed by the
> documentation/explanations.  FWIW, and maybe to generate a little friendly
> competition, I was able to write a pig LoadFunc to load arbitrary protocol
> buffers to pig tuples without much trouble...
> Kevin
>
> On Wed, Jul 8, 2009 at 4:26 PM, Roberto Congiu <[email protected]>
> wrote:
>>
>> Hi,
>> I am writing a SerDe class to be able to query some proprietary format we
>> have from hive.
>> The format is basically a sequence of records that are maps coded in
>> binary for which we have access libraries.
>> The file is also gzipped.
>> From what I understand, I need to
>> 1 - write a FileInputFormat class to read the file and extract the single
>> records as Writables (but I am not clear how I tell hive to use this
>> fileformat since all I can use is STORED AS SEQUENCEFILE/TEXTFILE. How do I
>> plug my format in there? )
>> 2 - Write a SerDe (Since I just need to read it I need just the
>> deserializer part) and an ObjectInspector to let hive understand how to find
>> a column
>> is there any info around for these or somebody who's done something
>> similar ?
>> Thanks in advance,
>> Roberto
>



-- 
Yours,
Zheng
