Hi Andrew,

You can specify that your input data is stored in SequenceFiles by defining an external table stored as SequenceFile:

create external table T (a int, b double) stored as SequenceFile;

This assumes that your custom Writable pair consists of an IntWritable and a DoubleWritable, respectively. Hive also supports array, struct, and map types for fields, so you may not need to write your own Deserializer.

Ning

On Jun 6, 2010, at 11:44 AM, Andrew Rothstein wrote:

> Most of my Hadoop data is produced by Java MR jobs that store data as
> custom Writable pairs in SequenceFiles. I'm excited to bring that
> data into a Hive table so that I can start building out and
> prototyping more derived analytics. Can anyone point me towards a
> relevant example? Since I'm just getting started I've begun with
> hive-0.5.0. Thus far I've started with the RegexSerDe example and
> tried to whittle it down a bit to make it into what I want but I'm
> lacking context.
>
> Since I'm not trying to take data and write it back into these
> SequenceFiles, I only need to implement the Deserializer interface,
> right?
>
> How do I tell Hive that the underlying data InputFormat is a
> SequenceFile? What's the relationship between the Writable that
> arrives as the parameter to the deserialize function and the contents
> of the underlying SequenceFile?
>
> regards,
> Andrew
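On the last question (how the Writable handed to deserialize relates to what is in the SequenceFile), a minimal plain-Java sketch may help. It assumes that each record's value is stored as the bytes produced by the custom Writable's write() method, that readFields() reverses that, and that a custom Deserializer's job is to turn the restored object into column values in table order. It does not use the Hadoop or Hive APIs: IntDoublePair, toRow and WritablePairSketch are hypothetical names, and IntDoublePair only mirrors the write/readFields shape of org.apache.hadoop.io.Writable so that it compiles without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class WritablePairSketch {

    // Hypothetical stand-in for a custom Writable. Same write/readFields shape as
    // org.apache.hadoop.io.Writable, but it does not implement that interface.
    static class IntDoublePair {
        int a;
        double b;

        // Serializes the fields; these bytes are what the file would hold for this value.
        void write(DataOutput out) throws IOException {
            out.writeInt(a);
            out.writeDouble(b);
        }

        // Restores the fields, reading them in the same order write() emitted them.
        void readFields(DataInput in) throws IOException {
            a = in.readInt();
            b = in.readDouble();
        }
    }

    // Hypothetical deserialize step: map the object's fields to the columns of a
    // table declared as (a int, b double), in declaration order.
    static List<Object> toRow(IntDoublePair pair) {
        List<Object> row = new ArrayList<>();
        row.add(pair.a); // column a
        row.add(pair.b); // column b
        return row;
    }

    public static void main(String[] args) throws IOException {
        IntDoublePair original = new IntDoublePair();
        original.a = 42;
        original.b = 3.5;

        // Simulate the stored record bytes.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        original.write(out);
        out.flush();
        byte[] stored = buffer.toByteArray();

        // Simulate reading the value back, then turning it into a row.
        IntDoublePair restored = new IntDoublePair();
        restored.readFields(new DataInputStream(new ByteArrayInputStream(stored)));

        System.out.println(stored.length + " bytes -> " + toRow(restored));
    }
}
```

Running it prints "12 bytes -> [42, 3.5]" (4 bytes for the int, 8 for the double). In a real Hive Deserializer you would receive the already-populated Writable object rather than raw bytes, so the toRow-style field mapping is the part that carries over.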
