Most of my Hadoop data is produced by Java MapReduce jobs that store it as custom Writable pairs in SequenceFiles. I'm excited to bring that data into a Hive table so that I can start building out and prototyping more derived analytics. Can anyone point me towards a relevant example? Since I'm just getting started, I've begun with hive-0.5.0. So far I've taken the RegexSerDe example and tried to whittle it down into what I want, but I'm lacking context.
Since I'm not trying to take data and write it back into these SequenceFiles, I only need to implement the Deserializer interface, right? How do I tell Hive that the underlying data InputFormat is a SequenceFile? What's the relationship between the Writable that arrives as the parameter to the deserialize function and the contents of the underlying SequenceFile?

Regards,
Andrew
