Hi Roberto, Hive does support custom file input/output formats other than SequenceFile/TextFile. Please take a look at Hive.g, which contains the grammar for specifying InputFileFormat and OutputFileFormat. You probably need only an InputFileFormat if you only plan to read from this kind of file.
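To make the reading side concrete, here is a minimal sketch of the record-extraction loop that a custom input format would wrap. It is only an illustration under assumptions: the class name GzipRecordReaderSketch is made up, and the layout (a 4-byte big-endian length followed by the payload) is a stand-in for your proprietary format, which I have not seen. It uses plain java.io and java.util.zip only; in a real job this loop would live inside a Hadoop RecordReader and each payload would be handed back as a Writable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: the length-prefixed layout is an assumption,
// not the real proprietary format.
class GzipRecordReaderSketch {

    // Reads every record from a gzipped stream. Each record is assumed to be
    // a 4-byte big-endian length followed by that many payload bytes.
    static List<byte[]> readRecords(InputStream gzipped) {
        List<byte[]> records = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new GZIPInputStream(gzipped))) {
            while (true) {
                int length;
                try {
                    length = in.readInt();
                } catch (EOFException endOfStream) {
                    // No further length prefix could be read: stop.
                    // (A truncated prefix is also treated as the end here.)
                    break;
                }
                byte[] payload = new byte[length];
                in.readFully(payload); // throws EOFException if the payload is cut short
                records.add(payload);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    // Helper for the demo only: writes records in the assumed layout and gzips them.
    static byte[] writeRecords(List<byte[]> records) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new GZIPOutputStream(buffer))) {
            for (byte[] record : records) {
                out.writeInt(record.length);
                out.write(record);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] file = writeRecords(Arrays.asList(
                "first".getBytes(StandardCharsets.UTF_8),
                "second".getBytes(StandardCharsets.UTF_8)));
        for (byte[] record : readRecords(new ByteArrayInputStream(file))) {
            System.out.println(new String(record, StandardCharsets.UTF_8));
        }
    }
}
```

The reader returns raw bytes and does not interpret them, so all knowledge of the record contents stays in the deserializer.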
The InputFileFormat can return a BytesWritable which contains the data in binary format. The SerDe.deserialize function should then take that BytesWritable and convert it into hierarchical objects.

For an example of a SerDe, please take a look at https://issues.apache.org/jira/browse/HIVE-553
That issue contains a full-fledged SerDe on its own.

I also plan to write a how-to for writing a SerDe, but it won't be ready within the next week or two.

Zheng

On Wed, Jul 8, 2009 at 6:46 PM, Roberto Congiu<[email protected]> wrote:
> Hi,I am writing a SerDe class to be able to query some proprietary format we
> have from hive.
> The format is basically a sequence of records that are maps coded in binary
> for which we have access libraries.
> The file is also gzipped.
>
> For what I understand, I need to
> 1 - write a FileInputFormat class to read the file and extract the single
> records as Writables (but I am not clear how I tell hive to use this
> fileformat since all I can use is STORED AS SEQUENCEFILE/TEXTFILE. How do I
> plug my format in there? )
> 2 - Write a SerDe (Since I just need to read it I need just the deserializer
> part) and an ObjectInspector to let hive understand how to find a column
>
> is there any info around for these or somebody who's done something similar
> ?
> Thanks in advance,
> Roberto
>

--
Yours,
Zheng
