Hello,

I was wondering if anyone can help me out with Hive InputFormat / Deserializer.

I am trying to implement a custom file format which is similar to Avro: each 
file will have its "schema" in the header.

The issue I am having is that Hive's Deserializer interface doesn't have a way 
to read this "schema" because it doesn't have access to the input file.

Some approaches that I have seen used by others but which do not work for me:

1. Set SerDe properties on the partition (this doesn't work, as there is more 
than one file in each partition and they will have different schemas).
2. Use config.get("map.input.file") in the initialize method to read the 
schema (this will only work for MapReduce jobs; simple queries in the CLI will 
fail, as this property will not be set).
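
To make approach 2 and its failure mode concrete, here is a minimal sketch 
under loud assumptions. java.util.Properties stands in for the Hadoop 
Configuration that initialize receives, the file is read from the local 
filesystem rather than HDFS, and the header layout (a 4-byte length followed 
by UTF-8 schema text) as well as the names SchemaFromInputFile, 
readSchemaHeader and schemaFor are hypothetical, chosen only for illustration.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.Properties;

final class SchemaFromInputFile {
    // Property name from approach 2; per the post, it is only set for MapReduce jobs.
    static final String INPUT_FILE_KEY = "map.input.file";

    /**
     * Reads the schema from the start of a stream.
     * Hypothetical header layout (illustration only): a 4-byte big-endian
     * length followed by that many bytes of UTF-8 schema text.
     */
    static String readSchemaHeader(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int length = data.readInt();
        if (length < 0) {
            throw new IOException("Negative schema length: " + length);
        }
        byte[] schema = new byte[length];
        data.readFully(schema);
        return new String(schema, StandardCharsets.UTF_8);
    }

    /**
     * Stand-in for the lookup done in initialize. Properties plays the role
     * of the job configuration so the sketch has no Hadoop dependency, and
     * the path is opened as a local file (an assumption; real input would
     * normally live on HDFS).
     */
    static Optional<String> schemaFor(Properties conf) throws IOException {
        String path = conf.getProperty(INPUT_FILE_KEY);
        if (path == null) {
            // The case that breaks simple CLI queries: no input file is known.
            return Optional.empty();
        }
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            return Optional.of(readSchemaHeader(in));
        }
    }
}
```

When the property is set, schemaFor returns the header text of that file; 
when it is missing, it returns an empty Optional, which is exactly the 
situation the deserializer has no way to recover from.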


Does anyone have an idea on how this should be done?

Thank you,
Alex Rovner
