Hello,
I was wondering if anyone can help me out with Hive InputFormat / Deserializer.
I am trying to implement a custom file format that is similar to Avro: each
file carries its "schema" in the header.
The issue I am having is that Hive's Deserializer interface doesn't have a way
to read this "schema" because it doesn't have access to the input file.
Some approaches that I have seen used by others but which do not work for me:
1. Set SerDe properties on the partition (This doesn't work, as there is more
than one file in each partition and the files will have different schemas)
2. Use config.get("map.input.file") in the initialize method to locate the
file and read the schema (This only works for MapReduce jobs. Simple queries
in the CLI fail because this property is not set)
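To make the failure in approach 2 concrete, here is a minimal, self-contained sketch of the lookup. It is only an illustration: java.util.Properties stands in for the job configuration that would be passed to initialize, and the class name, method name, and file path are hypothetical.

```java
import java.util.Optional;
import java.util.Properties;

public class InputFileLookupSketch {
    // Property name from approach 2; present in a map task, absent otherwise.
    static final String INPUT_FILE_KEY = "map.input.file";

    // Stand-in for the lookup done in initialize(): returns the current
    // input file if the configuration carries it, or empty if it does not.
    static Optional<String> currentInputFile(Properties conf) {
        return Optional.ofNullable(conf.getProperty(INPUT_FILE_KEY));
    }

    public static void main(String[] args) {
        // Simulates a map task, where the property has been set.
        Properties mapTaskConf = new Properties();
        mapTaskConf.setProperty(INPUT_FILE_KEY, "/hypothetical/part-00000");

        // Simulates a simple CLI query, where nothing sets the property.
        Properties cliConf = new Properties();

        System.out.println(currentInputFile(mapTaskConf).isPresent());
        System.out.println(currentInputFile(cliConf).isPresent());
    }
}
```

In the second case there is no file name to open, so the header cannot be read and no schema is available to the deserializer.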
Does anyone have an idea on how this should be done?
Thank you,
Alex Rovner