In Hive (and in all relational databases), all rows in the same table share the same schema.
As a result, we should not put files with different schemas into the same table (or partition).

On Jul 17, 2010, at 9:33 PM, "Alex Rovner" <[email protected]> wrote:

> Hello,
>
> I was wondering if anyone can help me out with Hive InputFormat /
> Deserializer.
>
> I am trying to implement a custom file format which is similar to Avro: Each
> file will have the "schema" in the header.
>
> The issue I am having is that Hive's Deserializer interface doesn't have a
> way to read this "schema" because it doesn't have access to the input file.
>
> Some approaches that I have seen used by others but which do not work for me:
>
> 1. Set SerDe properties on partition (This doesn't work as there is more than
> one file in each partition and they will have different schemas)
> 2. Use config.get("map.input.file") in initialize method to read the schema
> (This will only work for mapreduce jobs. Simple queries in CLI will fail as
> this property will not be set)
>
> Does anyone have an idea on how this should be done?
>
> Thank You
> Alex Rovner
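
One way to act on the point at the top of this reply, sketched in plain Java with no Hive dependency: read each file's header before loading and group the files by schema, so that each group can go into its own table or partition. The class name SchemaGrouper, the methods readSchema and groupBySchema, and the convention that the schema is the first text line of the file are all assumptions made for illustration; the real header layout of the custom format is not described in this thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SchemaGrouper {

    // Assumption: the schema is stored as the first text line of the file.
    // A real Avro-like format would need its own header parser here.
    static String readSchema(Path file) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String header = reader.readLine();
            return header == null ? "" : header.trim();
        }
    }

    // Groups files by their schema header. Each resulting group holds files
    // that share one schema and could be loaded into one table or partition.
    static Map<String, List<Path>> groupBySchema(List<Path> files) throws IOException {
        Map<String, List<Path>> groups = new TreeMap<>();
        for (Path file : files) {
            groups.computeIfAbsent(readSchema(file), k -> new ArrayList<>()).add(file);
        }
        return groups;
    }

    public static void main(String[] args) throws IOException {
        List<Path> files = new ArrayList<>();
        for (String arg : args) {
            files.add(Paths.get(arg));
        }
        // Print one line per distinct schema, followed by the files that use it.
        for (Map.Entry<String, List<Path>> entry : groupBySchema(files).entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Run over the input files before loading, this shows how many distinct schemas there are, and therefore how many tables or partitions would be needed to keep one schema per table.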
