Re: Hive Deserializer Interface

Zheng Shao Sun, 18 Jul 2010 11:19:45 -0700

In Hive (and in all relational databases), every row in the same table has the
same schema.


As a result, we should not put files with different schemas into the same table
(or partition).
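
A minimal sketch of enforcing this for a header-schema format: before a file is added to a table or partition, compare the schema in its header with the table's declared schema and reject mismatches. The class name, the method names and the `name:type,...` schema string are hypothetical, chosen for illustration; this is not a Hive API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical pre-load check (not a Hive API): every file placed in a table or
// partition must declare the same schema as the table. Schemas are assumed to be
// strings such as "id:int,name:string".
final class PartitionSchemaCheck {
    private PartitionSchemaCheck() {}

    // Trims each comma-separated column and lower-cases it, so that
    // "ID:INT, name:string" and "id:int,name:string" compare equal.
    static String normalize(String schema) {
        List<String> cols = new ArrayList<>();
        for (String col : schema.split(",")) {
            cols.add(col.trim().toLowerCase(Locale.ROOT));
        }
        return String.join(",", cols);
    }

    // Returns the files (sorted by name) whose header schema differs from the table schema.
    static List<String> mismatchedFiles(String tableSchema, Map<String, String> headerSchemaByFile) {
        String expected = normalize(tableSchema);
        List<String> mismatched = new ArrayList<>();
        for (Map.Entry<String, String> e : new TreeMap<>(headerSchemaByFile).entrySet()) {
            if (!normalize(e.getValue()).equals(expected)) {
                mismatched.add(e.getKey());
            }
        }
        return mismatched;
    }
}
```

Files flagged this way would belong in a different table (or partition) whose declared schema matches their header.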


On Jul 17, 2010, at 9:33 PM, "Alex Rovner" <[email protected]> wrote:

> Hello,
> 
> I was wondering if anyone could help me out with the Hive InputFormat /
> Deserializer.
> 
> I am trying to implement a custom file format that is similar to Avro: each
> file will carry its "schema" in the header.
> 
> The issue I am having is that Hive's Deserializer interface doesn't have a 
> way to read this "schema" because it doesn't have access to the input file.
> 
> Some approaches I have seen others use, which do not work for me:
> 
> 1. Set SerDe properties on the partition (This doesn't work, as each partition
> holds more than one file and the files have different schemas.)
> 2. Use config.get("map.input.file") in the initialize method to read the schema
> (This only works for MapReduce jobs. Simple queries in the CLI will fail, as
> this property is not set.)
> 
> 
> Does anyone have an idea of how this should be done?
> 
> Thank You
> Alex Rovner
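
To make the problem concrete, here is a minimal sketch of a self-describing file in a hypothetical text layout (a `#schema` header line followed by tab-separated rows; this is neither Avro nor any real Hive format). Column names are only known after reading the file's own header, which is why a per-record parser needs access to the file itself; Hive's `Deserializer.deserialize` is handed a single `Writable` record, not the file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical self-describing text format (not Avro and not a Hive API):
//   line 1:   #schema <name>:<type>,<name>:<type>,...
//   line 2..: tab-separated field values in schema order
final class HeaderSchemaFile {
    final List<String> columnNames = new ArrayList<>();
    final List<Map<String, String>> rows = new ArrayList<>();

    static HeaderSchemaFile read(Reader in) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        String header = reader.readLine();
        if (header == null || !header.startsWith("#schema ")) {
            throw new IOException("missing #schema header");
        }
        HeaderSchemaFile file = new HeaderSchemaFile();
        // The column layout is only known after reading this file's own header.
        for (String col : header.substring("#schema ".length()).split(",")) {
            file.columnNames.add(col.split(":", 2)[0].trim());
        }
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] values = line.split("\t", -1);
            if (values.length != file.columnNames.size()) {
                throw new IOException("row has " + values.length
                        + " fields, schema has " + file.columnNames.size());
            }
            // Pair each value with the column name taken from the header.
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < values.length; i++) {
                row.put(file.columnNames.get(i), values[i]);
            }
            file.rows.add(row);
        }
        return file;
    }
}
```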

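Approach 2 can be sketched with an explicit fallback for the case the question describes, where `map.input.file` is not set. The sketch models the job configuration as a plain `Map` rather than Hadoop's `Configuration`; `custom.schema` is a hypothetical table property and `headerReader` is a hypothetical stand-in for code that opens a file and returns its header schema.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Properties;
import java.util.function.Function;

// Sketch of approach 2 with a fallback. The job configuration is modeled as a
// plain Map (in Hive it would be the Configuration passed to initialize()).
// "custom.schema" is a hypothetical table property; headerReader is a
// hypothetical function mapping a file path to the schema in its header.
final class SchemaResolver {
    private SchemaResolver() {}

    static Optional<String> resolve(Map<String, String> jobConf,
                                    Properties tableProps,
                                    Function<String, String> headerReader) {
        String inputFile = jobConf.get("map.input.file");
        if (inputFile != null && !inputFile.isEmpty()) {
            // Set for MapReduce jobs: the current split's file is known.
            return Optional.ofNullable(headerReader.apply(inputFile));
        }
        // Not set (e.g. simple CLI queries): fall back to a single table-level schema.
        return Optional.ofNullable(tableProps.getProperty("custom.schema"));
    }
}
```

The fallback only helps when every file shares one schema, which is the constraint stated in the reply at the top of this message.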