Re: Review Request 32499: HIVE-10086: Hive throws error when accessing Parquet file schema using field name match

Sergio Pena Thu, 26 Mar 2015 13:32:14 -0700


> On March 26, 2015, 7:11 p.m., Mohit Sabharwal wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java,
> >  line 65
> > <https://reviews.apache.org/r/32499/diff/1/?file=906071#file906071line65>
> >
> >     why remove static ?


Thanks Mohit.
At first I did not know the benefit of 'private static', so I 
thought it was just extra code.

But I now know that it has some benefits: it guarantees that the method does 
not touch instance fields, and because static calls are resolved without 
virtual dispatch, execution may be a little faster.
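
To illustrate the first benefit, here is a minimal sketch. The class, field,
and method names are hypothetical and are not taken from the patch; the point
is only that a 'private static' helper depends on its arguments alone.

```java
import java.util.Locale;

// Hypothetical illustration; this is NOT the code in DataWritableReadSupport.
class StaticHelperSketch {
    // Instance state that only instance methods are allowed to read.
    private final String prefix = "col_";

    // A private static helper has no 'this', so the compiler rejects any
    // attempt to read 'prefix' here. Its result depends only on its argument.
    private static String normalize(String columnName) {
        return columnName.trim().toLowerCase(Locale.ROOT);
    }

    // An instance method may combine the static helper with instance state.
    String qualified(String columnName) {
        return prefix + normalize(columnName);
    }

    public static void main(String[] args) {
        System.out.println(new StaticHelperSketch().qualified("  Name "));
    }
}
```

Whether the static call is measurably faster depends on the JIT, so the
readability guarantee is probably the stronger argument for keeping it.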


> On March 26, 2015, 7:11 p.m., Mohit Sabharwal wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java,
> >  line 90
> > <https://reviews.apache.org/r/32499/diff/1/?file=906071#file906071line90>
> >
> >     Looks like this method is called recursively (to deal with nested 
> > fields). Can we have duplicate column names across nesting levels ?

Yes, Parquet supports duplicate column names across nesting levels.
Here is an example:

optional group a {
  required binary name;
  optional group addr {
    optional binary a;
  }
}

optional group b {
  required binary name;
  optional group addr {
    optional binary b;
  }
}
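
A rough sketch of why such duplicates are harmless when names are matched one
nesting level at a time. It models the schema above with plain maps; the class
and the group/resolve/exampleSchema helpers are hypothetical stand-ins, not the
Parquet or Hive API, and a missing name simply yields null.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a Parquet schema; NOT the Parquet or Hive API.
class NestedNameLookupSketch {
    // Builds a group from alternating (name, field) arguments. A field is
    // either a nested group (Map) or a leaf type name (String).
    static Map<String, Object> group(Object... nameThenField) {
        Map<String, Object> fields = new LinkedHashMap<>();
        for (int i = 0; i < nameThenField.length; i += 2) {
            fields.put((String) nameThenField[i], nameThenField[i + 1]);
        }
        return fields;
    }

    // Resolves a path one level at a time, so each name is matched only
    // inside its own parent group. Returns null if any element is missing.
    @SuppressWarnings("unchecked")
    static Object resolve(Map<String, Object> root, String... path) {
        Object current = root;
        for (String name : path) {
            if (!(current instanceof Map)) {
                return null; // tried to descend into a leaf
            }
            current = ((Map<String, Object>) current).get(name);
            if (current == null) {
                return null; // name not present at this level
            }
        }
        return current;
    }

    // Mirrors the example schema: groups 'a' and 'b' both contain
    // 'name' and 'addr', and 'a' reappears inside a.addr.
    static Map<String, Object> exampleSchema() {
        return group(
            "a", group("name", "binary", "addr", group("a", "binary")),
            "b", group("name", "binary", "addr", group("b", "binary")));
    }

    public static void main(String[] args) {
        Map<String, Object> schema = exampleSchema();
        System.out.println(resolve(schema, "a", "addr", "a")); // leaf under a.addr
        System.out.println(resolve(schema, "b", "name"));      // same name, other group
        System.out.println(resolve(schema, "b", "addr", "a")); // absent, so null
    }
}
```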


- Sergio


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32499/#review77924
-----------------------------------------------------------


On March 25, 2015, 10:42 p.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/32499/
> -----------------------------------------------------------
> 
> (Updated March 25, 2015, 10:42 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-10086
>     https://issues.apache.org/jira/browse/HIVE-10086
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Attached is the patch that handles schemas that do not match between Parquet 
> and Hive.
> 
> In this case, Parquet data is accessed by name matching. The table columns 
> may be in a different order than in the Parquet schema, but if a column name 
> matches the Parquet column name, then the value is retrieved.
> 
> Also, if the Hive schema has columns or struct elements that do not match 
> the Parquet schema, then NULL values are returned for them instead.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java 57ae7a9740d55b407cadfc8bc030593b29f90700
>   ql/src/test/queries/clientpositive/parquet_schema_evolution.q PRE-CREATION
>   ql/src/test/queries/clientpositive/parquet_table_with_subschema.q PRE-CREATION
>   ql/src/test/results/clientpositive/parquet_schema_evolution.q.out PRE-CREATION
>   ql/src/test/results/clientpositive/parquet_table_with_subschema.q.out PRE-CREATION
> 
> Diff: https://reviews.apache.org/r/32499/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>
