Re: Review Request 65478: HIVE-18553 VectorizedParquetReader fails after adding a new column to table

cheng xu Mon, 05 Feb 2018 18:19:56 -0800


> On Feb. 6, 2018, 1:46 a.m., Vihang Karajgaonkar wrote:
> >


Thanks Vihang for your review. Comments left below.


> On Feb. 6, 2018, 1:46 a.m., Vihang Karajgaonkar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReaderFactory.java
> > Lines 24 (patched)
> > <https://reviews.apache.org/r/65478/diff/1/?file=1951881#file1951881line24>
> >
> >     Do we need to override the methods for other reads as well? What is the 
> > criteria to identify the methods which need to be overriden for this and 
> > TypesFromInt64PageReader?
> 
> Jerry Chen wrote:
>     Current Ferdinand follows the same conversion principles of 
> ETypeConverter in Hive. The basic conversion implemented in ETypeConverter 
> is: low precision data type can convert to high precison data type. for 
> int32: it can be converted to int64, float, double. for int64: it can be 
> converted to float, double. For float: it can be converted to double.
>     
>     Although other conversions can logically supported such as int64 to int32 
> or double to float. But that is just not always safe.

Yes, as Jerry mentioned, our current implementation stays the same rules as non 
vectorization path. To support other conversion, we may need to ensure both 
vectorization and non vectorization have the same behavior since vectorization 
is just a performance feature.


> On Feb. 6, 2018, 1:46 a.m., Vihang Karajgaonkar wrote:
> > ql/src/test/queries/clientpositive/schema_evol_par_vec_table.q
> > Lines 9 (patched)
> > <https://reviews.apache.org/r/65478/diff/1/?file=1951887#file1951887line9>
> >
> >     can you please confirm that the result of this q file matches with 
> > non-vectorized execution?

Yes, I checked the output as well by disabling the vectorization. And they have 
the same output as vectorization enabled case.


> On Feb. 6, 2018, 1:46 a.m., Vihang Karajgaonkar wrote:
> > ql/src/test/results/clientpositive/schema_evol_par_vec_table.q.out
> > Lines 140 (patched)
> > <https://reviews.apache.org/r/65478/diff/1/?file=1951888#file1951888line140>
> >
> >     this is interesting. Do you know why this row is returned first?

The order is the same as non vectorization read path. It may result from the 
order of generated files.


- cheng


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65478/#review196718
-----------------------------------------------------------


On Feb. 5, 2018, 4:46 p.m., cheng xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65478/
> -----------------------------------------------------------
> 
> (Updated Feb. 5, 2018, 4:46 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> VectorizedParquetReader throws an exception when trying to reading from a 
> parquet table on which new columns are added.
> 
> 
> Diffs
> -----
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/BaseVectorizedColumnReader.java
>  907a9b8 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/ParquetDataColumnReaderFactory.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedDummyColumnReader.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedParquetRecordReader.java
>  08ac57b 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedColumnReader.java
>  9e414dc 
>   
> ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java
>  5d3ebd6 
>   ql/src/test/queries/clientpositive/schema_evol_par_vec_table.q PRE-CREATION 
>   ql/src/test/results/clientpositive/schema_evol_par_vec_table.q.out 
> PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/65478/diff/2/
> 
> 
> Testing
> -------
> 
> Newly added UT passed and qtest passed locally.
> 
> 
> Thanks,
> 
> cheng xu
> 
>

Re: Review Request 65478: HIVE-18553 VectorizedParquetReader fails after adding a new column to table

Reply via email to