-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70474/
-----------------------------------------------------------

Review request for hive and Peter Vary.


Bugs: HIVE-21407
    https://issues.apache.org/jira/browse/HIVE-21407


Repository: hive-git


Description
-------

The idea behind the patch is that for CHAR columns extend the predicate which 
is pushed to Parquet with an “or” clause which contains the same expression 
with a padded and a stripped value.
Example:
column c is a CHAR(10) type and the search expression is c='apple'
The predicate which is pushed to Parquet looked like c='apple ' before the 
patch and it would look like (c='apple ' or c='apple') after the patch.
Since the value 'apple' is stored in Parquet without padding, the predicate 
before the patch didn’t return any rows. With the patch it will return the 
correct row. 
Since on predicate level, there is no distinction between CHAR or VARCHAR, the 
predicates for VARCHARs will be changed as well, so the result set returned 
from Parquet will be wider than before.
Example:
A table contains a c VARCHAR(10) column and there is a row where c='apple' and 
there is an other row where c='apple '. If the search expression is c='apple ', 
both rows will be returned from Parquet after the patch. But since Hive is 
doing an additional filtering on the rows returned from Parquet, it won’t be a 
problem, the result set returned by Hive will contain only the row with the 
value 'apple '.


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/io/parquet/LeafFilterFactory.java 
be4c0d5 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRecordReaderWrapper.java
 0210a0a 
  
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/read/TestParquetFilterPredicate.java
 d464046 
  ql/src/test/queries/clientpositive/parquet_ppd_char.q 4230d8c 
  ql/src/test/queries/clientpositive/parquet_ppd_char2.q PRE-CREATION 
  ql/src/test/results/clientpositive/parquet_ppd_char2.q.out PRE-CREATION 


Diff: https://reviews.apache.org/r/70474/diff/1/


Testing
-------

Added new q test for testing the PPD for char and varchar types. Also extended 
the unit tests for the ParquetFilterPredicateConverter.toFilterPredicate method.

The TestParquetRecordReaderWrapper and the TestParquetFilterPredicate are both 
testing the same thing, the behavior of the 
ParquetFilterPredicateConverter.toFilterPredicate method. It doesn't make sense 
to have tests for the same use case in different test classes, so moved the 
test cases from the TestParquetRecordReaderWrapper to 
TestParquetFilterPredicate.


Thanks,

Marta Kuczora

Reply via email to