----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/70474/#review215157 -----------------------------------------------------------
Ship it! Ship It! - Peter Vary On máj. 9, 2019, 7:51 de, Marta Kuczora wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/70474/ > ----------------------------------------------------------- > > (Updated máj. 9, 2019, 7:51 de) > > > Review request for hive and Peter Vary. > > > Bugs: HIVE-21407 > https://issues.apache.org/jira/browse/HIVE-21407 > > > Repository: hive-git > > > Description > ------- > > The idea behind the patch is that for CHAR columns extend the predicate which > is pushed to Parquet with an “or” clause which contains the same expression > with a padded and a stripped value. > Example: > column c is a CHAR(10) type and the search expression is c='apple' > The predicate which is pushed to Parquet looked like c='apple ' before the > patch and it would look like (c='apple ' or c='apple') after the patch. > Since the value 'apple' is stored in Parquet without padding, the predicate > before the patch didn’t return any rows. With the patch it will return the > correct row. > Since on predicate level, there is no distinction between CHAR or VARCHAR, > the predicates for VARCHARs will be changed as well, so the result set > returned from Parquet will be wider than before. > Example: > A table contains a c VARCHAR(10) column and there is a row where c='apple' > and there is an other row where c='apple '. If the search expression is > c='apple ', both rows will be returned from Parquet after the patch. But > since Hive is doing an additional filtering on the rows returned from > Parquet, it won’t be a problem, the result set returned by Hive will contain > only the row with the value 'apple '. > > > Diffs > ----- > > ql/src/java/org/apache/hadoop/hive/ql/io/parquet/LeafFilterFactory.java > be4c0d5 > > ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetRecordReaderWrapper.java > 0210a0a > > ql/src/test/org/apache/hadoop/hive/ql/io/parquet/read/TestParquetFilterPredicate.java > d464046 > ql/src/test/queries/clientpositive/parquet_ppd_char.q 4230d8c > ql/src/test/queries/clientpositive/parquet_ppd_char2.q PRE-CREATION > ql/src/test/results/clientpositive/parquet_ppd_char2.q.out PRE-CREATION > > > Diff: https://reviews.apache.org/r/70474/diff/2/ > > > Testing > ------- > > Added new q test for testing the PPD for char and varchar types. Also > extended the unit tests for the > ParquetFilterPredicateConverter.toFilterPredicate method. > > The TestParquetRecordReaderWrapper and the TestParquetFilterPredicate are > both testing the same thing, the behavior of the > ParquetFilterPredicateConverter.toFilterPredicate method. It doesn't make > sense to have tests for the same use case in different test classes, so moved > the test cases from the TestParquetRecordReaderWrapper to > TestParquetFilterPredicate. > > > Thanks, > > Marta Kuczora > >