[GitHub] [iceberg] gszadovszky commented on a change in pull request #2551: Filtering dictionary encoded FIX_LEN_BYTE_ARRAY

GitBox Mon, 17 May 2021 11:29:51 -0700


gszadovszky commented on a change in pull request #2551:
URL: https://github.com/apache/iceberg/pull/2551#discussion_r633449947




##########
File path: 
parquet/src/test/java/org/apache/iceberg/parquet/TestDictionaryRowGroupFilter.java
##########
@@ -140,6 +152,7 @@ public void createInputFile() throws IOException {
     OutputFile outFile = Files.localOutput(parquetFile);
     try (FileAppender<Record> appender = Parquet.write(outFile)
         .schema(FILE_SCHEMA)
+        .withWriterVersion(WriterVersion.PARQUET_2_0) // V2 is needed to force 
dictionary encoding for FIXED type

Review comment:
       @pvary: V1 and V2 are quite tricky references because it is not always 
clear what it covers. From parquet-mr point of view the most basic differences 
are the page headers (v1 or v2) and the used encodings. Since parquet-mr has 
support for v2 since a while it is mostly not about the old parquet readers but 
the other implementations of parquet readers. For example Impala does not 
support v2 page headers nor the encodings used by parquet-mr in case of v2 is 
set.
   BUT, using dictionary encoding for fixed data types are not against parquet 
specification for v1. And Impala does it. (It is only parquet-mr that does not 
do dictionary encoding for fixed types in case of v1 but it can read it and the 
filtering also works just fine.)
   
   I've been hesitating to switch the whole unit test to V2 (or implement a 
separate test for fixed+dictionary) but this seemed easier and simpler. Also, 
V1 or V2 seems irrelevant to this test (excluding the fact that we cannot test 
fixed+dictionary with V1).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [iceberg] gszadovszky commented on a change in pull request #2551: Filtering dictionary encoded FIX_LEN_BYTE_ARRAY

Reply via email to