gszadovszky commented on a change in pull request #2551:
URL: https://github.com/apache/iceberg/pull/2551#discussion_r633449947
##########
File path:
parquet/src/test/java/org/apache/iceberg/parquet/TestDictionaryRowGroupFilter.java
##########
@@ -140,6 +152,7 @@ public void createInputFile() throws IOException {
OutputFile outFile = Files.localOutput(parquetFile);
try (FileAppender<Record> appender = Parquet.write(outFile)
.schema(FILE_SCHEMA)
+ .withWriterVersion(WriterVersion.PARQUET_2_0) // V2 is needed to force
dictionary encoding for FIXED type
Review comment:
@pvary: V1 and V2 are quite tricky references because it is not always
clear what it covers. From parquet-mr point of view the most basic differences
are the page headers (v1 or v2) and the used encodings. Since parquet-mr has
support for v2 since a while it is mostly not about the old parquet readers but
the other implementations of parquet readers. For example Impala does not
support v2 page headers nor the encodings used by parquet-mr in case of v2 is
set.
BUT, using dictionary encoding for fixed data types are not against parquet
specification for v1. And Impala does it. (It is only parquet-mr that does not
do dictionary encoding for fixed types in case of v1 but it can read it and the
filtering also works just fine.)
I've been hesitating to switch the whole unit test to V2 (or implement a
separate test for fixed+dictionary) but this seemed easier and simpler. Also,
V1 or V2 seems irrelevant to this test (excluding the fact that we cannot test
fixed+dictionary with V1).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]