zhongyujiang commented on code in PR #6836:
URL: https://github.com/apache/iceberg/pull/6836#discussion_r1107953666


##########
parquet/src/test/java/org/apache/iceberg/parquet/TestDictionaryRowGroupFilter.java:
##########
@@ -1030,7 +1030,7 @@ public void testMissingDictionaryPageForColumn() {
         IllegalStateException.class,
         "Failed to read required dictionary page for id: 5",
         () ->
-            new ParquetDictionaryRowGroupFilter(SCHEMA, notEqual("some_nulls", 
"some"))
+            new ParquetDictionaryRowGroupFilter(SCHEMA, equal("some_nulls", 
"some"))

Review Comment:
   Thanks for reviewing!
   
   I checked the PR #93 that introduced this UT and the relevant 
[discussion](https://github.com/apache/iceberg/pull/86#discussion_r253802993). 
Basically it wants to throw an exception when no dict is available to avoid NPE 
after calling `dict`. But as @rdblue mentioned 
[here](https://github.com/apache/iceberg/pull/86#discussion_r253943635),  all 
runs of `dict` in `ParquetDictionaryRowGroupFilter` is guaranteed that there 
must be a dictionary. So under normal circumstances, this exception will not be 
thrown out I think. The reason why this UT can catch the exception is because 
it mocked an abnormal 
[DictionaryPageReadStore](https://github.com/apache/iceberg/blob/49d833aa83f6d4681c6c9f8473080f6e78124a0e/parquet/src/test/java/org/apache/iceberg/parquet/TestDictionaryRowGroupFilter.java#L1034).
   Above is the context about this UT. So I think replacing it with `eq` can 
still serve the original purpose of verification, right? So I think moving 
null-check is reasonable because the run of `notEq` or `notIn` is guaranteed 
there will be a dictionary there. Moving it forward can help avoiding the 
dictionary decoding overhead when the column has nulls.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to