yyanyy commented on a change in pull request #1872:
URL: https://github.com/apache/iceberg/pull/1872#discussion_r570609722
##########
File path:
api/src/main/java/org/apache/iceberg/expressions/ManifestEvaluator.java
##########
@@ -144,18 +143,37 @@ public Boolean or(Boolean leftResult, Boolean rightResult) {
@Override
public <T> Boolean isNaN(BoundReference<T> ref) {
int pos = Accessors.toPosition(ref.accessor());
- // containsNull encodes whether at least one partition value is null, lowerBound is null if
- // all partition values are null.
- if (stats.get(pos).containsNull() && stats.get(pos).lowerBound() == null) {
- return ROWS_CANNOT_MATCH; // all values are null
+
+ if (stats.get(pos).containsNaN() != null && !stats.get(pos).containsNaN()) {
+ return ROWS_CANNOT_MATCH;
+ }
+
+ if (allValuesAreNull(stats.get(pos))) {
+ return ROWS_CANNOT_MATCH;
}
return ROWS_MIGHT_MATCH;
}
+ private boolean allValuesAreNull(PartitionFieldSummary summary) {
+ // Before introducing containsNaN field, containsNull encodes whether at least one partition value is null,
+ // lowerBound is null if all partition values are null.
+ // After introducing containsNaN field, containsNaN must be false to ensure all values are null since bounds
+ // don't include NaN anymore.
+ return summary.containsNull() && summary.lowerBound() == null &&
+ (summary.containsNaN() == null || !summary.containsNaN());
Review comment:
I think excluding `NaN` from `lower`/`upper` and adding `containsNaN` both
belong to this PR, so a release that contains this change would be in one of
two states: (1) `NaN` is part of `lower`/`upper` and `containsNaN` is missing,
or (2) `containsNaN` exists and `lower`/`upper` don't store `NaN`. But I guess
people may implement their own manifest summary that already excludes `NaN`
from the bounds without providing `containsNaN`, so we still want to handle
that case. File-level metrics can give more granular information, so there
isn't necessarily any performance penalty. I have updated this PR to check for
the existence of `containsNaN`, but please let me know if my understanding
isn't correct!
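
To make the case discussed above concrete, here is a minimal, self-contained
sketch. It is not the code in this PR: `FieldSummarySketch` and
`NaNPruningSketch` are hypothetical stand-ins for the manifest summary and the
evaluator, and the "strict" variant is only one reading of "check for existence
of `containsNaN`". It compares the check from the hunk above (a missing
`containsNaN` is treated like `false`) with a variant that only concludes "all
values are null" when `containsNaN` is present and `false`.

```java
// Hypothetical stand-in for a manifest partition field summary (not the Iceberg class).
final class FieldSummarySketch {
  final boolean containsNull;  // true if at least one partition value is null
  final Boolean containsNaN;   // null if the writer did not record this field
  final Object lowerBound;     // null if no value was recorded in the bounds

  FieldSummarySketch(boolean containsNull, Boolean containsNaN, Object lowerBound) {
    this.containsNull = containsNull;
    this.containsNaN = containsNaN;
    this.lowerBound = lowerBound;
  }
}

// Hypothetical helper that contrasts the two readings of the all-null check.
final class NaNPruningSketch {
  private NaNPruningSketch() {
  }

  // Same condition as the hunk above: a missing containsNaN is treated like false.
  static boolean allValuesAreNullLenient(FieldSummarySketch summary) {
    return summary.containsNull && summary.lowerBound == null &&
        (summary.containsNaN == null || !summary.containsNaN);
  }

  // Assumed stricter reading: containsNaN must exist and be false.
  static boolean allValuesAreNullStrict(FieldSummarySketch summary) {
    return summary.containsNull && summary.lowerBound == null &&
        summary.containsNaN != null && !summary.containsNaN;
  }
}
```

For a summary written by a custom implementation that excludes `NaN` from the
bounds but does not set `containsNaN`, i.e. `new FieldSummarySketch(true, null,
null)`, the lenient check reports that all values are null and would prune the
manifest for `isNaN`, even though `NaN` values could be present. The strict
check returns `false`, so the manifest is kept and file-level metrics can
decide.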