fallintoplace opened a new issue, #1091:
URL: https://github.com/apache/iceberg-go/issues/1091
### Apache Iceberg version
main (development)
### Please describe the bug
I noticed a case where `strictMetricsEval` may be too optimistic when null
counts are missing for a nullable field.
The strict evaluator loads metrics directly from the data file:
```go
ev.valueCounts, ev.nullCounts = file.ValueCounts(), file.NullValueCounts()
ev.nanCounts = file.NaNValueCounts()
ev.lowerBounds, ev.upperBounds = file.LowerBoundValues(),
file.UpperBoundValues()
```
For range/equality predicates, it first checks whether the field may contain
nulls:
```go
if m.canContainNulls(fieldID) || m.canContainNans(fieldID) {
return rowsMightNotMatch
}
```
But `canContainNulls` returns false when the field is missing from
`NullValueCounts()`:
```go
func (m *strictMetricsEval) canContainNulls(fieldID int) bool {
cnt, exists := m.nullCounts[fieldID]
return exists && cnt > 0
}
```
For nullable fields, I think a missing null-count entry should be treated as
unknown rather than as zero nulls.
Example:
```text
predicate: id > 0
file metrics:
lower_bounds[id] = 1
upper_bounds[id] = 2
null_value_counts[id] missing
possible rows:
1
2
NULL
```
The lower bound proves that all non-null values are greater than zero, but
it does not prove that there are no nulls. In this case, strict evaluation
should not conclude that every row matches.
This matters because filtered overwrite/delete uses the strict evaluator to
decide whether a file can be deleted entirely or needs a partial rewrite:
```go
if strict {
localDelete = append(localDelete, df)
} else {
localRewrite = append(localRewrite, df)
}
```
If a nullable file is missing null counts but has bounds, it can be
classified as a whole-file delete even though null rows should survive the
filter.
I’d be happy to work on a regression test/fix if this sounds right.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]