manirajv06 commented on code in PR #13398:
URL: https://github.com/apache/iceberg/pull/13398#discussion_r2183470286
##########
api/src/main/java/org/apache/iceberg/expressions/StrictMetricsEvaluator.java:
##########
@@ -69,13 +71,26 @@ public StrictMetricsEvaluator(Schema schema, Expression unbound, boolean caseSen
* otherwise.
*/
public boolean eval(ContentFile<?> file) {
- // TODO: detect the case where a column is missing from the file using file's max field id.
+ if (file.valueCounts() != null) {
+ int maxFieldId = file.valueCounts().keySet().stream().mapToInt(i -> i).max().orElse(0);
Review Comment:
The schema might have 10 columns while the file has only 5 (say, field ids 1
to 5). In that case, 5 should be recorded as the max field id for that
specific file. Only then can we use the max field id to skip files whose max
field id is less than the id of the column referenced by the `Expression`
passed to *MetricsEvaluator. Is my understanding correct?
After writing the file using `DataWriter`, the list of columns could be
fetched from getFileMetaData().getSchema(), and the max field id from it could
be set on `DataFile` or `ContentFile`. That value could then be used to skip
the file as described earlier.
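To make the idea concrete, here is a rough sketch in plain Java. It does not
use any Iceberg or Parquet API: the class `MaxFieldIdSketch` and the methods
`maxFieldId` and `columnMissingFromFile` are hypothetical names for
illustration, and the list of written field ids stands in for whatever the
writer would actually report. It also assumes field ids are assigned in
increasing order, so an id above the file's max means the column was added
after the file was written.

```java
import java.util.List;
import java.util.OptionalInt;

// Hypothetical helper, not part of Iceberg: illustrates the max-field-id check only.
class MaxFieldIdSketch {

  // Max field id among the columns actually written to a file.
  // Empty when no columns were written.
  static OptionalInt maxFieldId(List<Integer> writtenFieldIds) {
    return writtenFieldIds.stream().mapToInt(Integer::intValue).max();
  }

  // True when the referenced column cannot be present in the file because its
  // id is greater than the file's max field id. Assumption: ids are assigned
  // in increasing order as columns are added to the schema.
  static boolean columnMissingFromFile(int columnFieldId, OptionalInt fileMaxFieldId) {
    return fileMaxFieldId.isPresent() && columnFieldId > fileMaxFieldId.getAsInt();
  }

  public static void main(String[] args) {
    // File written with field ids 1 to 5, while the table schema has 10 columns.
    OptionalInt fileMax = maxFieldId(List.of(1, 2, 3, 4, 5));
    System.out.println(columnMissingFromFile(8, fileMax)); // true: column 8 is not in the file
    System.out.println(columnMissingFromFile(3, fileMax)); // false: column 3 may be in the file
  }
}
```

When the max field id is unknown (empty), the check returns false, so the
evaluator would fall back to its existing behavior rather than guess.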
Thoughts?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]