manirajv06 commented on code in PR #13398:
URL: https://github.com/apache/iceberg/pull/13398#discussion_r2188474388
##########
api/src/main/java/org/apache/iceberg/expressions/StrictMetricsEvaluator.java:
##########
@@ -69,13 +71,26 @@ public StrictMetricsEvaluator(Schema schema, Expression unbound, boolean caseSen
* otherwise.
*/
public boolean eval(ContentFile<?> file) {
- // TODO: detect the case where a column is missing from the file using file's max field id.
+ if (file.valueCounts() != null) {
+ int maxFieldId = file.valueCounts().keySet().stream().mapToInt(i -> i).max().orElse(0);
Review Comment:
@Fokko I made changes to link the schema ID to the data file only. This is a
WIP PR. Since this linking change touches many places, I committed the changes
to get your feedback on the overall direction.
I still need to focus on the following:
1. Test coverage
2. Delete files
Follow-up question: Can we guarantee that all schema columns are present in
the data file? Suppose schema ID 1 is linked to data files 1 to 100 and schema
ID 2 is linked to data files 101 to 200. Schema ID 1 has three columns: a, b,
c. Schema ID 2 has two columns: d, e. Is it guaranteed that all of data files
1 to 100 have all three columns a, b, c? Or could there be a situation where
data files 1 to 50 have only two columns, a and b, because c is an optional
column, while files 51 to 100 have all three columns?
I am assuming all three columns would be present, with a "null" default value
for the optional column c. Please confirm.
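
To make the question concrete, here is a minimal standalone sketch of the
max-field-id check from the diff. It does not use the Iceberg API: the class
`MissingColumnSketch`, the helper `addedAfterFileWasWritten`, and the field IDs
1, 2, 3 for columns a, b, c are all hypothetical, and a plain
`Map<Integer, Long>` stands in for `file.valueCounts()`. It also assumes that
value counts are recorded for every column actually written to the file, which
is part of what I am asking about.

```java
import java.util.Map;

// Hypothetical stand-in for illustration only; not part of the Iceberg API.
final class MissingColumnSketch {

  // Same expression as in the diff: largest field ID that has a value count,
  // or 0 when the map is empty.
  static int maxFieldId(Map<Integer, Long> valueCounts) {
    return valueCounts.keySet().stream().mapToInt(i -> i).max().orElse(0);
  }

  // Assumption: a field ID above the file's max field ID belongs to a column
  // that was added after the file was written, so the file cannot contain it.
  static boolean addedAfterFileWasWritten(Map<Integer, Long> valueCounts, int fieldId) {
    return valueCounts != null && fieldId > maxFieldId(valueCounts);
  }

  public static void main(String[] args) {
    // Files 1 to 50 in the example: only a (ID 1) and b (ID 2) have counts.
    Map<Integer, Long> oldFile = Map.of(1, 10L, 2, 10L);
    // Files 51 to 100 in the example: a, b, and c (ID 3) have counts.
    Map<Integer, Long> newFile = Map.of(1, 10L, 2, 10L, 3, 10L);

    System.out.println(addedAfterFileWasWritten(oldFile, 3)); // prints "true"
    System.out.println(addedAfterFileWasWritten(newFile, 3)); // prints "false"
  }
}
```

If files 1 to 50 can legitimately omit column c, the check above would treat c
as missing for those files. If every file is guaranteed to carry all three
columns, the check would never fire for them.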
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]