RussellSpitzer commented on issue #13855: URL: https://github.com/apache/iceberg/issues/13855#issuecomment-3201518962
I probably would not link the schema ID, because that alone would not tell you whether a field is actually present in a file (optional fields). But we probably should have some way, in the metrics, to distinguish a "missing metric" from a missing field. Maybe we should store "columns written" as a per-file metric, so that the planner can skip a file that never wrote the column. This would require adding a set to each data file entry, but it should compress very well, since the set should be mostly identical across all data files in a manifest. If the goal is to determine which files contain which columns, we should probably start with that problem statement before jumping to linking schemas.
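A minimal sketch of the "columns written" idea follows, in Python for brevity. Everything in it is hypothetical: `DataFileEntry`, `columns_written`, and the helper functions are illustrative names, not Iceberg's API. `null_value_counts` only loosely mirrors the per-column metric map of that name in manifest entries. The sketch assumes the column has no default value, so a file that never wrote it reads as all nulls. It shows how the set lets a planner tell a missing field (skip the file) from a missing metric (must read the file), and how identical sets across a manifest can be stored once.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class DataFileEntry:
    # Hypothetical manifest entry, not Iceberg's DataFile API.
    path: str
    columns_written: FrozenSet[int]    # field IDs actually written to the file
    null_value_counts: Dict[int, int]  # per-field metric; fields may be absent
    record_count: int


def column_status(entry: DataFileEntry, field_id: int) -> str:
    if field_id not in entry.columns_written:
        return "missing-field"   # never written: assumed to read as all nulls
    if field_id not in entry.null_value_counts:
        return "missing-metric"  # written, but no stats were collected
    return "has-metric"


def may_contain_non_null(entry: DataFileEntry, field_id: int) -> bool:
    status = column_status(entry, field_id)
    if status == "missing-field":
        return False  # safe to skip under the no-default assumption
    if status == "missing-metric":
        return True   # nothing can be proven, so the file must be read
    return entry.null_value_counts[field_id] < entry.record_count


def plan_not_null_scan(entries: List[DataFileEntry], field_id: int) -> List[str]:
    # Files that might satisfy "field IS NOT NULL".
    return [e.path for e in entries if may_contain_non_null(e, field_id)]


def intern_column_sets(
    entries: List[DataFileEntry],
) -> Tuple[List[FrozenSet[int]], List[int]]:
    # Dictionary-style encoding: each distinct set is stored once and
    # every entry keeps only an index into the list of distinct sets.
    distinct: Dict[FrozenSet[int], int] = {}
    refs = [distinct.setdefault(e.columns_written, len(distinct)) for e in entries]
    return list(distinct), refs


# Example: field 3 was added after a.parquet was written.
files = [
    DataFileEntry("a.parquet", frozenset({1, 2}), {1: 0, 2: 10}, 10),
    DataFileEntry("b.parquet", frozenset({1, 2, 3}), {1: 0, 2: 0, 3: 4}, 10),
    DataFileEntry("c.parquet", frozenset({1, 2, 3}), {1: 0, 2: 0}, 10),
]
print(plan_not_null_scan(files, 3))  # ['b.parquet', 'c.parquet']
```

Without `columns_written`, a.parquet and c.parquet look the same to the planner (no metric for field 3), so both would have to be read. With the set, only c.parquet needs reading. `intern_column_sets` stores just two distinct sets for the three entries, which is the compression argument in miniature.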