Hi everyone, I am writing to let you know about the proposal at https://github.com/apache/iceberg/issues/13855, which addresses the lack of information about which columns were actually written to each file, and to continue the discussion here so that we can reach a conclusion.
Problem Statement & Background:

I initially proposed linking a schema ID to each file so that the columns written could be derived through that link. It turns out that this link alone does not identify the columns, because columns can be optional: a column that is present in the schema is not necessarily written to every file.

Ideally, we want to know every column written to each file. During evaluation we could then check whether the columns of interest are present in a given file and decide whether to skip it.

A "columns written" metric could be added to hold this information. It would contain the field IDs of all columns written to the file. The decision to skip a file could then be made by looking up the field IDs referenced in the Expression in the "columns written" set.

Please share your thoughts on this proposal.

Thanks,
Mani
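
P.S. To make the lookup concrete, here is a minimal sketch in Python. It is only an illustration, not Iceberg code: `DataFileInfo`, `columns_written`, `missing_field_ids`, and `can_skip_file` are hypothetical names. It assumes the predicate cannot match a column that is absent from the file (for example, an equality comparison on a column without a default value), and it treats files that lack the metric as not skippable.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class DataFileInfo:
    """Hypothetical stand-in for per-file metadata; not an Iceberg class."""
    path: str
    # Field IDs of the columns written to the file.
    # None means the metric is absent (e.g. the file predates the metric).
    columns_written: Optional[FrozenSet[int]]


def missing_field_ids(
    file: DataFileInfo, referenced: Iterable[int]
) -> Optional[FrozenSet[int]]:
    """Return the referenced field IDs that were not written to the file.

    Returns None when the file carries no "columns written" metric.
    """
    if file.columns_written is None:
        return None
    return frozenset(referenced) - file.columns_written


def can_skip_file(file: DataFileInfo, referenced: Iterable[int]) -> bool:
    """Skip only when a referenced column is known to be absent.

    Assumption: the predicate cannot be satisfied by an absent column.
    """
    missing = missing_field_ids(file, referenced)
    return missing is not None and len(missing) > 0


# Example: field IDs 1 and 2 were written; the expression references field ID 3.
example = DataFileInfo("data/file-a.parquet", frozenset({1, 2}))
skip = can_skip_file(example, {3})
```

A set of field IDs keeps the check independent of column renames, and returning "do not skip" when the metric is missing keeps the sketch conservative for existing files.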