rdblue commented on pull request #3213: URL: https://github.com/apache/iceberg/pull/3213#issuecomment-934513889
@fapaul, Iceberg has its own file writers because we track more information for each column. Schema evolution is done using field IDs rather than names, so that all schema evolution operations are metadata operations that are side-effect free. We also track additional metadata so that we can prune more unnecessary files at job planning time.

As for the design here, I think we have a good plan, but no one has picked up the implementation yet. We plan to send the files that were committed from the committer to a parallel set of tasks that rewrite them, keyed by partition. When there is enough file content, the compactor tasks will release the new data file and the set of files to replace to a compaction committer, which will commit the compactions it receives in a given checkpoint. You could do the same with other formats, I guess.
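
To make the compaction plan described above easier to picture, here is a rough single-process sketch in Python. It is not Iceberg or Flink code: the names `DataFile`, `Rewrite`, `Compactor`, `CompactionCommitter`, and `run_demo`, the byte threshold, and the file paths are all hypothetical, and the rewrite is simulated by summing file sizes instead of reading and writing data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str
    partition: str
    size_bytes: int


@dataclass(frozen=True)
class Rewrite:
    """One compaction result: the new file plus the files it replaces."""
    partition: str
    new_file: DataFile
    replaced: tuple


class Compactor:
    """Buffers committed files per partition (the 'keyed by partition' step)."""

    def __init__(self, target_bytes):
        self.target_bytes = target_bytes  # hypothetical "enough content" threshold
        self.pending = defaultdict(list)
        self.counter = 0

    def on_committed_file(self, f):
        files = self.pending[f.partition]
        files.append(f)
        total = sum(x.size_bytes for x in files)
        if total < self.target_bytes:
            return None  # not enough content yet; keep buffering
        self.counter += 1
        # Simulated rewrite: a real task would read the inputs and write one file.
        merged = DataFile(
            f"{f.partition}/compacted-{self.counter}.parquet", f.partition, total
        )
        del self.pending[f.partition]
        return Rewrite(f.partition, merged, tuple(files))


class CompactionCommitter:
    """Collects rewrites and applies them together at each checkpoint."""

    def __init__(self, table_files):
        self.table_files = table_files  # set of DataFile standing in for table state
        self.buffered = []

    def receive(self, rewrite):
        self.buffered.append(rewrite)

    def on_checkpoint(self):
        for rw in self.buffered:
            self.table_files.difference_update(rw.replaced)
            self.table_files.add(rw.new_file)
        committed = len(self.buffered)
        self.buffered = []
        return committed


def run_demo():
    table = set()
    compactor = Compactor(target_bytes=100)
    committer = CompactionCommitter(table)
    small = [DataFile(f"day=1/f{i}.parquet", "day=1", 40) for i in range(3)]
    small.append(DataFile("day=2/f0.parquet", "day=2", 40))
    for f in small:
        table.add(f)  # the regular committer has already committed this file
        rw = compactor.on_committed_file(f)
        if rw is not None:
            committer.receive(rw)
    committed = committer.on_checkpoint()
    return committed, sorted(x.path for x in table)
```

In this toy run, the three 40-byte files in `day=1` cross the 100-byte threshold and are swapped for one compacted file at the checkpoint, while the single `day=2` file stays buffered in the compactor and untouched in the table.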
