[GitHub] [iceberg] rdblue commented on pull request #3213: Flink: auto compact small files and expire snapshots

GitBox Tue, 05 Oct 2021 08:28:41 -0700


rdblue commented on pull request #3213:
URL: https://github.com/apache/iceberg/pull/3213#issuecomment-934513889



   @fapaul, Iceberg has its own file writers because we track more information 
for each column. Schema evolution is done using field IDs rather than names so 
that every schema evolution change is a metadata-only operation that is free of 
side effects. We also track additional metadata so that we can prune more 
unnecessary files at job planning time.
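
   To make the two points above concrete, here is a minimal toy sketch in plain Python. It is not Iceberg's API: `Field`, `DataFile`, `project`, and `prune`, and the layout of `bounds`, are hypothetical names invented for illustration. It shows values resolved by field ID, so a rename or an added column never touches existing data, and per-column bounds used to skip files before any data is read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    field_id: int  # stable identifier, never reused
    name: str      # display name, free to change


@dataclass(frozen=True)
class DataFile:
    path: str
    # Hypothetical layout: field ID -> (lower bound, upper bound) for that column.
    bounds: dict


def project(record_by_id, schema):
    """Resolve a stored record (keyed by field ID) against a schema."""
    # Columns missing from the stored record (added later) read as None.
    return {f.name: record_by_id.get(f.field_id) for f in schema}


def prune(files, field_id, value):
    """Keep only the files whose bounds for field_id could contain value."""
    kept = []
    for f in files:
        lower, upper = f.bounds[field_id]
        if lower <= value <= upper:
            kept.append(f)
    return kept


# A record written under the original schema; values are keyed by field ID.
stored = {1: 42, 2: "click"}

schema_v1 = [Field(1, "id"), Field(2, "event")]
# Rename "event" to "event_type" and add a column with a fresh ID.
# Only the schema metadata changes; the stored record is untouched.
schema_v2 = [Field(1, "id"), Field(2, "event_type"), Field(3, "source")]

row_v1 = project(stored, schema_v1)
row_v2 = project(stored, schema_v2)

files = [
    DataFile("a.parquet", {1: (0, 99)}),
    DataFile("b.parquet", {1: (100, 199)}),
]
# A filter such as id = 150 only needs to plan b.parquet.
planned = prune(files, field_id=1, value=150)
```

   Because the bounds are keyed by field ID as well, the pruning step in this sketch keeps working after a column is renamed.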
   
   As for the design here, I think that we have a good plan but no one has 
picked up the implementation of it yet. We plan to send the files that were 
committed from the committer to a parallel set of tasks to rewrite them, keyed 
by partition. When there is enough file content, the compactor tasks will 
hand off the new data file, along with the set of files it replaces, to a 
compaction committer, which will commit the compactions it receives in a given 
checkpoint. You could do the same with other formats, I guess.
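
   The data flow described above can be sketched as a small single-process simulation. This is an illustration under assumptions, not the planned Flink implementation: the class names, the `target_bytes` threshold, and the method names are all hypothetical, and the "rewrite" is modeled only as summing file sizes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CommittedFile:
    path: str
    partition: str
    size: int


@dataclass(frozen=True)
class Rewrite:
    new_file: CommittedFile  # the compacted data file
    replaced: tuple          # the small files it replaces


class Compactor:
    """One compactor task: buffers committed files per partition."""

    def __init__(self, target_bytes):
        self.target_bytes = target_bytes  # hypothetical "enough content" threshold
        self.pending = defaultdict(list)
        self.counter = 0

    def on_committed_file(self, f):
        bucket = self.pending[f.partition]
        bucket.append(f)
        total = sum(x.size for x in bucket)
        if total < self.target_bytes:
            return None  # not enough content yet, keep buffering
        self.counter += 1
        merged = CommittedFile(
            f"{f.partition}/compacted-{self.counter}", f.partition, total
        )
        del self.pending[f.partition]
        return Rewrite(merged, tuple(bucket))


class CompactionCommitter:
    """Collects rewrites and commits them once per checkpoint."""

    def __init__(self):
        self.received = []
        self.commits = []  # list of (checkpoint_id, [Rewrite, ...])

    def on_rewrite(self, rewrite):
        self.received.append(rewrite)

    def on_checkpoint(self, checkpoint_id):
        if self.received:
            self.commits.append((checkpoint_id, list(self.received)))
            self.received.clear()


compactor = Compactor(target_bytes=100)
committer = CompactionCommitter()

# Files arriving from the (upstream) committer, keyed by partition.
for f in [
    CommittedFile("p=a/1", "p=a", 40),
    CommittedFile("p=b/1", "p=b", 30),
    CommittedFile("p=a/2", "p=a", 70),
]:
    rewrite = compactor.on_committed_file(f)
    if rewrite is not None:
        committer.on_rewrite(rewrite)

committer.on_checkpoint(1)
```

   In this run, partition `p=a` crosses the threshold and is committed as one replacement in checkpoint 1, while the single file in `p=b` stays buffered until more content arrives.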


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


