jackye1995 commented on issue #2723:
URL: https://github.com/apache/iceberg/issues/2723#issuecomment-867132431


   Adding responses from the mailing list:
   
   @flyrain 
   For 3, it may not be worth adding extra complexity by introducing a "change 
set", unless we get solid data that shows writing a "change set" is faster than 
a complete rewrite. 
   
   @jackye1995 
   I think the summary looks mostly correct: we can already do "low 
latency" append and delete by primary key, or delete with any predicate that 
resolves to finitely many equality constraints, without needing to scan all 
rows.
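
   To illustrate what "resolves to finitely many equality constraints" can 
mean, here is a minimal Python sketch. It is not Iceberg code: the function 
name `expand_to_equality_rows` and the dict-based predicate representation are 
hypothetical, and it only assumes a predicate that is a conjunction of 
`column IN (values)` terms. Such a predicate can be expanded into explicit 
equality tuples without reading any data rows.

```python
from itertools import product


def expand_to_equality_rows(predicate):
    """Expand a conjunction of `column IN (values)` terms into equality tuples.

    `predicate` is a hypothetical representation: it maps each column name to
    the finite list of values that column may equal. Each returned dict is one
    equality constraint covering all listed columns.
    """
    columns = sorted(predicate)
    value_lists = [predicate[column] for column in columns]
    # The cross product of the value lists is finite, so no table scan is needed.
    return [dict(zip(columns, combo)) for combo in product(*value_lists)]


# Hypothetical predicate: id IN (1, 2) AND region = 'us'
rows = expand_to_equality_rows({"id": [1, 2], "region": ["us"]})
```

   Each resulting tuple is the kind of row an equality delete could carry. A 
predicate such as `amount > 100` has no finite expansion like this, which is 
where a scan becomes necessary.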
   
   A secondary index would be useful for scans, which are in turn used to 
generate delete files for complex delete predicates. This is still in design. 
We should probably push progress in this area a bit more, because I haven't 
heard any update in a few months:
   - https://docs.google.com/document/d/1E1ofBQoKRnX04bWT3utgyHQGaHZoelgXosk_UNsTUuQ
   - https://docs.google.com/document/d/11o3T7XQVITY_5F9Vbri9lF9oJjDZKjHIso7K8tEaFfY
   
   For metadata, I agree with Yufei that a change set approach sounds like 
quite a lot of extra complexity just to keep the commit phase a little bit 
faster. I understand that a single commit's time is ultimately bounded by the 
time it takes to rewrite that file, but this almost sounds like we want to do 
merge-on-read even for metadata, which would be cool but is likely overkill for 
most users. In my mind, the metadata layer should be maintained continuously by 
the admin, and it is much simpler to just periodically optimize the number of 
manifests in order to control the size of the manifest list file written during 
each commit, which would also benefit scan planning performance at the same 
time. I am curious about your "low latency" requirement: do you have any actual 
numbers you need to hit, and could you share them here?
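
   As a rough illustration of why periodic manifest optimization keeps the 
manifest list small, here is a toy Python sketch. It is not Iceberg's actual 
merge algorithm: `compact_manifests`, the size units, and the greedy packing 
strategy are all assumptions made for the example. The point is only that the 
manifest list has one entry per manifest, so merging small manifests up to a 
target size reduces what each commit has to rewrite.

```python
def compact_manifests(manifest_sizes, target_size):
    """Greedily merge adjacent manifests into groups of at most target_size.

    `manifest_sizes` is a list of manifest sizes in arbitrary units. The
    result lists the sizes of the merged manifests, in order. A single
    manifest larger than target_size is kept as its own group.
    """
    merged = []
    current = 0
    for size in manifest_sizes:
        # Close the current group if adding this manifest would overflow it.
        if current and current + size > target_size:
            merged.append(current)
            current = 0
        current += size
    if current:
        merged.append(current)
    return merged


# Hypothetical table with seven small manifests before maintenance.
before = [1, 1, 1, 1, 4, 2, 2]
after = compact_manifests(before, target_size=4)
```

   In this example the manifest list shrinks from seven entries to three while 
the total amount of tracked metadata stays the same, which is also why scan 
planning has fewer manifests to open.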


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



