HeartSaVioR commented on issue #1286:
URL: https://github.com/apache/iceberg/issues/1286#issuecomment-671884656


   I added debug log messages. Since I don't fully understand the details, I 
roughly added a timing check at the entry point, plus a log message whenever a 
file is opened for read or write.
   
   
https://github.com/apache/iceberg/commit/5a0a131a76c98d417c8d6cb70947219010171420
   
   Here's the log from a run in which the rewrite data action was retried and 
finally failed.
   
   
https://drive.google.com/file/d/1BEgSY2xbYMgmQBgL7SYCLe2pW-gkOPq3/view?usp=sharing
   
   So you're right that manifest files are not re-read on every retry, which 
is why the elapsed time differs significantly between the first attempt and 
the later ones. However, some of the manifest files written by the first 
attempt seem to be read again in later attempts (no cache seems to be in play 
there), so each later attempt still takes around 20 seconds. (Note that this 
is on a local filesystem; I expect higher latency in practice.)
   
   Btw, my suggestion focused on the characteristics of "fast append". If I 
understand correctly, a fast append only adds new manifest and data files and 
doesn't touch existing manifest or data files, which is how its commit phase 
can finish in hundreds of milliseconds.
   
   The situation we encountered is that the rewrite data action conflicted 
with other commits and had to retry, and all of the commits made in the 
meantime were fast appends. Given that there should not be many commits 
between retries, would it cost more to instead look at which manifest files 
those commits added, when all of them were fast appends? Given the 
characteristics of "fast append", they shouldn't conflict with the changes the 
rewrite data action has made, so adding their manifests to the manifest list 
seems OK. (I'm not familiar with the file spec, so I may be imagining 
something unrealistic; please correct me if I'm wrong.)
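
   To make the idea a bit more concrete, here is a rough sketch in Python. It 
is only a toy model of the suggestion, not Iceberg code: `Snapshot`, 
`manifests_added_since` and `retry_manifest_list` are hypothetical names, and 
it assumes each snapshot records its parent, an operation string, and the 
manifest paths in its manifest list. It walks back from the current snapshot 
to the one the rewrite started from, and gives up (returns `None`, meaning 
"do the normal full retry") as soon as a commit is not an append or has 
dropped or merged any of its parent's manifests.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical, simplified model of a table snapshot.
    snapshot_id: int
    parent_id: Optional[int]
    operation: str               # e.g. "append", "overwrite", "replace"
    manifests: Tuple[str, ...]   # manifest paths in this snapshot's manifest list


def manifests_added_since(base: Snapshot, current: Snapshot,
                          by_id: Dict[int, Snapshot]) -> Optional[List[str]]:
    """Collect manifests added by append-only commits between base and current.

    Returns None if any intervening commit is not an append, or if it did not
    keep every manifest of its parent (so it was not a pure "fast append").
    """
    added: List[str] = []
    snap = current
    while snap.snapshot_id != base.snapshot_id:
        if snap.operation != "append" or snap.parent_id is None:
            return None
        parent = by_id[snap.parent_id]
        parent_manifests = set(parent.manifests)
        # A fast append only adds manifests; the parent's must all survive.
        if not parent_manifests.issubset(snap.manifests):
            return None
        new_here = [m for m in snap.manifests if m not in parent_manifests]
        added = new_here + added   # keep oldest-first order
        snap = parent
    return added


def retry_manifest_list(rewritten: List[str], base: Snapshot, current: Snapshot,
                        by_id: Dict[int, Snapshot]) -> Optional[List[str]]:
    """Manifest list for the retried commit, or None to fall back to a full retry."""
    added = manifests_added_since(base, current, by_id)
    if added is None:
        return None
    # Manifests produced by the rewrite, plus those appended in the meantime.
    return list(rewritten) + added


# The rewrite started from snapshot 1; two fast appends landed before the retry.
base = Snapshot(1, None, "append", ("m1.avro", "m2.avro"))
s2 = Snapshot(2, 1, "append", ("m3.avro", "m1.avro", "m2.avro"))
s3 = Snapshot(3, 2, "append", ("m4.avro", "m3.avro", "m1.avro", "m2.avro"))
by_id = {s.snapshot_id: s for s in (base, s2, s3)}

print(retry_manifest_list(["r1.avro"], base, s3, by_id))
```

   The point is only that the retry cost here depends on the number of 
commits in between, not on the size of the table. Whether the real manifest 
list can be patched this way is exactly the part I'm unsure about.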

