HeartSaVioR edited a comment on issue #1286: URL: https://github.com/apache/iceberg/issues/1286#issuecomment-671884656
So I added debug log messages. Not fully understanding the details, I roughly added a time check at the entry point, as well as a log message whenever a file is opened for read or write.

https://github.com/apache/iceberg/commit/5a0a131a76c98d417c8d6cb70947219010171420

Here's the log I got when the rewrite data action was retried and finally failed.

https://drive.google.com/file/d/1BEgSY2xbYMgmQBgL7SYCLe2pW-gkOPq3/view?usp=sharing

So you're right that manifest files are not read on every retry, which is why the elapsed time differs significantly between the first attempt and the later ones. Some of the output manifest files from the first attempt (looks to be 100+) seem to be read again in later attempts (no cache seems to come into play here), which still makes each later attempt take around 20 seconds. (Note that this is on a local filesystem, and I expect higher latency in practice.)

Btw, my suggestion was focused on the characteristics of "fast append". If I understand correctly, fast append only adds manifest and data files and doesn't touch existing manifest and data files, and that's how the commit phase can be done in hundreds of milliseconds.

The situation we encountered is that the rewrite data action conflicted with other commits and needed to retry, and all the commits in the meantime were fast appends. Given that there should not be many commits between retries, would it cost more to look at which manifest files were added instead, when the intervening commits were all fast appends? They shouldn't conflict with the changes the rewrite data action has made, based on the characteristics of "fast append", so adding them to the manifest list seems to be OK. (I'm not familiar with the file spec, so I may be imagining something unrealistic; please correct me if I'm wrong.)
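To make the fast-append suggestion a bit more concrete, here is a rough sketch in Python. It is only an illustration of the idea, not Iceberg code: the `Snapshot` class, its `is_fast_append` flag, and the `rebase_on_fast_appends` function are hypothetical names, and the sketch assumes that a fast append really only adds new manifests and leaves existing ones untouched.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


# Hypothetical, simplified model of a commit; NOT an Iceberg class.
@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    added_manifests: Tuple[str, ...]  # manifests newly written by this commit
    is_fast_append: bool              # assumed flag: commit only added manifests


def rebase_on_fast_appends(
    pending_manifests: List[str],
    intervening: List[Snapshot],
) -> Optional[List[str]]:
    """Try to reuse the rewrite's manifest list after a commit conflict.

    pending_manifests: manifest list the rewrite produced against the old base.
    intervening: commits that landed between the old base and the current one.

    Returns the manifest list to retry the commit with, or None when at
    least one intervening commit was not a fast append and the normal
    (full) retry path is needed.
    """
    if not all(s.is_fast_append for s in intervening):
        return None

    # Fast appends only added manifests, so under the stated assumption
    # they can simply be carried over next to the rewrite's own manifests.
    merged = list(pending_manifests)
    for snapshot in intervening:
        merged.extend(snapshot.added_manifests)
    return merged
```

With two intervening fast appends, the retry would only need to pick up the two newly added manifests instead of re-reading the rewrite's output manifests. Whether this is actually safe with respect to the table spec is exactly the part I'm unsure about.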
