insideo commented on pull request #2513: URL: https://github.com/apache/hadoop/pull/2513#issuecomment-744754256
> When you say the tool is streaming, what exactly do you mean? I asked you this before and I thought you said that it would start converting the layers as they came in instead of waiting for them to be fully downloaded. But looking at the log it seems like there is a download stage, a conversion stage, and then an upload stage and those stages are sequential

The current implementation of the CLI tool is not stream-oriented, but the underlying squashfs code definitely is. The filesystem tree and content are built up dynamically as the tar.gz file is read. To do otherwise would require unpacking the tar.gz file into a temporary location, which was explicitly avoided in the design to minimize unnecessary I/O and avoid issues of UID/GID/timestamp changes in the process.

> Also, I just realized that I am using squashfs-tools 4.3, which doesn't have reproducible builds on. So it's a slightly fair comparison since 4.4 slows things down by removing some (all?) of the multithreaded-ness of mksquashfs. I will retest with squashfs-tools 4.4 with reproducible builds enabled.

This would be an interesting comparison for sure.
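To make "built up dynamically as the tar.gz file is read" concrete, here is a minimal Python sketch of the general idea. It is not the squashfs code from this PR, and `build_tree_streaming` and `make_sample_layer` are hypothetical names used only for illustration. It assumes the standard library `tarfile` module in stream mode (`r|gz`), which walks the archive in one forward pass and never unpacks anything to a temporary location, so UID/GID/timestamps are taken straight from the tar headers.

```python
import io
import tarfile


def build_tree_streaming(fileobj):
    """Build an in-memory tree from a tar.gz stream in one forward pass.

    Hypothetical helper for illustration only; a real converter would emit
    filesystem blocks where this sketch merely counts bytes.
    """
    tree = {}
    # Mode "r|gz" treats the input as a non-seekable stream: members are
    # visited strictly in archive order and nothing is written to disk.
    with tarfile.open(fileobj=fileobj, mode="r|gz") as archive:
        for member in archive:
            entry = {
                "type": "dir" if member.isdir() else "file",
                # Ownership and timestamps come from the tar header itself,
                # so they cannot be altered by an intermediate unpack step.
                "uid": member.uid,
                "gid": member.gid,
                "mtime": member.mtime,
                "size": 0,
            }
            if member.isfile():
                data = archive.extractfile(member)
                # Consume the content in chunks while it streams past.
                while True:
                    chunk = data.read(64 * 1024)
                    if not chunk:
                        break
                    entry["size"] += len(chunk)
            tree[member.name] = entry
    return tree


def make_sample_layer():
    """Create a tiny tar.gz 'layer' in memory (made-up sample data)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as archive:
        directory = tarfile.TarInfo("etc")
        directory.type = tarfile.DIRTYPE
        directory.mode = 0o755
        directory.mtime = 1600000000
        archive.addfile(directory)

        payload = b"hello\n"
        regular = tarfile.TarInfo("etc/motd")
        regular.size = len(payload)
        regular.uid = 1000
        regular.gid = 1000
        regular.mtime = 1600000000
        archive.addfile(regular, io.BytesIO(payload))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for name, entry in build_tree_streaming(make_sample_layer()).items():
        print(name, entry)
```

The same loop would work on a network stream instead of a `BytesIO`, which is the sense in which the library side can be stream-oriented even when a CLI wrapper runs download, conversion, and upload as sequential stages.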
