insideo commented on pull request #2513:
URL: https://github.com/apache/hadoop/pull/2513#issuecomment-744754256


   > When you say the tool is streaming, what exactly do you mean? I asked you
   > this before and I thought you said that it would start converting the layers
   > as they came in instead of waiting for them to be fully downloaded. But
   > looking at the log it seems like there is a download stage, a conversion
   > stage, and then an upload stage, and those stages are sequential.
   
   The current implementation of the CLI tool is not stream-oriented, but the 
underlying squashfs code definitely is. The filesystem tree and content are 
built up dynamically as the tar.gz file is read. To do otherwise would require 
unpacking the tar.gz file into a temporary location, which was explicitly 
avoided in the design to minimize unnecessary I/O and avoid issues of 
UID/GID/timestamp changes in the process.
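   
   As a rough illustration of that idea (not the actual squashfs code in this PR), here is a minimal Python sketch that walks a tar.gz as a forward-only stream and builds an in-memory tree, without unpacking anything to a temporary location. The names `build_tree_from_stream` and `make_sample_archive` and the dict layout are hypothetical, chosen only for this example. Ownership and timestamps are read from the tar headers, so nothing on the local filesystem can alter them.

```python
import io
import tarfile


def build_tree_from_stream(fileobj):
    """Build an in-memory tree from a tar.gz stream without extracting to disk.

    Hypothetical helper for illustration only; not the PR's implementation.
    """
    tree = {}
    # Mode "r|gz" treats the input as a non-seekable gzip stream, so entries
    # are processed in the order they arrive.
    with tarfile.open(fileobj=fileobj, mode="r|gz") as archive:
        for member in archive:
            entry = {
                "is_dir": member.isdir(),
                # Metadata comes straight from the tar header.
                "uid": member.uid,
                "gid": member.gid,
                "mtime": member.mtime,
                "size": 0,
            }
            if member.isfile():
                content = archive.extractfile(member)
                # Consume the content in chunks; a real writer would feed
                # each chunk to its block compressor here.
                while True:
                    chunk = content.read(64 * 1024)
                    if not chunk:
                        break
                    entry["size"] += len(chunk)
            tree[member.name] = entry
    return tree


def make_sample_archive():
    """Create a tiny tar.gz in memory so the sketch can be exercised."""
    buffer = io.BytesIO()
    payload = b"example\n"
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        info = tarfile.TarInfo("etc/hostname")
        info.size = len(payload)
        info.uid, info.gid, info.mtime = 1000, 1000, 1600000000
        archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer
```

   Calling `build_tree_from_stream(make_sample_archive())` returns a dict keyed by path. The same loop would work on a network response body, which is what a stream-oriented CLI could build on.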
   
   > Also, I just realized that I am using squashfs-tools 4.3, which doesn't
   > have reproducible builds enabled. So it's a slightly unfair comparison,
   > since 4.4 slows things down by removing some (all?) of the
   > multithreaded-ness of mksquashfs. I will retest with squashfs-tools 4.4
   > with reproducible builds enabled.
   
   This would be an interesting comparison for sure.
   
   

