ericbadger commented on pull request #2513: URL: https://github.com/apache/hadoop/pull/2513#issuecomment-743467026
I've been able to run the tool locally, and from my initial testing it seems to work as designed. However, it runs quite a bit slower than the docker-to-squash Python script. I ran both tools on a fairly large image (14.8 GB, 32 layers): this tool took 38:13, while docker-to-squash took 21:50.

I'm currently trying to figure out which step accounts for the difference. My first thought is the layer download: this tool appears to download layers sequentially, while docker-to-squash downloads them in parallel (since it uses docker pull). The next step is converting the layers, where this tool uses its internal implementation and docker-to-squash uses mksquashfs. It's possible that mksquashfs is just faster there; I'll need to do more analysis. The last step is the layer upload. I know that this tool uploads both the sqsh image and the tgz file, which is about double the work, so it makes sense that this step would take longer.

Anyway, I'm still looking into the performance, but @insideo, feel free to post your insights.
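
To illustrate the download hypothesis only, here is a minimal Python sketch of sequential versus parallel layer fetching. It is not code from this PR or from docker-to-squash: `fetch_layer` is a hypothetical stand-in that sleeps to mimic network I/O, and the worker count is an arbitrary assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_layer(digest, delay=0.05):
    """Hypothetical stand-in for one layer download; sleeps to mimic network I/O."""
    time.sleep(delay)
    return f"{digest}.tar.gz"


def fetch_sequential(digests):
    """Download layers one after another."""
    return [fetch_layer(d) for d in digests]


def fetch_parallel(digests, max_workers=4):
    """Download layers concurrently; pool.map returns results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_layer, digests))


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    layers = [f"layer{i}" for i in range(8)]
    _, t_seq = timed(fetch_sequential, layers)
    _, t_par = timed(fetch_parallel, layers)
    print(f"sequential: {t_seq:.2f}s, parallel: {t_par:.2f}s")
```

With I/O-bound downloads, the parallel variant should finish in roughly the time of the slowest batch rather than the sum of all layers. Whether that explains the gap measured here would still need to be confirmed by timing each step separately.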
