Hi, We have a work item, https://issues.apache.org/jira/browse/MESOS-8064, which discusses programmatically decoding .tar, .tar.gz, .zip, and other common file compression schemes.
I have an initial implementation for this (rough only), and I wanted to reach out to the development community for thoughts/input before proceeding. This is a fairly big change for Linux as well as Windows. My initial implementation makes use of libarchive. I implemented an interface in stout with a set of unit tests as well. Initial implementation: https://github.com/jeffaco/mesos/blob/libarchive/3rdparty/stout/include/stout/decompress.hpp Unit tests: https://github.com/jeffaco/mesos/blob/libarchive/3rdparty/stout/tests/decompress_tests.cpp Commit chain that has relevant changes: https://github.com/jeffaco/mesos/commits/libarchive Basically, I'd like the community to look at a rough initial implementation. You can see how the implementation is called with the unit tests. One implemented, we'll change current command line usage for utilities (tar, zip, etc) to use stout interface to libarchive rather than Linux command line utilities. You can read more about libarchive here: https://www.libarchive.org/. Outstanding issues and questions (to be solved before committing to master): 1. Is the API reasonable, more or less? 2. Is the name of the API reasonable? Andy suggested namespace archive, so you could do something like "archive::extract", but unfortunately libarchive uses that with struct archive *. Any better name suggestions? 3. I'd like to NOT support a specific format. That is, I'd like to just call libarchive and have it determine the file format and handle it "magically" (which it does), rather than restrict it to a very specific format and error out if the file is not in that format. Right now you can pass any supported format (.tar, .tar.gz, .tar.xv, .tar.bz2, zip, etc), as you can see by the unit tests. Is there a need to pass a specific format limiter, say, FORMAT_ZIP, and error if it's not a ZIP file? Given that libarchive supports multiple compression formats simultaneously (i.e. foo.tar.bz2.gz), this could be overly restrictive anyway. But I wanted community feedback on this point. 4. There's some TODOs to better utilize libarchive APIs to avoid reading from stdin. These will be taken care of. 5. I'd like to programmatically be able to place extracted bits in a specific location. Currently, it goes to the current directory, which will work for fetcher cases. This will be investigated to see if we need it, and how libarchive can/will support it. 6. There's a biggie TODO dealing with long path support and Unicode. This will be investigated and taken care of. So, if the community wouldn't mind taking a look at this and giving me initial feedback, I'd be most appreciative! Realize that this is only a rough initial implementation; any and all comments would be welcome. /Jeff