Having both decompress.hpp and gzip.hpp <https://github.com/apache/mesos/blob/master/3rdparty/stout/include/stout/gzip.hpp> seems unfortunate, can you look into unifying them?
The latter is an existing library that provides the following: -In-memory gzip compression and de-compression -In-memory streaming gzip de-compression via 'Decompressor' (adding a 'Compressor' is a TODO) We use this in order to support (de-)compression of HTTP requests/responses in libprocess' http server. It would be great if we could support more compression formats for this use case via libarchive. On Fri, Mar 2, 2018 at 8:26 AM Jie Yu <yujie....@gmail.com> wrote: > Jeff, I looked a brief look, this looks great! > > 1. Is the API reasonable, more or less? > > > Yeah, it looks reasonable to me. > > 2. Is the name of the API reasonable? Andy suggested namespace archive, > > so you could do something like "archive::extract", but unfortunately > > libarchive uses that with struct archive *. Any better name suggestions? > > > Assuming we'll be adding "compress" functionality in the future, any name > that can allow both would be great. > > 3. I'd like to NOT support a specific format. > > > SGTM! > > So, if the community wouldn't mind taking a look at this and giving me > > initial feedback, I'd be most appreciative! Realize that this is only a > > rough initial implementation; any and all comments would be welcome. > > > Looks good. Looks like libarchive is in BSD license? If that's the case, we > can bundle it in the repo (and also provide an option allowing using > pre-installed version). > > Thanks for doing that! Very useful thing for the project! > > - Jie > > On Wed, Feb 28, 2018 at 11:37 AM, Jeff Coffler < > jeff.coff...@microsoft.com.invalid> wrote: > > > Hi, > > > > We have a work item, https://issues.apache.org/jira/browse/MESOS-8064, > > which discusses programmatically decoding .tar, .tar.gz, .zip, and other > > common file compression schemes. > > > > I have an initial implementation for this (rough only), and I wanted to > > reach out to the development community for thoughts/input before > > proceeding. This is a fairly big change for Linux as well as Windows. > > > > My initial implementation makes use of libarchive. I implemented an > > interface in stout with a set of unit tests as well. > > > > Initial implementation: https://github.com/jeffaco/ > mesos/blob/libarchive/ > > 3rdparty/stout/include/stout/decompress.hpp > > Unit tests: https://github.com/jeffaco/mesos/blob/libarchive/ > > 3rdparty/stout/tests/decompress_tests.cpp > > Commit chain that has relevant changes: https://github.com/jeffaco/ > > mesos/commits/libarchive > > > > Basically, I'd like the community to look at a rough initial > > implementation. You can see how the implementation is called with the > unit > > tests. One implemented, we'll change current command line usage for > > utilities (tar, zip, etc) to use stout interface to libarchive rather > than > > Linux command line utilities. You can read more about libarchive here: > > https://www.libarchive.org/. > > > > Outstanding issues and questions (to be solved before committing to > > master): > > > > > > 1. Is the API reasonable, more or less? > > > > 2. Is the name of the API reasonable? Andy suggested namespace > archive, > > so you could do something like "archive::extract", but unfortunately > > libarchive uses that with struct archive *. Any better name suggestions? > > > > 3. I'd like to NOT support a specific format. That is, I'd like to > just > > call libarchive and have it determine the file format and handle it > > "magically" (which it does), rather than restrict it to a very specific > > format and error out if the file is not in that format. Right now you can > > pass any supported format (.tar, .tar.gz, .tar.xv, .tar.bz2, zip, etc), > as > > you can see by the unit tests. Is there a need to pass a specific format > > limiter, say, FORMAT_ZIP, and error if it's not a ZIP file? > > > > Given that libarchive supports multiple compression formats > simultaneously > > (i.e. foo.tar.bz2.gz), this could be overly restrictive anyway. But I > > wanted community feedback on this point. > > > > 4. There's some TODOs to better utilize libarchive APIs to avoid > > reading from stdin. These will be taken care of. > > > > 5. I'd like to programmatically be able to place extracted bits in a > > specific location. Currently, it goes to the current directory, which > will > > work for fetcher cases. This will be investigated to see if we need it, > and > > how libarchive can/will support it. > > > > 6. There's a biggie TODO dealing with long path support and Unicode. > > This will be investigated and taken care of. > > > > So, if the community wouldn't mind taking a look at this and giving me > > initial feedback, I'd be most appreciative! Realize that this is only a > > rough initial implementation; any and all comments would be welcome. > > > > /Jeff > > >