Re: Thoughts on initial implementation to use libarchive?

Benjamin Mahler Fri, 02 Mar 2018 13:59:15 -0800

Having both decompress.hpp and gzip.hpp
<https://github.com/apache/mesos/blob/master/3rdparty/stout/include/stout/gzip.hpp>
seems unfortunate. Can you look into unifying them?


The latter is an existing library that provides the following:
  - In-memory gzip compression and decompression
  - In-memory streaming gzip decompression via 'Decompressor' (adding a
    'Compressor' is a TODO)

We use this in order to support (de-)compression of HTTP requests/responses
in libprocess's HTTP server. It would be great if we could support more
compression formats for this use case via libarchive.
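
For illustration only, here is a minimal sketch of one piece a unified interface might need: a single entry point that guesses the container or compression format from the leading bytes, so the gzip case and the other formats go through the same front door. It does not use libarchive (which does its own detection) and is not taken from decompress.hpp or gzip.hpp; `Format`, `hasMagic`, and `detect` are hypothetical names. Only the magic numbers themselves are standard.

```cpp
#include <cstddef>
#include <string>

// Hypothetical names throughout; nothing here is part of stout or libarchive.
enum class Format { GZIP, BZIP2, XZ, ZIP, TAR, UNKNOWN };

// Returns true if `data` contains the bytes of `magic` starting at `offset`.
inline bool hasMagic(
    const std::string& data,
    std::size_t offset,
    const std::string& magic)
{
  return data.size() >= offset + magic.size() &&
         data.compare(offset, magic.size(), magic) == 0;
}

// Guesses the format of `data` from well-known magic numbers.
// Explicit lengths are passed where the magic contains non-text bytes.
inline Format detect(const std::string& data)
{
  // gzip: 0x1f 0x8b
  if (hasMagic(data, 0, std::string("\x1f\x8b", 2))) return Format::GZIP;

  // bzip2: "BZh"
  if (hasMagic(data, 0, "BZh")) return Format::BZIP2;

  // xz: 0xfd '7' 'z' 'X' 'Z' 0x00 (literal split so "\xfd" is not
  // parsed together with the following '7').
  if (hasMagic(data, 0, std::string("\xfd" "7zXZ\0", 6))) return Format::XZ;

  // zip: "PK" 0x03 0x04 (local file header; an empty zip starts differently
  // and is not recognized by this sketch).
  if (hasMagic(data, 0, std::string("PK\x03\x04", 4))) return Format::ZIP;

  // POSIX tar: "ustar" at byte offset 257 of the first header block.
  if (hasMagic(data, 257, "ustar")) return Format::TAR;

  return Format::UNKNOWN;
}
```

For the HTTP case the Content-Encoding header would normally name the format, so sniffing like this would matter more for the fetcher-style use of decompress.hpp than for libprocess.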

On Fri, Mar 2, 2018 at 8:26 AM Jie Yu <yujie....@gmail.com> wrote:

> Jeff, I took a brief look, and this looks great!
>
> >  1.  Is the API reasonable, more or less?
>
>
> Yeah, it looks reasonable to me.
>
> >  2.  Is the name of the API reasonable? Andy suggested namespace archive,
> > so you could do something like "archive::extract", but unfortunately
> > libarchive uses that with struct archive *. Any better name suggestions?
>
>
> Assuming we'll be adding "compress" functionality in the future, any name
> that can allow both would be great.
>
> >  3.  I'd like to NOT support a specific format.
>
>
> SGTM!
>
> > So, if the community wouldn't mind taking a look at this and giving me
> > initial feedback, I'd be most appreciative! Realize that this is only a
> > rough initial implementation; any and all comments would be welcome.
>
>
> Looks good. It looks like libarchive is under a BSD license? If that's the
> case, we can bundle it in the repo (and also provide an option to use a
> pre-installed version).
>
> Thanks for doing that! Very useful thing for the project!
>
> - Jie
>
> On Wed, Feb 28, 2018 at 11:37 AM, Jeff Coffler <
> jeff.coff...@microsoft.com.invalid> wrote:
>
> > Hi,
> >
> > We have a work item, https://issues.apache.org/jira/browse/MESOS-8064,
> > which discusses programmatically decoding .tar, .tar.gz, .zip, and other
> > common file compression schemes.
> >
> > I have an initial implementation for this (rough only), and I wanted to
> > reach out to the development community for thoughts/input before
> > proceeding. This is a fairly big change for Linux as well as Windows.
> >
> > My initial implementation makes use of libarchive. I implemented an
> > interface in stout with a set of unit tests as well.
> >
> > Initial implementation:
> > https://github.com/jeffaco/mesos/blob/libarchive/3rdparty/stout/include/stout/decompress.hpp
> > Unit tests:
> > https://github.com/jeffaco/mesos/blob/libarchive/3rdparty/stout/tests/decompress_tests.cpp
> > Commit chain that has relevant changes:
> > https://github.com/jeffaco/mesos/commits/libarchive
> >
> > Basically, I'd like the community to look at a rough initial
> > implementation. You can see how the implementation is called with the
> > unit tests. Once implemented, we'll change current command line usage for
> > utilities (tar, zip, etc.) to use the stout interface to libarchive rather
> > than Linux command line utilities. You can read more about libarchive here:
> > https://www.libarchive.org/.
> >
> > Outstanding issues and questions (to be solved before committing to
> > master):
> >
> >
> >   1.  Is the API reasonable, more or less?
> >
> >   2.  Is the name of the API reasonable? Andy suggested namespace
> > archive, so you could do something like "archive::extract", but
> > unfortunately libarchive uses that with struct archive *. Any better name
> > suggestions?
> >
> >   3.  I'd like to NOT support a specific format. That is, I'd like to
> > just call libarchive and have it determine the file format and handle it
> > "magically" (which it does), rather than restrict it to a very specific
> > format and error out if the file is not in that format. Right now you can
> > pass any supported format (.tar, .tar.gz, .tar.xz, .tar.bz2, .zip, etc.),
> > as you can see by the unit tests. Is there a need to pass a specific format
> > limiter, say, FORMAT_ZIP, and error if it's not a ZIP file?
> >
> > Given that libarchive supports multiple compression formats
> > simultaneously (i.e. foo.tar.bz2.gz), this could be overly restrictive
> > anyway. But I wanted community feedback on this point.
> >
> >   4.  There are some TODOs to better utilize libarchive APIs to avoid
> > reading from stdin. These will be taken care of.
> >
> >   5.  I'd like to programmatically be able to place extracted bits in a
> > specific location. Currently, it goes to the current directory, which
> > will work for fetcher cases. This will be investigated to see if we need
> > it, and how libarchive can/will support it.
> >
> >   6.  There's a biggie TODO dealing with long path support and Unicode.
> > This will be investigated and taken care of.
> >
> > So, if the community wouldn't mind taking a look at this and giving me
> > initial feedback, I'd be most appreciative! Realize that this is only a
> > rough initial implementation; any and all comments would be welcome.
> >
> > /Jeff
> >
>
