Thoughts on initial implementation to use libarchive?

Jeff Coffler Wed, 28 Feb 2018 11:38:11 -0800

Hi,

We have a work item, https://issues.apache.org/jira/browse/MESOS-8064, which 
discusses programmatically decoding .tar, .tar.gz, .zip, and other common file 
compression schemes.

I have an initial implementation for this (rough only), and I wanted to reach
out to the development community for thoughts/input before proceeding. This is
a fairly big change for Linux as well as Windows.

My initial implementation makes use of libarchive. I implemented an interface
in stout with a set of unit tests as well.

Initial implementation:
https://github.com/jeffaco/mesos/blob/libarchive/3rdparty/stout/include/stout/decompress.hpp
Unit tests:
https://github.com/jeffaco/mesos/blob/libarchive/3rdparty/stout/tests/decompress_tests.cpp
Commit chain that has relevant changes:
https://github.com/jeffaco/mesos/commits/libarchive

Basically, I'd like the community to look at a rough initial implementation.
You can see how the implementation is called with the unit tests. One
implemented, we'll change current command line usage for utilities (tar, zip,
etc) to use stout interface to libarchive rather than Linux command line
utilities. You can read more about libarchive here: https://www.libarchive.org/.

Outstanding issues and questions (to be solved before committing to master):

1. Is the API reasonable, more or less?

2. Is the name of the API reasonable? Andy suggested namespace archive, so
you could do something like "archive::extract", but unfortunately libarchive
uses that with struct archive *. Any better name suggestions?

3. I'd like to NOT support a specific format. That is, I'd like to just call
libarchive and have it determine the file format and handle it "magically"
(which it does), rather than restrict it to a very specific format and error
out if the file is not in that format. Right now you can pass any supported
format (.tar, .tar.gz, .tar.xv, .tar.bz2, zip, etc), as you can see by the unit
tests. Is there a need to pass a specific format limiter, say, FORMAT_ZIP, and
error if it's not a ZIP file?

Given that libarchive supports multiple compression formats simultaneously
(i.e. foo.tar.bz2.gz), this could be overly restrictive anyway. But I wanted
community feedback on this point.

4. There's some TODOs to better utilize libarchive APIs to avoid reading
from stdin. These will be taken care of.

5. I'd like to programmatically be able to place extracted bits in a
specific location. Currently, it goes to the current directory, which will work
for fetcher cases. This will be investigated to see if we need it, and how
libarchive can/will support it.

6. There's a biggie TODO dealing with long path support and Unicode. This
will be investigated and taken care of.

So, if the community wouldn't mind taking a look at this and giving me initial
feedback, I'd be most appreciative! Realize that this is only a rough initial
implementation; any and all comments would be welcome.

/Jeff

Thoughts on initial implementation to use libarchive?

Reply via email to