Hi,

We have a work item, https://issues.apache.org/jira/browse/MESOS-8064, which 
discusses programmatically extracting .tar, .tar.gz, .zip, and other common 
archive and compression formats.

I have an initial implementation for this (rough only), and I wanted to reach 
out to the development community for thoughts/input before proceeding. This is 
a fairly big change for Linux as well as Windows.

My initial implementation makes use of libarchive. I implemented an interface 
in stout with a set of unit tests as well.

Initial implementation: 
https://github.com/jeffaco/mesos/blob/libarchive/3rdparty/stout/include/stout/decompress.hpp
Unit tests: 
https://github.com/jeffaco/mesos/blob/libarchive/3rdparty/stout/tests/decompress_tests.cpp
Commit chain that has relevant changes: 
https://github.com/jeffaco/mesos/commits/libarchive

Basically, I'd like the community to look at a rough initial implementation. 
You can see how the implementation is called in the unit tests. Once 
implemented, we'll change the current command line usage of utilities (tar, 
zip, etc.) to use the stout interface to libarchive rather than the Linux 
command line utilities. You can read more about libarchive here: 
https://www.libarchive.org/.

Outstanding issues and questions (to be solved before committing to master):


  1.  Is the API reasonable, more or less?

  2.  Is the name of the API reasonable? Andy suggested namespace archive, so 
you could do something like "archive::extract", but unfortunately libarchive 
already uses that name for its struct archive *. Any better name suggestions?

  3.  I'd like to NOT support a specific format. That is, I'd like to just call 
libarchive and have it determine the file format and handle it "magically" 
(which it does), rather than restrict it to a very specific format and error 
out if the file is not in that format. Right now you can pass any supported 
format (.tar, .tar.gz, .tar.xz, .tar.bz2, .zip, etc.), as you can see by the unit 
tests. Is there a need to pass a specific format limiter, say, FORMAT_ZIP, and 
error if it's not a ZIP file?

Given that libarchive supports multiple stacked compression formats 
(e.g. foo.tar.bz2.gz), this could be overly restrictive anyway. But I wanted 
community feedback on this point.

  4.  There are some TODOs to better utilize libarchive APIs to avoid reading 
from stdin. These will be taken care of.

  5.  I'd like to programmatically be able to place extracted bits in a 
specific location. Currently, it goes to the current directory, which will work 
for fetcher cases. This will be investigated to see if we need it, and how 
libarchive can/will support it.

  6.  There's a biggie TODO dealing with long path support and Unicode. This 
will be investigated and taken care of.
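
To make question 3 a bit more concrete, here is a minimal, self-contained 
sketch of what an optional format limiter could look like from the caller's 
side. Everything in it is hypothetical and for discussion only: Format, 
detectFormat and accept are made-up names that exist neither in stout nor in 
libarchive, and the magic-byte sniffing is just a stand-in for libarchive's own 
detection, which is more thorough. It only looks at the outermost layer, so a 
.tar.gz shows up as GZIP.

```cpp
#include <cstddef>
#include <string>

// Hypothetical names for illustration only; not part of stout or libarchive.
// ANY is only meaningful as a "no restriction" argument to accept().
enum class Format { ANY, UNKNOWN, TAR, GZIP, BZIP2, XZ, ZIP };

// Returns true if `data` contains `magic` starting at byte `offset`.
static bool hasMagic(
    const std::string& data,
    std::size_t offset,
    const std::string& magic)
{
  return data.size() >= offset + magic.size() &&
         data.compare(offset, magic.size(), magic) == 0;
}

// Guesses the outermost format from the leading bytes of a file.
// Simplified: e.g. an empty zip archive (which starts with "PK\x05\x06")
// is not recognized here.
Format detectFormat(const std::string& data)
{
  if (hasMagic(data, 0, std::string("\x1f\x8b", 2))) return Format::GZIP;
  if (hasMagic(data, 0, "BZh")) return Format::BZIP2;
  // The literal is split so that "\xfd" is not parsed together with '7'.
  if (hasMagic(data, 0, std::string("\xfd" "7zXZ\0", 6))) return Format::XZ;
  if (hasMagic(data, 0, std::string("PK\x03\x04", 4))) return Format::ZIP;
  if (hasMagic(data, 257, "ustar")) return Format::TAR;
  return Format::UNKNOWN;
}

// With the default Format::ANY, every recognized format is accepted
// (the "magic" behaviour). Passing a specific format turns this into
// the restrictive variant that rejects everything else.
bool accept(const std::string& data, Format required = Format::ANY)
{
  const Format detected = detectFormat(data);
  if (detected == Format::UNKNOWN) return false;
  return required == Format::ANY || required == detected;
}
```

With this shape, accept(bytes) takes whatever is recognized, while 
accept(bytes, Format::ZIP) errors out on anything that is not a ZIP file. The 
limiter is a defaulted argument, so existing callers would not need to change 
if we decide to add it later.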

So, if the community wouldn't mind taking a look at this and giving me initial 
feedback, I'd be most appreciative! Realize that this is only a rough initial 
implementation; any and all comments would be welcome.

/Jeff
