TAR extraction fails with FileNotFoundException (directories not being created)
-------------------------------------------------------------------------------

                 Key: SANDBOX-168
                 URL: http://issues.apache.org/jira/browse/SANDBOX-168
             Project: Commons Sandbox
          Issue Type: Bug
          Components: Compress
    Affects Versions: Nightly Builds
         Environment: Probably irrelevant, but I am using JDK 1.5.0_07 on a Windows XP SP2 box.
            Reporter: Sam Smith


--------------------------------------------------
Summary
--------------------------------------------------

I am able to create TAR archive files using the org.apache.commons.compress 
code; however, when I try to extract the contents of a TAR archive using that 
same code, it fails.

I think that there must be a bug in org.apache.commons.compress because I can 
use the program 7-zip to successfully extract the contents of the archive.


--------------------------------------------------
Background
--------------------------------------------------

I need Java TAR support for archiving purposes; see this forum thread if you 
want to know why:
        http://forum.java.sun.com/thread.jspa?threadID=757876

The com.ice.tar library
        http://www.gjt.org/pkgdoc/com/ice/tar/index.html
proved inadequate because it does not support long paths reliably (the GNU TAR 
extensions are essential).

So, I am turning to this apache code, which does handle long paths and seems to 
be actively maintained.


--------------------------------------------------
Details of how the TAR archive was created
--------------------------------------------------

Because there appears to be no stable release of the 
org.apache.commons.compress code, I just grabbed the latest nightly build, 
commons-compress-20060814.  MAYBE THIS IS THE PROBLEM: if this is a known bad 
build and there is a better one, by all means please let me know which build 
to use.  Also, this kind of information should somehow be recorded as a comment 
on each nightly build.

Assuming that the above is not the case, and that this is a new bug, here is 
how I stumbled across it.

First, I construct a new TAR archive with code that ultimately boils down to 
this:
                // Note: getRelativePath will ensure that directories end with a separator
                String path = fileParent.getRelativePath(file);
                // CRITICAL: handles bizarre systems like windoze which use chars other
                // than / for directory separation; the TAR format requires / to be used
                if (File.separatorChar != '/') path = path.replace(File.separatorChar, '/');

                TarEntry entry = new TarEntry( file );
                entry.setName( path );
                out.putNextEntry( entry );
                writeFileData(file, out);
                out.closeEntry();

                if ( file.isDirectory() ) {
                        // supply null, since we test at the beginning of this method
                        // (supplying filter here would just add a redundant test)
                        for (File fileChild : DirUtil.getContents(file, null)) {
                                archive( fileChild, fileParent, out, filter );
                        }
                }

Note that FileParent is my own class that I originally wrote for a ZIP 
archiver.  This class keeps track of the root directory that is being TARed 
because I want all of my paths to be stored as relative offsets from this root; 
I do NOT want any path elements above that root directory to be included.  The 
apache TarEntry class appears to me to include a lot of extraneous path 
elements (albeit it will strip off drive letters or an initial '/' char).
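
To make the intended entry names concrete, here is a minimal sketch of the kind of path computation described above.  It is not the actual FileParent class: the class name TarPaths and the method entryName are hypothetical, and it assumes that the root directory's own name is kept as the first path element while everything above the root is dropped.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

class TarPaths {
    /**
     * Hypothetical helper (not part of org.apache.commons.compress).
     * Returns the name of 'file' relative to the parent of 'root', using '/'
     * as the separator and a trailing '/' for directories.
     */
    public static String entryName(File root, File file) {
        File base = root.getAbsoluteFile().getParentFile();
        if (base == null) {
            throw new IllegalArgumentException("root has no parent: " + root);
        }
        String basePath = base.getPath();
        // Avoid a doubled separator when the parent is the filesystem root.
        String prefix = basePath.endsWith(File.separator) ? basePath : basePath + File.separator;
        String filePath = file.getAbsolutePath();
        if (!filePath.startsWith(prefix)) {
            throw new IllegalArgumentException(file + " is not under " + base);
        }
        // The TAR format requires '/' regardless of the platform separator.
        String name = filePath.substring(prefix.length()).replace(File.separatorChar, '/');
        if (file.isDirectory() && !name.endsWith("/")) {
            name += "/";
        }
        return name;
    }

    public static void main(String[] args) throws IOException {
        File root = Files.createTempDirectory("longPaths").toFile();
        File sub = new File(root, "sub");
        sub.mkdir();
        System.out.println(entryName(root, root));
        System.out.println(entryName(root, sub));
    }
}
```

The result would then be passed to entry.setName(...) as in the snippet above.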

In addition to controlling the paths, I also need to use low level classes like 
TarOutputStream to force the use of GNU long paths via a call like
        tarOutputStream.setLongFileMode(TarOutputStream.LONGFILE_GNU);

If I were to use the high level Archiver functionality that you document here
        http://wiki.apache.org/jakarta-commons/Compress
(for ZIPs) or
        
http://svn.apache.org/viewvc/jakarta/commons/sandbox/compress/trunk/src/examples/org/apache/commons/compress/examples/TarExample.java?view=markup
(for TARs), then I would have no such control over relative paths or GNU TAR 
extensions.  I also use an efficient file filtering technique that would not 
be supported if I used an Archiver.


--------------------------------------------------
Error when extracting the TAR archive with org.apache.commons.compress
--------------------------------------------------

I think that the archive produced by the above code is legitimate, because I 
can successfully extract it using the program 7-zip.  As proof, I have a 
program called DirectoryComparer which compares 2 directories, notes any paths 
which are not in common, and for common paths examines every normal file 
byte-for-byte to find any discrepancies.  Running that program on the original 
directory and the archived/extracted one found zero differences.

But, when I tried extracting the archive using the org.apache.commons.compress 
code, I got the following error:

Exception in thread "main" org.apache.commons.compress.UnpackException: Exception while unpacking.
        at org.apache.commons.compress.archivers.tar.TarArchive.doUnpack(TarArchive.java:110)
        at org.apache.commons.compress.AbstractArchive.unpack(AbstractArchive.java:122)
        at bb.io.TarUtil.extract(TarUtil.java:558)
        at bb.io.TarUtil$Test.test_archive_extract_pathLengthLimit(TarUtil.java:725)
        at bb.io.TarUtil$Test.main(TarUtil.java:598)
Caused by: java.io.FileNotFoundException: F:\longPaths\2B6vLVrp4c (The system cannot find the path specified)
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:179)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:131)
        at org.apache.commons.compress.archivers.tar.TarArchive.doUnpack(TarArchive.java:97)
        ... 4 more


--------------------------------------------------
Details of how the TAR archive was extracted
--------------------------------------------------

The code that I used to do the extraction is
                TarArchive archive = null;      // note: never assigned below, so close() always receives null
                try {
                        Archive archiver = ArchiverFactory.getInstance(tarFile);
                        archiver.unpack(directoryToExtractInto);
                }
                finally {
                        close(archive);
                }
Here, unlike archiving, I went ahead and used the convenient Archiver 
functionality because no low level control was needed.

Also, the original target directory being archived is named longPaths and, as 
its name indicates, it has all kinds of super long path elements inside it.  (I 
wrote a program to auto generate really long subdirectory structures like this 
for torture testing my archiving programs.)


--------------------------------------------------
Where the bug lies
--------------------------------------------------

I THINK THAT THE PROBLEM WITH THE ORG.APACHE.COMMONS.COMPRESS EXTRACTION CODE 
IS THE FACT THAT IT EXTRACTS DIRECTORIES AS NORMAL FILES.

I say this because there is a normal file left on my filesystem after doing the 
above that is named longPaths.  But longPaths should be a directory; since it 
was actually miscreated by the apache code as a file, then of course the 
subdirectory
        longPaths\2B6vLVrp4c
cannot be created as reported by the stacktrace above.
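
For illustration, here is a minimal sketch of the behavior I would expect during extraction.  It is not the doUnpack code from the nightly build: the class ExtractTargets and the method prepare are hypothetical, and it assumes the extractor can tell whether an entry is a directory (in a TAR header this is type flag '5', and older archives mark directories with a trailing '/' on the name).

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

class ExtractTargets {
    /**
     * Hypothetical helper: prepares the filesystem for one archive entry.
     * For a directory entry the directory is created and null is returned.
     * For a file entry any missing parent directories are created and the
     * File to write the entry data into is returned.
     */
    public static File prepare(File destDir, String entryName, boolean isDirectory)
            throws IOException {
        File target = new File(destDir, entryName);
        if (isDirectory) {
            // Create the directory itself instead of opening it as an output file.
            if (!target.isDirectory() && !target.mkdirs()) {
                throw new IOException("Could not create directory " + target);
            }
            return null;
        }
        File parent = target.getParentFile();
        if (parent != null && !parent.isDirectory() && !parent.mkdirs()) {
            throw new IOException("Could not create directory " + parent);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        File dest = Files.createTempDirectory("extract").toFile();
        prepare(dest, "longPaths/", true);
        File out = prepare(dest, "longPaths/2B6vLVrp4c/data.bin", false);
        System.out.println(out.getParentFile().isDirectory());
    }
}
```

Only when prepare returns a non-null File would the extractor open a FileOutputStream on it.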

Again, let me mention that 7-zip did successfully and completely extract the 
complicated contents of longPaths, correctly recreating all of the 
subdirectories etc, so I do not suspect that my code for creating the TAR 
archive is wrong.

Furthermore, when I tried abandoning the above TAR creation code and used your 
Archiver technique with code like
        Archive archiver = ArchiverFactory.getInstance("tar");
        for (File file : files) {
                archive(file, archiver, filter);
        }
        archiver.save(tarFile);

                // this is the relevant code snippet from the archive method:
        archiver.add( file );
        
        if ( file.isDirectory() ) {
                for (File fileChild : DirUtil.getContents(file, null)) {
                        archive( fileChild, archiver, filter );
                }
        }
then I still get an error:

Exception in thread "main" java.io.FileNotFoundException: Z:\longPaths (Access is denied)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:106)
        at org.apache.commons.compress.AbstractArchive.add(AbstractArchive.java:90)
        at bb.io.TarUtil.archive(TarUtil.java:412)
        at bb.io.TarUtil.archive(TarUtil.java:339)
        at bb.io.TarUtil$Test.test_archive_extract_pathLengthLimit(TarUtil.java:711)
        at bb.io.TarUtil$Test.main(TarUtil.java:594)
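
One plausible reading of this trace, offered as an assumption rather than a diagnosis: AbstractArchive.add appears to open a FileInputStream on whatever File it is handed, and java.io.FileInputStream is documented to throw FileNotFoundException when the path is a directory rather than a regular file.  The small standalone sketch below (class and method names are hypothetical) shows that JDK behavior without involving the compress code at all; the exact message text varies by platform.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;

class DirectoryStreamDemo {
    /** Returns true if opening a FileInputStream on f fails with FileNotFoundException. */
    public static boolean openFails(File f) throws IOException {
        try (FileInputStream in = new FileInputStream(f)) {
            return false;
        } catch (FileNotFoundException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("longPaths").toFile();
        File regular = File.createTempFile("data", ".bin");
        System.out.println("directory: open fails = " + openFails(dir));
        System.out.println("regular file: open fails = " + openFails(regular));
    }
}
```

If that reading is right, add() would need an isDirectory() check before it opens a stream.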


--------------------------------------------------
Misc issues
--------------------------------------------------

1) I am sorry if this is a known issue that has been beaten to death on the 
mailing list.  But I am a newcomer, and I was unable to figure out how to 
search the mailing list archives!

Clicking on the "Search the mailing list archive" link on
        http://jakarta.apache.org/commons/sandbox/compress/issue-tracking.html
brought me to
        http://mail-archives.apache.org/mod_mbox/jakarta-commons-dev/
which only seems to offer manual browsing, which is a tedious and inefficient 
way to find issues with the compress code, especially as the mailing list seems 
to discuss every commons project.

Is there a better way?


2) There seem to be redundant TAR packages:
        older one?:
                
http://svn.apache.org/viewvc/jakarta/commons/sandbox/compress/trunk/src/java/org/apache/commons/compress/tar/
        newer one?:
                
http://svn.apache.org/viewvc/jakarta/commons/sandbox/compress/trunk/src/java/org/apache/commons/compress/archivers/tar/
Which one am I supposed to use?


3) GNU tar apparently supports unlimited path lengths, but what about file 
sizes?  Traditional TAR only supports files up to 8 GB in size.  Does the 
org.apache.commons.compress TAR code have any file size limits?  Please add 
documentation about this.
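
For reference, the 8 GB figure follows from the header layout: the classic TAR header stores the file size as an octal ASCII string in a 12-byte field, which holds 11 octal digits plus a terminator.  The sketch below (hypothetical class name) just works out that arithmetic; it says nothing about what the compress code actually supports.

```java
class TarSizeLimit {
    // The classic TAR size field holds 11 octal digits (12 bytes including the terminator).
    static final int SIZE_FIELD_OCTAL_DIGITS = 11;

    /** Largest size representable in the classic field: 8^11 - 1 bytes. */
    public static long maxClassicSize() {
        // Each octal digit carries 3 bits, so 11 digits carry 33 bits.
        return (1L << (3 * SIZE_FIELD_OCTAL_DIGITS)) - 1;
    }

    public static void main(String[] args) {
        long max = maxClassicSize();
        System.out.println(max + " bytes, octal " + Long.toOctalString(max));
    }
}
```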

