[ 
https://issues.apache.org/jira/browse/COMPRESS-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641215#comment-14641215
 ] 

Sandeep Khadkekar commented on COMPRESS-185:
--------------------------------------------

I still don't see this working. Am I missing anything here?

We were earlier using bzip2 and switched to pbzip2, and saw a massive
performance improvement. But our client, who uses Apache Commons Compress 1.9
to uncompress the archive, is complaining that they get an exception while
uncompressing. Sample uncompression code on the client side:

import org.apache.commons.compress.archivers.ArchiveException;
import org.apache.commons.compress.archivers.ArchiveStreamFactory;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream;
import org.apache.commons.io.IOUtils;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.LinkedList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UncompressTest {

    public static void main(String[] args) throws ArchiveException, IOException {
        String regex = ".*\\/application$";
        List<File> decompFiles = new LinkedList<File>();

        TarArchiveInputStream inputStream = (TarArchiveInputStream)
                new ArchiveStreamFactory().createArchiveInputStream("tar",
                        new BZip2CompressorInputStream(
                                new FileInputStream("/Users/skhadkekar/archive/myarchive.tbz")));

        try {
            TarArchiveEntry entry = null;
            Pattern r = Pattern.compile(regex);
            int index = 0;
            while ((entry = (TarArchiveEntry) inputStream.getNextEntry()) != null) {
                final File outputFile =
                        new File("/Users/skhadkekar/archive/uncompress/", entry.getName());
                if (entry.isDirectory()) {
                    System.out.println("The directory is: " + entry.getName());
                    if (!outputFile.exists() && !outputFile.mkdirs()) {
                        throw new IOException("Couldn't create directory for "
                                + outputFile.getAbsolutePath());
                    }
                } else {
                    Matcher m = r.matcher(entry.getName());
                    System.out.println("The file name in the directory is: " + entry.getName());
                    if (m.find()) {
                        ++index;
                        OutputStream outputFileStream = new FileOutputStream(outputFile);
                        IOUtils.copy(inputStream, outputFileStream);
                        outputFileStream.close();
                        decompFiles.add(outputFile);
                        if (index >= 1) {
                            break;
                        }
                    }
                }
            }
        } catch (Exception eAny) {
            eAny.printStackTrace();
        } finally {
            inputStream.close();
        }
    }
}
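If I'm reading the COMPRESS-185 fix correctly, the multi-stream support added in 1.4 is opt-in: the single-argument BZip2CompressorInputStream constructor (which is what the ArchiveStreamFactory path above effectively uses) still stops at the end of the first bzip2 stream, while the two-argument constructor with decompressConcatenated=true keeps reading across stream boundaries. A minimal sketch of the difference; the helper class and method names are mine, not part of the library:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream;

public class PbzipReader {

    /**
     * Fully decompresses a bzip2 input. With concatenated == true the
     * reader continues past the end of the first bzip2 stream, which is
     * what a pbzip2-produced file (one stream per compression block) needs.
     */
    public static byte[] decompress(InputStream in, boolean concatenated)
            throws IOException {
        BZip2CompressorInputStream bz =
                new BZip2CompressorInputStream(in, concatenated);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = bz.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        bz.close();
        return out.toByteArray();
    }
}
```

With concatenated == false only the first pbzip2 block comes back, which would explain a truncated (and therefore broken) tar on the client side; wrapping the result in TarArchiveInputStream works the same way as in the sample above.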

> BZip2CompressorInputStream truncates files compressed with pbzip2
> -----------------------------------------------------------------
>
>                 Key: COMPRESS-185
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-185
>             Project: Commons Compress
>          Issue Type: Bug
>          Components: Compressors
>    Affects Versions: 1.3
>            Reporter: Karsten Loesing
>             Fix For: 1.4
>
>
> I'm using BZip2CompressorInputStream in Compress 1.3 to decompress a file 
> that was created with pbzip2 1.1.6 (http://compression.ca/pbzip2/).  The 
> stream ends early after 900000 bytes, truncating the rest of the 
> pbzip2-compressed file.  Decompressing the file with bunzip2 or compressing 
> the original file with bzip2 both fix the issue.  I think both pbzip2 and 
> Compress are to blame here: pbzip2 apparently does something non-standard 
> when compressing files, and Compress should handle the non-standard format 
> rather than pretending to be done decompressing.  Another option is that I'm 
> doing something wrong; in that case please let me know! :)
> Here's how the problem can be reproduced:
>  1. Generate a file that's 900000+ bytes large: dd if=/dev/zero of=1mbfile 
> count=1 bs=1M
>  2. Compress with pbzip2: pbzip2 1mbfile
>  3. Decompress with Bunzip2 class below
>  4. Notice how the resulting 1mbfile is 900000 bytes large, not 1M.
> Now compare to using bunzip2/bzip2:
>  - Do the steps above, but instead of 2, compress with bzip2: bzip2 1mbfile
>  - Do the steps above, but instead of 3, decompress with bunzip2: bunzip2 
> 1mbfile.bz2
> import java.io.*;
> import org.apache.commons.compress.compressors.bzip2.*;
> public class Bunzip2 {
>   public static void main(String[] args) throws Exception {
>     File inFile = new File(args[0]);
>     File outFile = new File(args[0].substring(0, args[0].length() - 4));
>     FileInputStream fis = new FileInputStream(inFile);
>     BZip2CompressorInputStream bz2cis =
>         new BZip2CompressorInputStream(fis);
>     BufferedInputStream bis = new BufferedInputStream(bz2cis);
>     BufferedOutputStream bos = new BufferedOutputStream(
>         new FileOutputStream(outFile));
>     int len;
>     byte[] data = new byte[1024];
>     while ((len = bis.read(data, 0, 1024)) >= 0) {
>       bos.write(data, 0, len);
>     }   
>     bos.close();
>     bis.close();
>   }
> }
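As a quick sanity check on a suspect file (an illustrative sketch, not part of Commons Compress; the class name is mine): every bzip2 stream begins with the 4-byte header "BZh" plus a level digit '1'-'9', immediately followed by the 6-byte compressed-block magic 0x314159265359. Counting that 10-byte signature gives a rough stream count, so plain bzip2 output should report one stream and pbzip2 output one per compression block:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Bz2StreamCount {

    /**
     * Heuristically counts bzip2 stream headers: "BZh" + level digit,
     * immediately followed by the block magic 0x314159265359. False
     * positives inside compressed data are possible but unlikely for a
     * 10-byte signature.
     */
    static int countStreams(byte[] data) {
        byte[] blockMagic = {0x31, 0x41, 0x59, 0x26, 0x53, 0x59};
        int count = 0;
        for (int i = 0; i + 10 <= data.length; i++) {
            if (data[i] == 'B' && data[i + 1] == 'Z' && data[i + 2] == 'h'
                    && data[i + 3] >= '1' && data[i + 3] <= '9') {
                boolean match = true;
                for (int j = 0; j < 6; j++) {
                    if (data[i + 4 + j] != blockMagic[j]) {
                        match = false;
                        break;
                    }
                }
                if (match) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(countStreams(data) + " bzip2 stream header(s)");
    }
}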



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
