[ 
https://issues.apache.org/jira/browse/COMPRESS-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645577#comment-13645577
 ] 

Stefan Bodewig commented on COMPRESS-224:
-----------------------------------------

> Can I set this as the default setting or will certain bz2 files fail?

We decided to turn it off by default for backwards compatibility.  There might 
be code out there that relies on reading just the first stream, even if that is 
quite improbable.  Enabling the reading of concatenated streams by default 
shouldn't do any harm; we even have this on our list of acceptable breaking 
changes if/once we do a 2.0 release.
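
For illustration, here is a minimal sketch of opting in to concatenated
streams explicitly.  It assumes the two-argument constructor
BZip2CompressorInputStream(InputStream, boolean), whose second argument is
decompressConcatenated; the class name ConcatenatedBzip2 and the method name
decompressAll are made up for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream;

// Hypothetical helper, not part of Commons Compress.
public class ConcatenatedBzip2 {

    // Decompresses every bzip2 stream found in `in`, not just the first one,
    // and returns the number of decompressed bytes written to `out`.
    public static long decompressAll(InputStream in, OutputStream out) throws IOException {
        // Passing true enables decompressConcatenated, which is off by default.
        try (BZip2CompressorInputStream bzIn = new BZip2CompressorInputStream(in, true)) {
            byte[] buffer = new byte[8 * 1024];
            long total = 0;
            int n;
            while ((n = bzIn.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
            return total;
        }
    }
}
```

With the single-argument constructor, reading would stop at the end of the
first stream, which matches the behaviour described in the report.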

As for performance: the bzip2 code reads like C in many places, reusing big 
pre-allocated arrays to avoid garbage collection pressure.  Some loss can be 
attributed to Java not having unsigned integral primitive types (neglecting 
char right now), so quite some time is spent converting byte to int or char 
and vice versa.  I don't rule out that the current code could be sped up, quite 
the opposite.
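
To make the conversion cost concrete, here is a small sketch of the kind of
widening and narrowing that signed bytes force on Java code.  It is not taken
from the bzip2 sources; the class UnsignedBytes and its methods are
hypothetical names used only for illustration.

```java
// Illustration only; not code from Commons Compress.
public class UnsignedBytes {

    // Widens a signed Java byte (-128..127) to its unsigned value (0..255).
    public static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    // Narrows an int in 0..255 back to a byte; values above 127 wrap to negative.
    public static byte toByte(int value) {
        return (byte) value;
    }

    public static void main(String[] args) {
        byte raw = (byte) 0xE9;
        System.out.println(raw);             // prints -23
        System.out.println(toUnsigned(raw)); // prints 233
    }
}
```

Every byte that is used as an array index or compared against an unsigned
constant needs such a mask first.  Byte.toUnsignedInt does the same widening,
but it only exists from Java 8 on.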
                
> Cannot uncompress very large bzip2 files
> ----------------------------------------
>
>                 Key: COMPRESS-224
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-224
>             Project: Commons Compress
>          Issue Type: Bug
>    Affects Versions: 1.5
>         Environment: Java 1.7.0_03
>            Reporter: Peter Karich
>            Priority: Blocker
>
> When extracting big files like 
> http://download.geofabrik.de/europe/germany/bayern-latest.osm.bz2 
> apache-compress works nicely. But when trying the same for e.g. 
> http://ftp5.gwdg.de/pub/misc/openstreetmap/planet.openstreetmap.org/planet/planet-latest.osm.bz2
>  it stops without an error after exactly 900000 bits.
> I'm using the following code:
> {code:title=App.java|borderStyle=solid}
>  public static void main(String[] args) throws IOException {
>         if (args.length == 0)
>             throw new IllegalArgumentException("You need to specify the bz2 file!");
>         String fromFile = args[0];
>         if (!fromFile.endsWith(".bz2"))
>             throw new IllegalArgumentException("You need to specify a bz2 file! But was:" + fromFile);
>         String toFile = pruneFileEnd(fromFile);
>         FileInputStream in = new FileInputStream(fromFile);
>         FileOutputStream out = new FileOutputStream(toFile);
>         BZip2CompressorInputStream bzIn = new BZip2CompressorInputStream(in);
>         try {
>             final byte[] buffer = new byte[1024 * 8];
>             int n = 0;
>             while (-1 != (n = bzIn.read(buffer))) {
>                 out.write(buffer, 0, n);
>             }
>         } finally {
>             out.close();
>             bzIn.close();
>         }
>     }
>     public static String pruneFileEnd(String file) {
>         int index = file.lastIndexOf(".");
>         if (index < 0)
>             return file;
>         return file.substring(0, index);
>     }
> {code}

