[
https://issues.apache.org/jira/browse/COMPRESS-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645577#comment-13645577
]
Stefan Bodewig commented on COMPRESS-224:
-----------------------------------------
> Can I set this as the default setting or will certain bz2 files fail?
We decided to turn it off by default for backwards compatibility: there might
be code out there that relies on reading just the first stream, even if that is
quite improbable. Enabling the reading of concatenated streams by default
shouldn't do any harm, and it is even on our list of acceptable breaking changes
if and when we do a 2.0 release.
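A minimal sketch of opting in, assuming Commons Compress 1.5 is on the classpath: the two-argument constructor `BZip2CompressorInputStream(InputStream, boolean decompressConcatenated)` keeps reading past the end of the first stream when the flag is `true`. The class `ConcatDemo` and its helpers `bzip2` and `decompress` are hypothetical names for illustration. The demo writes two complete bzip2 streams back to back in memory and decompresses the result both ways.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream;
import org.apache.commons.compress.compressors.bzip2.BZip2CompressorOutputStream;

public class ConcatDemo {
    // Hypothetical helper: compresses one string into a complete, standalone bzip2 stream.
    static byte[] bzip2(String text) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (BZip2CompressorOutputStream bz = new BZip2CompressorOutputStream(buf)) {
            bz.write(text.getBytes(StandardCharsets.UTF_8));
        } // close() finishes the stream and writes the bzip2 trailer
        return buf.toByteArray();
    }

    // Hypothetical helper: decompresses everything the input stream yields into a string.
    static String decompress(byte[] data, boolean concatenated) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new BZip2CompressorInputStream(
                new ByteArrayInputStream(data), concatenated)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Two complete bzip2 streams concatenated into a single byte array.
        ByteArrayOutputStream joined = new ByteArrayOutputStream();
        joined.write(bzip2("first "));
        joined.write(bzip2("second"));
        byte[] data = joined.toByteArray();

        System.out.println(decompress(data, false)); // default: first stream only
        System.out.println(decompress(data, true));  // opt-in: all streams
    }
}
```

With the flag left at `false`, reading ends silently after the first stream, which matches the symptom described in the issue below.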
As for performance: the bzip2 code reads like C in many places, reusing big
pre-allocated arrays to avoid garbage collection pressure. Some of the loss can
be attributed to Java not having unsigned integral primitive types (leaving char
aside for now), so quite a lot of time is spent converting byte to int or char
and vice versa. I don't rule out that the current code could be sped up; in
fact, I expect it can be.
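To illustrate the conversion the comment refers to (a generic Java sketch, not code taken from the Commons Compress bzip2 implementation): Java's `byte` is signed, so reading a byte as an unsigned value 0..255 needs a widening conversion plus a mask, and storing it back needs a narrowing cast. The class `UnsignedBytes` and its methods are hypothetical names.

```java
public class UnsignedBytes {
    // Widening a byte to int sign-extends it: (byte) 0xF0 becomes -16, not 240.
    // Masking with 0xFF keeps only the low 8 bits, giving the unsigned value.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    // Going back is a narrowing cast that simply drops the high bits.
    static byte fromUnsigned(int value) {
        return (byte) value;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xF0;
        System.out.println((int) b);                // -16: sign-extended
        System.out.println(toUnsigned(b));          // 240: masked to 0..255
        System.out.println(fromUnsigned(240) == b); // true
    }
}
```

In a hot decoding loop, these extra masks and casts are applied to every byte processed.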
> Cannot uncompress very large bzip2 files
> ----------------------------------------
>
> Key: COMPRESS-224
> URL: https://issues.apache.org/jira/browse/COMPRESS-224
> Project: Commons Compress
> Issue Type: Bug
> Affects Versions: 1.5
> Environment: Java 1.7.0_03
> Reporter: Peter Karich
> Priority: Blocker
>
> When extracting big files like
> http://download.geofabrik.de/europe/germany/bayern-latest.osm.bz2
> Commons Compress works nicely. But when I try the same with, e.g.,
> http://ftp5.gwdg.de/pub/misc/openstreetmap/planet.openstreetmap.org/planet/planet-latest.osm.bz2
> it stops without an error after exactly 900000 bytes.
> I'm using the following code:
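> {code:title=App.java|borderStyle=solid}
> import java.io.FileInputStream;
> import java.io.FileOutputStream;
> import java.io.IOException;
> import java.io.InputStream;
> import java.io.OutputStream;
> import org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream;
>
> public class App {
>     public static void main(String[] args) throws IOException {
>         if (args.length == 0)
>             throw new IllegalArgumentException("You need to specify the bz2 file!");
>         String fromFile = args[0];
>         if (!fromFile.endsWith(".bz2"))
>             throw new IllegalArgumentException("You need to specify a bz2 file! But was: " + fromFile);
>         String toFile = pruneFileEnd(fromFile);
>         // try-with-resources closes all streams, even if a constructor throws
>         // (e.g. BZip2CompressorInputStream rejecting an invalid header).
>         try (InputStream in = new FileInputStream(fromFile);
>              OutputStream out = new FileOutputStream(toFile);
>              // Single-argument constructor: only the first bzip2 stream is decompressed.
>              BZip2CompressorInputStream bzIn = new BZip2CompressorInputStream(in)) {
>             final byte[] buffer = new byte[1024 * 8];
>             int n;
>             while (-1 != (n = bzIn.read(buffer))) {
>                 out.write(buffer, 0, n);
>             }
>         }
>     }
>
>     // Strips the last extension, e.g. "planet-latest.osm.bz2" -> "planet-latest.osm".
>     public static String pruneFileEnd(String file) {
>         int index = file.lastIndexOf(".");
>         if (index < 0)
>             return file;
>         return file.substring(0, index);
>     }
> }
> {code}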