[ https://issues.apache.org/jira/browse/COMMONSSITE-169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cosmin Carabet updated COMMONSSITE-169:
---------------------------------------
Description:
Something in
[https://github.com/apache/commons-compress/compare/rel/commons-compress-1.25.0...master]
seems to make iterating through the tar entries of multiple
TarArchiveInputStreams in parallel throw "Corrupted TAR archive":
{code:java}
@Test
void bla() {
    ExecutorService executorService = Executors.newFixedThreadPool(10);
    // Each of the 200 tasks opens its own streams; nothing is shared between tasks.
    List<CompletableFuture<Void>> tasks = IntStream.range(0, 200)
            .mapToObj(_idx -> CompletableFuture.runAsync(
                    () -> {
                        try (InputStream inputStream = this.getClass()
                                        .getResourceAsStream("/<your favourite tar tgz>");
                                TarArchiveInputStream tarInputStream =
                                        new TarArchiveInputStream(new GZIPInputStream(inputStream))) {
                            TarArchiveEntry tarEntry;
                            while ((tarEntry = tarInputStream.getNextTarEntry()) != null) {
                                System.out.println("Reading entry %s with size %d"
                                        .formatted(tarEntry.getName(), tarEntry.getSize()));
                            }
                        } catch (Exception ex) {
                            throw new SafeRuntimeException(ex);
                        }
                    },
                    executorService))
            .toList();
    // Wait for all tasks; a failure in any of them surfaces here.
    Futures.getUnchecked(CompletableFuture.allOf(tasks.toArray(new CompletableFuture<?>[0])));
}
{code}
Although TarArchiveInputStream is marked as not thread safe, I am not sharing
instances between threads here. Each task creates its own stream objects,
presumably each with its own position tracking info.
The stack trace looks like:
{code:java}
Caused by: java.io.IOException: Corrupted TAR archive.
    at org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeader(TarArchiveEntry.java:1480)
    at org.apache.commons.compress.archivers.tar.TarArchiveEntry.<init>(TarArchiveEntry.java:534)
    at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:431)
    at
Caused by: java.lang.IllegalArgumentException: Invalid byte 100 at offset 0 in 'dddddddddddd' len=12
    at org.apache.commons.compress.archivers.tar.TarUtils.parseOctal(TarUtils.java:516)
    at org.apache.commons.compress.archivers.tar.TarUtils.parseOctalOrBinary(TarUtils.java:540)
    at org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeaderUnwrapped(TarArchiveEntry.java:1496)
    at org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeader(TarArchiveEntry.java:1478)
    ... 7 more
{code}
The stack trace shows that occasionally the header is wrong (the tar entry name
contains gibberish bits), which makes me think that `getNextTarEntry()` can be
faulty.
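To illustrate why a header field full of 'd' bytes is rejected, here is a minimal sketch of an octal field parser. It is a simplified stand-in written for this report, not the actual TarUtils.parseOctal implementation; the class and method names (OctalFieldSketch, parseOctalField) are made up. Tar headers store numeric fields such as the entry size as ASCII octal digits, so the byte 100 ('d') can never be valid there.

```java
import java.nio.charset.StandardCharsets;

public class OctalFieldSketch {

    // Hypothetical, simplified parser for an ASCII octal header field.
    // It skips leading spaces, ignores trailing NUL/space padding and
    // rejects any byte outside '0'..'7'.
    static long parseOctalField(byte[] buffer, int offset, int length) {
        int start = offset;
        int end = offset + length;
        while (start < end && buffer[start] == ' ') {
            start++;
        }
        while (end > start && (buffer[end - 1] == 0 || buffer[end - 1] == ' ')) {
            end--;
        }
        long result = 0;
        for (int i = start; i < end; i++) {
            byte current = buffer[i];
            if (current < '0' || current > '7') {
                throw new IllegalArgumentException(
                        "Invalid byte " + current + " at offset " + (i - offset));
            }
            result = (result << 3) + (current - '0');
        }
        return result;
    }

    public static void main(String[] args) {
        // A well-formed 12-byte size field: octal 1750 followed by a NUL terminator.
        byte[] valid = "00000001750\0".getBytes(StandardCharsets.US_ASCII);
        System.out.println(parseOctalField(valid, 0, valid.length)); // prints 1000

        // The bytes from the stack trace: 'd' is ASCII 100, not an octal digit.
        byte[] garbage = "dddddddddddd".getBytes(StandardCharsets.US_ASCII);
        try {
            parseOctalField(garbage, 0, garbage.length);
        } catch (IllegalArgumentException ex) {
            System.out.println(ex.getMessage()); // prints Invalid byte 100 at offset 0
        }
    }
}
```

The point of the sketch is only that the parser is behaving correctly for the bytes it was given; the open question is how those bytes ended up in the header buffer in the first place.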
Running that code with Commons Compress 1.25.0 works as expected, so the
problem was probably introduced since November. Note that it is related to
parallelism: an executor service with a single thread doesn't suffer from the
same error. The tgz to decompress doesn't really matter; a manually created
one of a few KB is enough.
> Commons compress 1.26.0 gives unexpected Corrupted TAR archive
> --------------------------------------------------------------
>
> Key: COMMONSSITE-169
> URL: https://issues.apache.org/jira/browse/COMMONSSITE-169
> Project: Apache Commons All
> Issue Type: Bug
> Environment: Commons compress 1.26.0 to get a failure. Any tar tgz.
> Reporter: Cosmin Carabet
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)