Cosmin Carabet created COMMONSSITE-169:
------------------------------------------
Summary: Commons Compress 1.26.0 gives unexpected Corrupted TAR archive
Key: COMMONSSITE-169
URL: https://issues.apache.org/jira/browse/COMMONSSITE-169
Project: Apache Commons All
Issue Type: Bug
Environment: Commons Compress 1.26.0 is needed to reproduce the failure. Any tar or tgz archive will do.
Reporter: Cosmin Carabet
Something in
[https://github.com/apache/commons-compress/compare/rel/commons-compress-1.25.0...master]
seems to make iterating through the tar entries of multiple
TarArchiveInputStreams throw Corrupted TAR archive:
{code:java}
@Test
void bla() {
    ExecutorService executorService = Executors.newFixedThreadPool(10);
    List<CompletableFuture<Void>> verificationTasks = IntStream.range(0, 200)
            .mapToObj(_idx -> CompletableFuture.runAsync(
                    () -> {
                        try (InputStream inputStream = this.getClass()
                                        .getResourceAsStream("/<your favourite tar tgz>");
                                TarArchiveInputStream tarInputStream =
                                        new TarArchiveInputStream(new GZIPInputStream(inputStream))) {
                            TarArchiveEntry tarEntry;
                            while ((tarEntry = tarInputStream.getNextTarEntry()) != null) {
                                System.out.println("Reading entry %s with size %d"
                                        .formatted(tarEntry.getName(), tarEntry.getSize()));
                            }
                        } catch (Exception ex) {
                            throw new SafeRuntimeException(ex);
                        }
                    },
                    executorService))
            .toList();
    Futures.getUnchecked(CompletableFuture.allOf(
            verificationTasks.toArray(new CompletableFuture<?>[0])));
}
{code}
Although TarArchiveInputStream is marked as not thread safe, I am not reusing
objects here. Those are in fact separate objects, presumably all with their own
position tracking info.
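As a control for the "separate objects" assumption, the same concurrency
pattern can be run with JDK streams only. The sketch below is an illustration,
not part of the failing test: the class name IndependentStreamsControl and its
helper methods are hypothetical, and the payload is an in-memory gzip of zero
bytes rather than a real tgz. Each task builds its own GZIPInputStream over a
shared, read-only byte array and computes a CRC32 of the decompressed data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.zip.CRC32;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical control experiment: JDK classes only, no Commons Compress.
final class IndependentStreamsControl {

    // Compresses a payload once; the resulting array is only read afterwards.
    static byte[] gzip(byte[] payload) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Each call builds its own stream chain, so no stream object is shared.
    static long crcOfDecompressed(byte[] gzipped) {
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[512];
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                crc.update(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return crc.getValue();
    }

    // Runs `tasks` independent reads on `threads` workers and returns the
    // distinct checksums that were observed.
    static List<Long> distinctChecksums(byte[] gzipped, int threads, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<Long>> futures = IntStream.range(0, tasks)
                    .mapToObj(i -> CompletableFuture.supplyAsync(
                            () -> crcOfDecompressed(gzipped), pool))
                    .collect(Collectors.toList());
            return futures.stream()
                    .map(CompletableFuture::join)
                    .distinct()
                    .collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        byte[] gzipped = gzip(new byte[8 * 1024]);
        // A size of 1 means every task decompressed identical bytes.
        System.out.println(distinctChecksums(gzipped, 10, 200).size());
    }
}
```

If this control consistently reports a single checksum while the tar test
fails, that would point towards state shared inside the tar-reading path
rather than towards the gzip layer or the executor setup.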
The stack trace looks like:
{code:java}
Caused by: java.io.IOException: Corrupted TAR archive.
    at org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeader(TarArchiveEntry.java:1480)
    at org.apache.commons.compress.archivers.tar.TarArchiveEntry.<init>(TarArchiveEntry.java:534)
    at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:431)
    at
Caused by: java.lang.IllegalArgumentException: Invalid byte 100 at offset 0 in 'dddddddddddd' len=12
    at org.apache.commons.compress.archivers.tar.TarUtils.parseOctal(TarUtils.java:516)
    at org.apache.commons.compress.archivers.tar.TarUtils.parseOctalOrBinary(TarUtils.java:540)
    at org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeaderUnwrapped(TarArchiveEntry.java:1496)
    at org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeader(TarArchiveEntry.java:1478)
    ... 7 more
{code}
The stack trace shows that the parsed header is occasionally wrong (the tar
entry name contains gibberish bits), which makes me think that
`getNextTarEntry()` can be faulty.
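To make the inner exception easier to read: byte 100 is ASCII 'd', and a
12-byte field matches the width of the ustar numeric fields such as size or
modification time, which are stored as ASCII octal digits. The report does not
say which field was being parsed. The sketch below is a simplified stand-in for
an octal field parser, not the Commons Compress implementation; the class and
method names are hypothetical and the error message only imitates the one in
the stack trace.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified octal field parser for illustration only.
final class OctalFieldSketch {

    static long parseOctalField(byte[] field) {
        int end = field.length;
        // Trailing NUL or space bytes terminate the field.
        while (end > 0 && (field[end - 1] == 0 || field[end - 1] == ' ')) {
            end--;
        }
        int start = 0;
        // Leading spaces are treated as padding.
        while (start < end && field[start] == ' ') {
            start++;
        }
        long value = 0;
        for (int i = start; i < end; i++) {
            byte b = field[i];
            if (b < '0' || b > '7') {
                throw new IllegalArgumentException("Invalid byte " + b + " at offset " + i
                        + " in '" + new String(field, StandardCharsets.US_ASCII)
                        + "' len=" + field.length);
            }
            value = (value << 3) + (b - '0');
        }
        return value;
    }

    public static void main(String[] args) {
        // A well-formed 12-byte field: eleven octal digits and a NUL terminator.
        byte[] good = "00000001750\0".getBytes(StandardCharsets.US_ASCII);
        System.out.println(parseOctalField(good)); // prints 1000

        // Twelve 'd' bytes, as in the report: the first byte is already not an octal digit.
        byte[] bad = "dddddddddddd".getBytes(StandardCharsets.US_ASCII);
        try {
            parseOctalField(bad);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A field full of 'd' bytes is more consistent with file content (or another
entry's data) being read at a header position than with a damaged archive on
disk, since the same archive parses correctly with a single thread.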
Running that code with Commons Compress 1.25.0 works as expected, so the cause
is probably something added since November. Note that the problem is related to
parallelism: using an executor service with a single thread doesn't produce
the error. The tgz to decompress doesn't really matter; a manually created one
of a few KB is enough.
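Since a small, manually created tgz is enough, one way to produce such a file
without external tools is sketched below. It is a minimal illustration using
JDK classes only and assumes a single regular file with an ASCII name shorter
than 100 bytes; the class name TinyTgzSketch and its helpers are hypothetical.
The field offsets follow the POSIX ustar header layout. The output has not
been checked against Commons Compress here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper that builds a one-entry .tgz in memory.
final class TinyTgzSketch {
    private static final int BLOCK = 512;

    // Copies an ASCII string into the header at the given offset.
    private static void put(byte[] header, int offset, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(bytes, 0, header, offset, bytes.length);
    }

    // Zero-padded octal digits plus a NUL terminator, `width` bytes in total.
    private static String octal(long value, int width) {
        return String.format("%0" + (width - 1) + "o", value) + "\0";
    }

    // Builds one 512-byte ustar header for a regular file.
    static byte[] header(String name, long size) {
        byte[] h = new byte[BLOCK];
        put(h, 0, name);               // name (100 bytes)
        put(h, 100, octal(0644, 8));   // mode
        put(h, 108, octal(0, 8));      // uid
        put(h, 116, octal(0, 8));      // gid
        put(h, 124, octal(size, 12));  // size
        put(h, 136, octal(0, 12));     // mtime
        put(h, 148, "        ");       // checksum field counts as 8 spaces
        h[156] = '0';                  // typeflag: regular file
        put(h, 257, "ustar\0");        // magic
        put(h, 263, "00");             // version
        int sum = 0;
        for (byte b : h) {
            sum += b & 0xFF;
        }
        // Checksum is stored as six octal digits, a NUL and a space.
        put(h, 148, String.format("%06o", sum) + "\0 ");
        return h;
    }

    // Header, content padded to a full block, then two zero blocks as end marker.
    static byte[] tgz(String name, byte[] content) {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(raw)) {
            gz.write(header(name, content.length));
            gz.write(content);
            gz.write(new byte[(BLOCK - content.length % BLOCK) % BLOCK]);
            gz.write(new byte[2 * BLOCK]);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return raw.toByteArray();
    }

    public static void main(String[] args) {
        byte[] archive = tgz("hello.txt", "hello".getBytes(StandardCharsets.US_ASCII));
        System.out.println("tgz bytes: " + archive.length);
    }
}
```

Calling tgz(...) repeatedly with different names and concatenating entries
before the end marker would be needed for a multi-entry archive; this sketch
deliberately stops at one entry.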