Cosmin Carabet created COMMONSSITE-169:
------------------------------------------

             Summary: Commons compress 1.26.0 gives unexpected Corrupted TAR archive
                 Key: COMMONSSITE-169
                 URL: https://issues.apache.org/jira/browse/COMMONSSITE-169
             Project: Apache Commons All
          Issue Type: Bug
         Environment: Commons compress 1.26.0 to get a failure. Any tar tgz.
            Reporter: Cosmin Carabet


Something in the changes between 1.25.0 and master 
([https://github.com/apache/commons-compress/compare/rel/commons-compress-1.25.0...master])
seems to make iterating concurrently through the tar entries of multiple 
TarArchiveInputStreams throw "Corrupted TAR archive":
 
{code:java}
@Test
void bla() {
    ExecutorService executorService = Executors.newFixedThreadPool(10);
    List<CompletableFuture<Void>> verificationTasks = IntStream.range(0, 200)
            .mapToObj(_idx -> CompletableFuture.runAsync(
                    () -> {
                        // Each task opens its own streams; no stream object is shared between tasks.
                        try (InputStream inputStream = this.getClass()
                                        .getResourceAsStream("/<your favourite tar tgz>");
                                TarArchiveInputStream tarInputStream =
                                        new TarArchiveInputStream(new GZIPInputStream(inputStream))) {
                            TarArchiveEntry tarEntry;
                            while ((tarEntry = tarInputStream.getNextTarEntry()) != null) {
                                System.out.println("Reading entry %s with size %d"
                                        .formatted(tarEntry.getName(), tarEntry.getSize()));
                            }
                        } catch (Exception ex) {
                            throw new SafeRuntimeException(ex);
                        }
                    },
                    executorService))
            .toList();
    Futures.getUnchecked(CompletableFuture.allOf(verificationTasks.toArray(new CompletableFuture<?>[0])));
}
{code}
Although TarArchiveInputStream is marked as not thread safe, I am not reusing 
objects here. Those are in fact separate objects, presumably all with their own 
position tracking info.
 
The stacktrace here looks like:
{code:java}
Caused by: java.io.IOException: Corrupted TAR archive.
    at org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeader(TarArchiveEntry.java:1480)
    at org.apache.commons.compress.archivers.tar.TarArchiveEntry.<init>(TarArchiveEntry.java:534)
    at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:431)
    at
Caused by: java.lang.IllegalArgumentException: Invalid byte 100 at offset 0 in 'dddddddddddd' len=12
    at org.apache.commons.compress.archivers.tar.TarUtils.parseOctal(TarUtils.java:516)
    at org.apache.commons.compress.archivers.tar.TarUtils.parseOctalOrBinary(TarUtils.java:540)
    at org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeaderUnwrapped(TarArchiveEntry.java:1496)
    at org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeader(TarArchiveEntry.java:1478)
    ... 7 more
 {code}
This stack trace shows that the header is occasionally read incorrectly (the 
tar entry name contains gibberish bits), which makes me think that 
`getNextTarEntry()` can be faulty.
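
For context on the inner exception: numeric fields in a tar header are stored as ASCII octal digits, so byte 100 (the letter 'd') is not a legal digit, which matches the 'dddddddddddd' shown in the trace. The class below is only a minimal sketch of that kind of validation. The names `OctalFieldSketch` and `parseOctalField` are hypothetical and this is not the actual `TarUtils.parseOctal` code.

```java
import java.nio.charset.StandardCharsets;

/** Simplified illustration of octal field validation; NOT the Commons Compress implementation. */
final class OctalFieldSketch {

    /**
     * Parses an octal header field of the given length, ignoring leading spaces
     * and trailing NUL/space padding, and rejecting any byte outside '0'..'7'.
     */
    static long parseOctalField(byte[] buffer, int offset, int length) {
        int start = offset;
        int end = offset + length;
        // Skip leading spaces.
        while (start < end && buffer[start] == ' ') {
            start++;
        }
        // Trim trailing NUL or space padding.
        while (end > start && (buffer[end - 1] == 0 || buffer[end - 1] == ' ')) {
            end--;
        }
        long result = 0;
        for (int i = start; i < end; i++) {
            byte current = buffer[i];
            if (current < '0' || current > '7') {
                throw new IllegalArgumentException(
                        "Invalid byte " + current + " at offset " + (i - offset));
            }
            result = (result << 3) + (current - '0');
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] valid = "00000001750\0".getBytes(StandardCharsets.US_ASCII);
        System.out.println(parseOctalField(valid, 0, valid.length));
        byte[] garbage = "dddddddddddd".getBytes(StandardCharsets.US_ASCII);
        try {
            parseOctalField(garbage, 0, garbage.length);
        } catch (IllegalArgumentException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

A 12-byte field full of 'd' therefore suggests the header bytes handed to the parser were not the bytes of a real header block, rather than a malformed archive on disk.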
 
Running that code with Commons Compress 1.25.0 works as expected, so it's 
probably something added since November. Note that this is related to 
parallelism: using an executor service with a single thread doesn't suffer 
from the same error. The tgz to decompress doesn't really matter; you can use 
a manually created one of a few KB.
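
If no archive is at hand, a tiny tgz can be built in memory with the JDK alone. The following is a rough sketch under stated assumptions: `TinyTgzFixture` and `buildTgz` are hypothetical names, the header is a hand-written single-entry ustar header, and it has not been checked against Commons Compress here. In the test above, the resulting bytes could be wrapped in a `ByteArrayInputStream` in place of `getResourceAsStream`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper that writes a one-entry, gzip-compressed ustar archive. */
final class TinyTgzFixture {
    private static final int BLOCK = 512;

    static byte[] buildTgz(String entryName, byte[] content) throws IOException {
        if (entryName.length() > 100) {
            throw new IllegalArgumentException("entry name must fit the 100-byte name field");
        }
        byte[] header = new byte[BLOCK];
        putAscii(header, 0, entryName);              // name
        putOctal(header, 100, 8, 0644);              // mode
        putOctal(header, 108, 8, 0);                 // uid
        putOctal(header, 116, 8, 0);                 // gid
        putOctal(header, 124, 12, content.length);   // size
        putOctal(header, 136, 12, 0);                // mtime
        Arrays.fill(header, 148, 156, (byte) ' ');   // checksum field counts as spaces
        header[156] = '0';                           // typeflag: regular file
        putAscii(header, 257, "ustar");              // magic, followed by the NUL already in place
        putAscii(header, 263, "00");                 // version
        int checksum = 0;
        for (byte b : header) {
            checksum += b & 0xFF;
        }
        putAscii(header, 148, String.format("%06o", checksum));
        header[154] = 0;
        header[155] = ' ';

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(header);
            gzip.write(content);
            // Pad the content to a whole block, then add the two empty end-of-archive blocks.
            gzip.write(new byte[(BLOCK - content.length % BLOCK) % BLOCK]);
            gzip.write(new byte[2 * BLOCK]);
        }
        return out.toByteArray();
    }

    private static void putAscii(byte[] target, int offset, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(bytes, 0, target, offset, bytes.length);
    }

    /** Writes (width - 1) zero-padded octal digits; the final byte stays NUL. */
    private static void putOctal(byte[] target, int offset, int width, long value) {
        putAscii(target, offset, String.format("%0" + (width - 1) + "o", value));
    }
}
```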



--
This message was sent by Atlassian Jira
(v8.20.10#820010)