[ https://issues.apache.org/jira/browse/COMMONSSITE-169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cosmin Carabet updated COMMONSSITE-169:
---------------------------------------
    Description: 
Something in
[https://github.com/apache/commons-compress/compare/rel/commons-compress-1.25.0...master]
seems to make iterating through the entries of multiple TarArchiveInputStreams
in parallel throw "Corrupted TAR archive":
 
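{code:java}
import java.io.InputStream;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.IntStream;
import java.util.zip.GZIPInputStream;

import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.junit.jupiter.api.Test;

class TarArchiveInputStreamConcurrencyTest {

    @Test
    void readsSeparateStreamsConcurrently() {
        ExecutorService executorService = Executors.newFixedThreadPool(10);
        try {
            // 200 tasks, each with its own InputStream and TarArchiveInputStream (nothing is shared)
            List<CompletableFuture<Void>> tasks = IntStream.range(0, 200)
                    .mapToObj(idx -> CompletableFuture.runAsync(() -> {
                        try (InputStream inputStream =
                                     getClass().getResourceAsStream("/<your favourite tar tgz>");
                             TarArchiveInputStream tarInputStream =
                                     new TarArchiveInputStream(new GZIPInputStream(inputStream))) {
                            TarArchiveEntry tarEntry;
                            while ((tarEntry = tarInputStream.getNextTarEntry()) != null) {
                                System.out.println("Reading entry %s with size %d"
                                        .formatted(tarEntry.getName(), tarEntry.getSize()));
                            }
                        } catch (Exception ex) {
                            throw new RuntimeException(ex);
                        }
                    }, executorService))
                    .toList();
            // Wait for all tasks; join() throws a CompletionException if any task failed
            CompletableFuture.allOf(tasks.toArray(new CompletableFuture<?>[0])).join();
        } finally {
            executorService.shutdown();
        }
    }
}
{code}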
Although TarArchiveInputStream is documented as not thread-safe, I am not sharing
any instance between threads here: each task creates its own InputStream and
TarArchiveInputStream, which presumably each have their own position-tracking state.
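Separate instances can still interfere if they share mutable state, for example a `static` buffer. Whether Commons Compress 1.26.0 does anything like this is not established here; the following is only a hypothetical illustration of that failure mode, with invented names (`HeaderReader`, `SHARED_RECORD`). It runs on a single thread with a forced interleaving so the outcome is deterministic:

{code:java}
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical reader (not Commons Compress code): each instance owns its
// stream, but all instances share one static record buffer.
class HeaderReader {
    private static final byte[] SHARED_RECORD = new byte[8];

    private final InputStream in;

    HeaderReader(InputStream in) {
        this.in = in;
    }

    // Fills the shared buffer with the next record and returns it (not a copy),
    // so the caller can observe later writes made by other instances.
    byte[] readRecord() {
        try {
            int n = in.readNBytes(SHARED_RECORD, 0, SHARED_RECORD.length);
            if (n != SHARED_RECORD.length) {
                throw new IllegalStateException("Truncated record");
            }
            return SHARED_RECORD;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HeaderReader a = new HeaderReader(
                new ByteArrayInputStream("00000644".getBytes(StandardCharsets.US_ASCII)));
        HeaderReader b = new HeaderReader(
                new ByteArrayInputStream("dddddddd".getBytes(StandardCharsets.US_ASCII)));

        byte[] headerA = a.readRecord();
        // With threads, B's read can land between A's read and A's parse;
        // here that interleaving is forced so the result is deterministic.
        b.readRecord();
        System.out.println(new String(headerA, StandardCharsets.US_ASCII)); // prints dddddddd
    }
}
{code}

A and B are distinct objects with distinct streams, yet A's "header" ends up holding B's bytes, which is the kind of symptom a single-threaded run would never show.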
 
The stack trace looks like this:
{code:java}
Caused by: java.io.IOException: Corrupted TAR archive.
    at org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeader(TarArchiveEntry.java:1480)
    at org.apache.commons.compress.archivers.tar.TarArchiveEntry.<init>(TarArchiveEntry.java:534)
    at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:431)
    at
Caused by: java.lang.IllegalArgumentException: Invalid byte 100 at offset 0 in 'dddddddddddd' len=12
    at org.apache.commons.compress.archivers.tar.TarUtils.parseOctal(TarUtils.java:516)
    at org.apache.commons.compress.archivers.tar.TarUtils.parseOctalOrBinary(TarUtils.java:540)
    at org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeaderUnwrapped(TarArchiveEntry.java:1496)
    at org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeader(TarArchiveEntry.java:1478)
    ... 7 more
{code}
This shows that a header is occasionally parsed from the wrong bytes (the tar
entry name contains gibberish, and a 12-byte numeric field holds 'dddddddddddd'
instead of octal digits), which makes me think that `getNextTarEntry()` can be
faulty.
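For context on the nested `IllegalArgumentException`: numeric tar header fields, such as the 12-byte size and mtime fields, hold ASCII octal digits, so byte 100 (the character `d`) can never be valid there. A minimal sketch of that kind of validation, assuming NUL- or space-terminated fields; `parseOctalField` is a hypothetical helper that only imitates the message format in the trace and is not the actual `TarUtils.parseOctal` code:

{code:java}
import java.nio.charset.StandardCharsets;

final class OctalField {
    private OctalField() {}

    // Parses an ASCII octal number from buffer[offset, offset + length).
    // Leading spaces are skipped; a NUL or a space ends the number.
    static long parseOctalField(byte[] buffer, int offset, int length) {
        long result = 0;
        int end = offset + length;
        int i = offset;
        while (i < end && buffer[i] == ' ') {
            i++; // skip leading padding
        }
        for (; i < end; i++) {
            byte b = buffer[i];
            if (b == 0 || b == ' ') {
                break; // field terminator
            }
            if (b < '0' || b > '7') {
                throw new IllegalArgumentException("Invalid byte " + b + " at offset " + (i - offset)
                        + " in '" + new String(buffer, offset, length, StandardCharsets.US_ASCII)
                        + "' len=" + length);
            }
            result = (result << 3) + (b - '0'); // shift in one octal digit
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] good = "00000001750 ".getBytes(StandardCharsets.US_ASCII); // 12 bytes
        System.out.println(parseOctalField(good, 0, good.length)); // prints 1000 (octal 1750)
        byte[] bad = "dddddddddddd".getBytes(StandardCharsets.US_ASCII);
        try {
            parseOctalField(bad, 0, bad.length);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints Invalid byte 100 at offset 0 in 'dddddddddddd' len=12
        }
    }
}
{code}

In other words, the parser is rejecting bytes that do not belong in a header at all, which fits the suspicion that the header bytes came from somewhere else.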
 
Running the same code with Commons Compress 1.25.0 works as expected, so the
problem was probably introduced after that release in November. It is related
to parallelism: with a single-threaded executor service the error does not
occur. The choice of .tgz file does not matter; a manually created one of a few
KB is enough.
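If no suitable archive is at hand, a small fixture can be generated. A minimal JDK-only sketch (the `TgzFixture` class is hypothetical; `tar czf` or Commons Compress's own `TarArchiveOutputStream` would do equally well) that writes a POSIX ustar archive of regular files with ASCII names of at most 100 bytes, then gzips it:

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.GZIPOutputStream;

final class TgzFixture {
    private static final int RECORD = 512;

    private TgzFixture() {}

    // Builds a gzip-compressed ustar archive from name -> content (regular files only).
    static byte[] build(Map<String, byte[]> files) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (OutputStream out = new GZIPOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                out.write(header(file.getKey(), file.getValue().length));
                out.write(file.getValue());
                out.write(new byte[padding(file.getValue().length)]); // pad data to a 512-byte boundary
            }
            out.write(new byte[2 * RECORD]); // end of archive: two zero-filled records
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // stream is closed (gzip trailer written) at this point
    }

    static int padding(int size) {
        return (RECORD - size % RECORD) % RECORD;
    }

    // Builds one 512-byte ustar header for a regular file.
    static byte[] header(String name, long size) {
        byte[] nameBytes = name.getBytes(StandardCharsets.US_ASCII);
        if (nameBytes.length > 100) {
            throw new IllegalArgumentException("Name longer than 100 bytes: " + name);
        }
        byte[] h = new byte[RECORD];
        System.arraycopy(nameBytes, 0, h, 0, nameBytes.length); // name (100)
        put(h, 100, "0000644\0");                              // mode (8)
        put(h, 108, "0000000\0");                              // uid (8)
        put(h, 116, "0000000\0");                              // gid (8)
        put(h, 124, String.format("%011o\0", size));            // size (12)
        put(h, 136, String.format("%011o\0", 0L));              // mtime (12)
        put(h, 148, "        ");                               // checksum placeholder: 8 spaces
        h[156] = '0';                                           // typeflag: regular file
        put(h, 257, "ustar\0");                                // magic (6)
        put(h, 263, "00");                                      // version (2)
        long sum = 0;
        for (byte b : h) {
            sum += b & 0xff; // checksum = unsigned sum of all header bytes
        }
        put(h, 148, String.format("%06o\0 ", sum));             // checksum: 6 octal digits, NUL, space
        return h;
    }

    private static void put(byte[] h, int offset, String value) {
        byte[] v = value.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(v, 0, h, offset, v.length);
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("hello.txt", "hello world\n".getBytes(StandardCharsets.US_ASCII));
        files.put("data.bin", new byte[4096]);
        Files.write(Path.of(args.length > 0 ? args[0] : "fixture.tgz"), build(files));
    }
}
{code}

Running `main` writes `fixture.tgz` (6656 bytes of tar data before compression), which can be put on the test classpath in place of `/<your favourite tar tgz>`.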

  was:
Something in 
[https://github.com/apache/commons-compress/compare/rel/commons-compress-1.25.0...master]
 seems to make iterating through the tar entries of multiple 
TarArchiveInputStreams throw Corrupted TAR archive:
 
{code:java}
@Test
void bla() {
    ExecutorService executorService = Executors.newFixedThreadPool(10);
    List<CompletableFuture<Void>> verificationTasks = IntStream.range(0, 200)
            .mapToObj(_idx -> CompletableFuture.runAsync(
                    () -> {
                        try (InputStream inputStream = this.getClass()
                                        .getResourceAsStream(
                                                "/<your favourite tar tgz>");
                                TarArchiveInputStream tarInputStream =
                                        new TarArchiveInputStream(new GZIPInputStream(inputStream))) {
                            TarArchiveEntry tarEntry;
                            while ((tarEntry = tarInputStream.getNextTarEntry()) != null) {
                                System.out.println("Reading entry %s with size %d"
                                        .formatted(tarEntry.getName(), tarEntry.getSize()));
                            }
                        } catch (Exception ex) {
                            throw new SafeRuntimeException(ex);
                        }
                    },
                    executorService))
            .toList();
    Futures.getUnchecked(CompletableFuture.allOf(verificationTasks.toArray(new CompletableFuture<?>[0])));
}
{code}
Although TarArchiveInputStream is marked as not thread safe, I am not reusing 
objects here. Those are in fact separate objects, presumably all with their own 
position tracking info.
 
The stacktrace here looks like:
{code:java}
Caused by: java.io.IOException: Corrupted TAR archive.
    at org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeader(TarArchiveEntry.java:1480)
    at org.apache.commons.compress.archivers.tar.TarArchiveEntry.<init>(TarArchiveEntry.java:534)
    at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:431)
    at
Caused by: java.lang.IllegalArgumentException: Invalid byte 100 at offset 0 in 'dddddddddddd' len=12
    at org.apache.commons.compress.archivers.tar.TarUtils.parseOctal(TarUtils.java:516)
    at org.apache.commons.compress.archivers.tar.TarUtils.parseOctalOrBinary(TarUtils.java:540)
    at org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeaderUnwrapped(TarArchiveEntry.java:1496)
    at org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeader(TarArchiveEntry.java:1478)
    ... 7 more
{code}
That code shows that occasionally the header is wrong (the tar entry name 
contains gibberish bits) which makes me think that `getNextTarEntry()` can be 
faulty.
 
Running that code with commons compress 1.25.0 works as expected. So it's 
probably something added since November. Note that this is something related to 
parallelism - using an executor service with a single thread doesn't suffer 
from the same error. The tgz to decompress doesn't really matter - you can use 
a manually created one worth a few KBs.


> Commons compress 1.26.0 gives unexpected Corrupted TAR archive
> --------------------------------------------------------------
>
>                 Key: COMMONSSITE-169
>                 URL: https://issues.apache.org/jira/browse/COMMONSSITE-169
>             Project: Apache Commons All
>          Issue Type: Bug
>         Environment: Commons compress 1.26.0 to get a failure. Any tar tgz.
>            Reporter: Cosmin Carabet
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
