[ 
https://issues.apache.org/jira/browse/COMPRESS-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17822644#comment-17822644
 ] 

Cosmin Carabet edited comment on COMPRESS-666 at 3/1/24 5:39 PM:
-----------------------------------------------------------------

I've done a binary search on the master branch for the first commit where my 
test fails. Looks like it's this one 
[https://github.com/apache/commons-compress/commit/856a540d3eeb463527c509853f4d205f4adc272f]
 . If my test is run on the commit just before it, it passes. From that, it 
starts failing.

 

I've just played a bit more on that commit linked above where things starting 
failing. If I revert changes to the following 2 files, my test passes:
{code:java}
git checkout 0c9f16e43 -- 
src/main/java/org/apache/commons/compress/archivers/tar/TarArchiveInputStream.java
git checkout 0c9f16e43 -- 
src/main/java/org/apache/commons/compress/utils/IOUtils.java {code}


was (Author: JIRAUSER304399):
I've done a binary search on the master branch for the first commit where my 
test fails. Looks like it's this one 
[https://github.com/apache/commons-compress/commit/856a540d3eeb463527c509853f4d205f4adc272f]
 . If my test is run on the commit just before it, it passes. From that, it 
starts failing.

> Multithreaded access to Tar archive throws java.util.zip.ZipException: 
> Corrupt GZIP trailer
> -------------------------------------------------------------------------------------------
>
>                 Key: COMPRESS-666
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-666
>             Project: Commons Compress
>          Issue Type: Bug
>    Affects Versions: 1.26.0
>         Environment: Commons compress 1.26.0 to get a failure. Any tar tgz.
>            Reporter: Cosmin Carabet
>            Priority: Major
>
> Something in 
> [https://github.com/apache/commons-compress/compare/rel/commons-compress-1.25.0...master]
>  seems to make iterating through the tar entries of multiple 
> TarArchiveInputStreams throw Corrupted TAR archive:
>  
> {code:java}
> @Test
> void bla() {
>     ExecutorService executorService = Executors.newFixedThreadPool(10);
>     List<CompletableFuture<Void>> tasks = IntStream.range(0, 200)
>             .mapToObj(_idx -> CompletableFuture.runAsync(
>                     () -> {
>                         try (InputStream inputStream = this.getClass()
>                                         .getResourceAsStream(
>                                                 "/<your favourite tar tgz>");
>                                 TarArchiveInputStream tarInputStream =
>                                         new TarArchiveInputStream(new 
> GZIPInputStream(inputStream))) {
>                             TarArchiveEntry tarEntry;
>                             while ((tarEntry = 
> tarInputStream.getNextTarEntry()) != null) {
>                                 System.out.println("Reading entry %s with 
> size %d"
>                                         .formatted(tarEntry.getName(), 
> tarEntry.getSize()));
>                             }
>                         } catch (Exception ex) {
>                             throw new RuntimeException(ex);
>                         }
>                     },
>                     executorService))
>             .toList();
>     Futures.getUnchecked(CompletableFuture.allOf(tasks.toArray(new 
> CompletableFuture<?>[0])));
> } {code}
> Although TarArchiveInputStream is marked as not thread safe, I am not reusing 
> objects here. Those are in fact separate objects, presumably all with their 
> own position tracking info.
>  
> The stacktrace here looks like:
> {code:java}
> Caused by: java.io.IOException: Corrupted TAR archive.
>     at 
> org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeader(TarArchiveEntry.java:1480)
>     at 
> org.apache.commons.compress.archivers.tar.TarArchiveEntry.<init>(TarArchiveEntry.java:534)
>     at 
> org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:431)
>     at
> Caused by: java.lang.IllegalArgumentException: Invalid byte 100 at offset 0 
> in 'dddddddddddd' len=12
>     at 
> org.apache.commons.compress.archivers.tar.TarUtils.parseOctal(TarUtils.java:516)
>     at 
> org.apache.commons.compress.archivers.tar.TarUtils.parseOctalOrBinary(TarUtils.java:540)
>     at 
> org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeaderUnwrapped(TarArchiveEntry.java:1496)
>     at 
> org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeader(TarArchiveEntry.java:1478)
>     ... 7 more
>  {code}
> That code shows that occasionally the header is wrong (the tar entry name 
> contains gibberish bits) which makes me think that `getNextTarEntry()` can be 
> faulty.
>  
> Running that code with commons compress 1.25.0 works as expected. So it's 
> probably something added since November. Note that this is something related 
> to parallelism - using an executor service with a single thread doesn't 
> suffer from the same error. The tgz to decompress doesn't really matter - you 
> can use a manually created one worth a few KBs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to