[ https://issues.apache.org/jira/browse/COMPRESS-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17822644#comment-17822644 ]
Cosmin Carabet edited comment on COMPRESS-666 at 3/1/24 5:39 PM:
-----------------------------------------------------------------

I've done a binary search on the master branch for the first commit where my test fails. It looks like it's this one: [https://github.com/apache/commons-compress/commit/856a540d3eeb463527c509853f4d205f4adc272f]. My test passes on the commit just before it and fails from that commit onward.

I've experimented a bit more on the commit linked above, where things started failing. If I revert the changes to the following 2 files, my test passes:
{code:java}
git checkout 0c9f16e43 -- src/main/java/org/apache/commons/compress/archivers/tar/TarArchiveInputStream.java
git checkout 0c9f16e43 -- src/main/java/org/apache/commons/compress/utils/IOUtils.java
{code}


was (Author: JIRAUSER304399):
    I've done a binary search on the master branch for the first commit where my test fails. Looks like it's this one [https://github.com/apache/commons-compress/commit/856a540d3eeb463527c509853f4d205f4adc272f] . If my test is run on the commit just before it, it passes. From that, it starts failing.

> Multithreaded access to Tar archive throws java.util.zip.ZipException: Corrupt GZIP trailer
> -------------------------------------------------------------------------------------------
>
>                 Key: COMPRESS-666
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-666
>             Project: Commons Compress
>          Issue Type: Bug
>    Affects Versions: 1.26.0
>         Environment: Commons compress 1.26.0 to get a failure. Any tar tgz.
>            Reporter: Cosmin Carabet
>            Priority: Major
>
> Something in [https://github.com/apache/commons-compress/compare/rel/commons-compress-1.25.0...master] seems to make iterating through the tar entries of multiple TarArchiveInputStreams throw Corrupted TAR archive:
>
> {code:java}
> @Test
> void bla() {
>     ExecutorService executorService = Executors.newFixedThreadPool(10);
>     List<CompletableFuture<Void>> tasks = IntStream.range(0, 200)
>             .mapToObj(_idx -> CompletableFuture.runAsync(
>                     () -> {
>                         try (InputStream inputStream = this.getClass()
>                                         .getResourceAsStream("/<your favourite tar tgz>");
>                                 TarArchiveInputStream tarInputStream =
>                                         new TarArchiveInputStream(new GZIPInputStream(inputStream))) {
>                             TarArchiveEntry tarEntry;
>                             while ((tarEntry = tarInputStream.getNextTarEntry()) != null) {
>                                 System.out.println("Reading entry %s with size %d"
>                                         .formatted(tarEntry.getName(), tarEntry.getSize()));
>                             }
>                         } catch (Exception ex) {
>                             throw new RuntimeException(ex);
>                         }
>                     },
>                     executorService))
>             .toList();
>     Futures.getUnchecked(CompletableFuture.allOf(tasks.toArray(new CompletableFuture<?>[0])));
> }
> {code}
> Although TarArchiveInputStream is marked as not thread safe, I am not reusing objects here. Those are in fact separate objects, presumably all with their own position tracking info.
>
> The stacktrace here looks like:
> {code:java}
> Caused by: java.io.IOException: Corrupted TAR archive.
>     at org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeader(TarArchiveEntry.java:1480)
>     at org.apache.commons.compress.archivers.tar.TarArchiveEntry.<init>(TarArchiveEntry.java:534)
>     at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:431)
>     at
> Caused by: java.lang.IllegalArgumentException: Invalid byte 100 at offset 0 in 'dddddddddddd' len=12
>     at org.apache.commons.compress.archivers.tar.TarUtils.parseOctal(TarUtils.java:516)
>     at org.apache.commons.compress.archivers.tar.TarUtils.parseOctalOrBinary(TarUtils.java:540)
>     at org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeaderUnwrapped(TarArchiveEntry.java:1496)
>     at org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeader(TarArchiveEntry.java:1478)
>     ... 7 more
> {code}
> That code shows that occasionally the header is wrong (the tar entry name contains gibberish bits) which makes me think that `getNextTarEntry()` can be faulty.
>
> Running that code with commons compress 1.25.0 works as expected. So it's probably something added since November. Note that this is something related to parallelism - using an executor service with a single thread doesn't suffer from the same error. The tgz to decompress doesn't really matter - you can use a manually created one worth a few KBs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)