[ https://issues.apache.org/jira/browse/COMPRESS-679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikaël MECHOULAM updated COMPRESS-679:
--------------------------------------
    Description: 
I've run into a bug that occurs when reading a 7zip file from several threads
simultaneously. The following code illustrates the problem; the file.7z archive
is attached.

 
{code:java}
import java.io.InputStream;
import java.nio.file.Paths;
import java.util.stream.IntStream;
import org.apache.commons.compress.archivers.sevenz.SevenZArchiveEntry;
import org.apache.commons.compress.archivers.sevenz.SevenZFile;

public class TestZip {
    public static void main(final String[] args) {
        final Runnable runnable = () -> {
            try {
                try (final SevenZFile sevenZFile = SevenZFile.builder().setPath(Paths.get("file.7z")).get()) {
                    SevenZArchiveEntry sevenZArchiveEntry;
                    while ((sevenZArchiveEntry = sevenZFile.getNextEntry()) != null) {
                        if ("file4.txt".equals(sevenZArchiveEntry.getName())) { // the entry must not be the first of the 7z archive to reproduce
                            final InputStream inputStream = sevenZFile.getInputStream(sevenZArchiveEntry);
                            // treatments...
                            break;
                        }
                    }
                }
            } catch (final Exception e) { // java.io.IOException: Checksum verification failed
                e.printStackTrace();
            }
        };
        IntStream.range(0, 30).forEach(i -> new Thread(runnable).start());
    }
}
{code}
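For reference, a variant of the same reproducer that waits for the worker threads and
collects the failures (instead of printing from inside each thread) may be easier to
turn into a regression test. It is only a sketch and assumes the same attached file.7z
and the same entry name file4.txt:
{code:java}
import java.io.InputStream;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.commons.compress.archivers.sevenz.SevenZArchiveEntry;
import org.apache.commons.compress.archivers.sevenz.SevenZFile;

public class TestZipConcurrent {

    // Opens file.7z, seeks to file4.txt and reads it fully; any IOException propagates.
    private static void readEntry() throws Exception {
        try (SevenZFile sevenZFile = SevenZFile.builder().setPath(Paths.get("file.7z")).get()) {
            SevenZArchiveEntry entry;
            while ((entry = sevenZFile.getNextEntry()) != null) {
                if ("file4.txt".equals(entry.getName())) {
                    try (InputStream in = sevenZFile.getInputStream(entry)) {
                        in.readAllBytes();
                    }
                    return;
                }
            }
        }
    }

    public static void main(final String[] args) throws Exception {
        final ExecutorService pool = Executors.newFixedThreadPool(30);
        final List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < 30; i++) {
            futures.add(pool.submit((Callable<Void>) () -> {
                readEntry();
                return null;
            }));
        }
        int failures = 0;
        for (final Future<?> future : futures) {
            try {
                future.get(); // rethrows "Checksum verification failed" wrapped in ExecutionException
            } catch (final Exception e) {
                failures++;
                e.printStackTrace();
            }
        }
        pool.shutdown();
        System.out.println(failures + " of " + futures.size() + " readers failed");
    }
}
{code}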
Below is the output I receive on version 1.26: 

 
{code:java}
java.io.IOException: Checksum verification failed
  at org.apache.commons.compress.utils.ChecksumVerifyingInputStream.verify(ChecksumVerifyingInputStream.java:98)
  at org.apache.commons.compress.utils.ChecksumVerifyingInputStream.read(ChecksumVerifyingInputStream.java:92)
  at org.apache.commons.io.IOUtils.skip(IOUtils.java:2422)
  at org.apache.commons.io.IOUtils.skip(IOUtils.java:2380)
  at org.apache.commons.compress.archivers.sevenz.SevenZFile.getCurrentStream(SevenZFile.java:912)
  at org.apache.commons.compress.archivers.sevenz.SevenZFile.getInputStream(SevenZFile.java:988)
  at com.infotel.arcsys.nativ.archiving.zip.TestZip.lambda$main$0(TestZip.java:21)
  at java.base/java.lang.Thread.run(Thread.java:833)
{code}
The issue seems to have been introduced by the upgrade from version 1.25 to 1.26 of
Apache Commons Compress. In the {{SevenZFile}} class, the private method
{{getCurrentStream}} switched from Commons Compress's own
{{IOUtils.skip(InputStream, long)}} to the Commons IO method of the same signature,
which changes the behavior: in 1.26 the skipped bytes are read into a shared,
unsynchronized scratch buffer ({{SCRATCH_BYTE_BUFFER_WO}}) that is in principle only
meant to be written to. When several threads skip at the same time they all read into
that single buffer, so the data the library verifies its checksums against can be
overwritten concurrently, and the verification fails. The problem seems to be resolved
by passing a {{Supplier}} that provides a dedicated buffer for each call:
{code:java}
try (InputStream stream = deferredBlockStreams.remove(0)) {
    org.apache.commons.io.IOUtils.skip(stream, Long.MAX_VALUE,
            () -> new byte[org.apache.commons.io.IOUtils.DEFAULT_BUFFER_SIZE]);
}
{code}
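The suspected mechanism can also be illustrated outside of {{SevenZFile}}. In the
sketch below (hypothetical demo code, not library code), several threads compute a
CRC32 over data they read through one shared buffer, the way a verifying stream
updates its checksum from the caller's array. With the shared buffer the computed
checksums can differ from the expected value; a buffer per call, which is what the
{{Supplier}} overload provides, keeps them stable.
{code:java}
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Random;
import java.util.zip.CRC32;

// Hypothetical demo; mimics a checksum-verified stream being skipped through a
// buffer that several threads share.
public class SharedSkipBufferDemo {

    // Reads the whole stream into the given buffer chunk by chunk and returns the
    // CRC32 of the bytes as they sit in the buffer right after each read -- the
    // moment at which a verifying stream would update its checksum.
    static long skipAndChecksum(final InputStream in, final byte[] buffer) throws IOException {
        final CRC32 crc = new CRC32();
        int n;
        while ((n = in.read(buffer, 0, buffer.length)) != -1) {
            crc.update(buffer, 0, n); // another thread writing into `buffer` corrupts this
        }
        return crc.getValue();
    }

    public static void main(final String[] args) throws Exception {
        final byte[] data = new byte[1 << 20];
        new Random(42).nextBytes(data);
        final CRC32 expected = new CRC32();
        expected.update(data);

        final byte[] sharedBuffer = new byte[8192]; // stands in for the shared scratch array
        final Runnable reader = () -> {
            try {
                final long crc = skipAndChecksum(new ByteArrayInputStream(data), sharedBuffer);
                if (crc != expected.getValue()) {
                    System.out.println("Checksum mismatch (shared buffer)"); // typically printed here
                }
            } catch (final IOException e) {
                e.printStackTrace();
            }
        };
        final Thread t1 = new Thread(reader);
        final Thread t2 = new Thread(reader);
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        // With a dedicated buffer per call the checksum always matches.
        final long crc = skipAndChecksum(new ByteArrayInputStream(data), new byte[8192]);
        System.out.println("Per-call buffer matches: " + (crc == expected.getValue()));
    }
}
{code}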

> Regression on parallel processing of 7zip files
> -----------------------------------------------
>
>                 Key: COMPRESS-679
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-679
>             Project: Commons Compress
>          Issue Type: Bug
>    Affects Versions: 1.26.0, 1.26.1
>            Reporter: Mikaël MECHOULAM
>            Priority: Critical
>         Attachments: file.7z
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
