Mikaël MECHOULAM created COMPRESS-679:
-----------------------------------------
Summary: Regression on parallel processing of 7zip files
Key: COMPRESS-679
URL: https://issues.apache.org/jira/browse/COMPRESS-679
Project: Commons Compress
Issue Type: Bug
Affects Versions: 1.26.1, 1.26.0
Reporter: Mikaël MECHOULAM
Attachments: file.7z
I've run into a bug that occurs when reading a 7z file from several threads
simultaneously. The following code illustrates the problem; file.7z is
attached.
{code:java}
import java.io.InputStream;
import java.nio.file.Paths;
import java.util.stream.IntStream;

import org.apache.commons.compress.archivers.sevenz.SevenZArchiveEntry;
import org.apache.commons.compress.archivers.sevenz.SevenZFile;

public class TestZip {
    public static void main(final String[] args) {
        // Each thread opens its own SevenZFile on the same archive.
        final Runnable runnable = () -> {
            try (final SevenZFile sevenZFile = SevenZFile.builder().setPath(Paths.get("file.7z")).get()) {
                SevenZArchiveEntry sevenZArchiveEntry;
                while ((sevenZArchiveEntry = sevenZFile.getNextEntry()) != null) {
                    if ("file4.txt".equals(sevenZArchiveEntry.getName())) {
                        // The entry must not be the first of the 7z archive to reproduce
                        final InputStream inputStream = sevenZFile.getInputStream(sevenZArchiveEntry);
                        // treatments...
                        break;
                    }
                }
            } catch (final Exception e) { // java.io.IOException: Checksum verification failed
                e.printStackTrace();
            }
        };
        IntStream.range(0, 30).forEach(i -> new Thread(runnable).start());
    }
}
{code}
Below is the output I receive on version 1.26:
{code:java}
java.io.IOException: Checksum verification failed
    at org.apache.commons.compress.utils.ChecksumVerifyingInputStream.verify(ChecksumVerifyingInputStream.java:98)
    at org.apache.commons.compress.utils.ChecksumVerifyingInputStream.read(ChecksumVerifyingInputStream.java:92)
    at org.apache.commons.io.IOUtils.skip(IOUtils.java:2422)
    at org.apache.commons.io.IOUtils.skip(IOUtils.java:2380)
    at org.apache.commons.compress.archivers.sevenz.SevenZFile.getCurrentStream(SevenZFile.java:912)
    at org.apache.commons.compress.archivers.sevenz.SevenZFile.getInputStream(SevenZFile.java:988)
    at com.infotel.arcsys.nativ.archiving.zip.TestZip.lambda$main$0(TestZip.java:21)
    at java.base/java.lang.Thread.run(Thread.java:833)
{code}
The issue seems to arise from the transition from version 1.25 to 1.26 of
Apache Commons Compress. In the {{SevenZFile}} class of the library, the
private method {{getCurrentStream}} has migrated from
{{IOUtils.skip(InputStream, long)}} to a method with the same signature in the
Commons IO package, which changes the behavior. In version 1.26, the Commons
IO method uses a shared, unsynchronized buffer that is, in theory, intended
only for writing ({{SCRATCH_BYTE_BUFFER_WO}}). When several threads skip
concurrently, this causes checksum verification failures within the library.
The problem seems to be resolved by specifying the {{Supplier}} of the buffer
to use:
{code:java}
try (InputStream stream = deferredBlockStreams.remove(0)) {
    org.apache.commons.io.IOUtils.skip(stream, Long.MAX_VALUE,
            () -> new byte[org.apache.commons.io.IOUtils.DEFAULT_BUFFER_SIZE]);
}
{code}
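To illustrate why a per-call buffer should avoid the problem, here is a minimal sketch that uses only the JDK. It is not the Commons Compress or Commons IO code: {{PerCallBufferSkip}} and its methods are hypothetical names, and {{java.util.zip.CheckedInputStream}} stands in for {{ChecksumVerifyingInputStream}} on the assumption that both update the checksum from the caller's buffer after each read. Each call to {{skip}} obtains its own buffer from the supplier, so no thread can overwrite bytes that another thread is about to checksum.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

// Hypothetical illustration class; not part of Commons Compress or Commons IO.
class PerCallBufferSkip {

    /** Reads and discards up to toSkip bytes into a buffer obtained from the supplier. */
    static long skip(final InputStream in, final long toSkip, final Supplier<byte[]> bufferSupplier)
            throws IOException {
        final byte[] buffer = bufferSupplier.get(); // private to this call
        long remaining = toSkip;
        while (remaining > 0) {
            final int n = in.read(buffer, 0, (int) Math.min(remaining, buffer.length));
            if (n < 0) {
                break; // end of stream
            }
            remaining -= n;
        }
        return toSkip - remaining;
    }

    /** Skips through the data behind a checksumming stream and returns the CRC-32 it computed. */
    static long crcWhileSkipping(final byte[] data) throws IOException {
        try (CheckedInputStream in = new CheckedInputStream(new ByteArrayInputStream(data), new CRC32())) {
            skip(in, Long.MAX_VALUE, () -> new byte[8192]);
            return in.getChecksum().getValue();
        }
    }

    /** Runs the skip concurrently and reports whether every thread saw the expected CRC-32. */
    static boolean allChecksumsMatch(final int threads) throws Exception {
        final byte[] data = new byte[1 << 20];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i * 31);
        }
        final CRC32 expected = new CRC32();
        expected.update(data, 0, data.length);

        final ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            final List<Future<Long>> results = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                results.add(pool.submit(() -> crcWhileSkipping(data)));
            }
            for (final Future<Long> result : results) {
                if (result.get() != expected.getValue()) {
                    return false;
                }
            }
            return true;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(final String[] args) throws Exception {
        System.out.println(allChecksumsMatch(30));
    }
}
```

The sketch allocates a fresh buffer for each call, which costs a small allocation per skip but needs no synchronization. A {{ThreadLocal}} buffer might be an alternative if allocation turns out to matter.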