[
https://issues.apache.org/jira/browse/COMPRESS-724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ruiqi Dong updated COMPRESS-724:
--------------------------------
Priority: Major (was: Critical)
> TarUtils.parsePAX1XSparseHeaders() skips one extra full record when the
> sparse header is already aligned
> --------------------------------------------------------------------------------------------------------
>
> Key: COMPRESS-724
> URL: https://issues.apache.org/jira/browse/COMPRESS-724
> Project: Commons Compress
> Issue Type: Bug
> Components: Archivers
> Affects Versions: 1.28.0
> Reporter: Ruiqi Dong
> Priority: Major
>
> *Summary*
> parsePAX1XSparseHeaders() computes the padding after the sparse map as
> recordSize - bytesRead % recordSize. When bytesRead % recordSize == 0 the
> correct skip is 0, but this expression evaluates to recordSize, so the code
> skips one full extra record. This helper is on the normal GNU sparse 1.x
> read path, so an affected archive can advance the stream past real file data.
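
To make the arithmetic concrete, here is a minimal, self-contained sketch of the expression quoted in the report. The class and method names are made up for illustration and are not part of Commons Compress.

```java
public class PaddingArithmetic {

    // The expression from the report: bytes remaining to the next record
    // boundary. For an already aligned bytesRead it yields recordSize, not 0.
    static long reportedPadding(final long bytesRead, final int recordSize) {
        return recordSize - bytesRead % recordSize;
    }

    public static void main(final String[] args) {
        final int recordSize = 512;
        // Unaligned and aligned sample values for bytesRead.
        for (final long bytesRead : new long[] {1, 511, 512, 513, 1024}) {
            System.out.println(bytesRead + " -> " + reportedPadding(bytesRead, recordSize));
        }
    }
}
```

The unaligned inputs produce the expected distance to the next boundary (for example 511 -> 1), while the aligned inputs 512 and 1024 both produce 512.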
>
> *Affected code*
> File: src/main/java/org/apache/commons/compress/archivers/tar/TarUtils.java
> {code:java}
> static List<TarArchiveStructSparse> parsePAX1XSparseHeaders(
>         final InputStream inputStream, final int recordSize) throws IOException {
>     ...
>     while (sparseHeadersCount-- > 0) {
>         ...
>         bytesRead += readResult[1];
>         sparseHeaders.add(new TarArchiveStructSparse(sparseOffset, sparseNumbytes));
>     }
>     final long bytesToSkip = recordSize - bytesRead % recordSize;
>     IOUtils.skip(inputStream, bytesToSkip);
>     return sparseHeaders;
> }
> {code}
> This code path is used when reading GNU sparse 1.x entries from both
> TarArchiveInputStream and TarFile:
> {code:java}
> // TarArchiveInputStream
> if (currEntry.isPaxGNU1XSparse()) {
>     currEntry.setSparseHeaders(TarUtils.parsePAX1XSparseHeaders(currentInputStream, getRecordSize()));
> }
> buildSparseInputStreams();
> {code}
> {code:java}
> // TarFile
> if (currEntry.isPaxGNU1XSparse()) {
>     final long position = archive.position();
>     currEntry.setSparseHeaders(TarUtils.parsePAX1XSparseHeaders(currentStream, recordSize));
>     final long sparseHeadersSize = archive.position() - position;
>     currEntry.setSize(currEntry.getSize() - sparseHeadersSize);
>     currEntry.setDataOffset(currEntry.getDataOffset() + sparseHeadersSize);
> }
> {code}
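
In the TarFile snippet the number of bytes consumed by the parser is also subtracted from the entry size and added to the data offset, so an over-skip distorts both values. The sketch below only mirrors those two adjustments. The class name and all sizes and offsets are invented for illustration and do not come from a real archive.

```java
public class SparseHeaderAccounting {

    // Mirrors the two adjustments in the quoted TarFile snippet:
    // returns { adjustedSize, adjustedDataOffset }.
    static long[] adjust(final long size, final long dataOffset, final long sparseHeadersSize) {
        return new long[] {size - sparseHeadersSize, dataOffset + sparseHeadersSize};
    }

    public static void main(final String[] args) {
        // Hypothetical entry: one 512-byte sparse map followed by 1024 data bytes.
        final long size = 1536;
        final long dataOffset = 2048;

        final long[] expected = adjust(size, dataOffset, 512);   // parser consumed the map only
        final long[] overSkip = adjust(size, dataOffset, 1024);  // parser also skipped one data record

        System.out.println("expected:  size=" + expected[0] + " offset=" + expected[1]);
        System.out.println("over-skip: size=" + overSkip[0] + " offset=" + overSkip[1]);
    }
}
```

With these made-up numbers the over-skip leaves the entry 512 bytes shorter and starts its data 512 bytes later than it should.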
>
> *Reproducer*
> Add the following test to
> src/test/java/org/apache/commons/compress/archivers/tar/TarUtilsTest.java:
> {code:java}
> @Test
> void testParsePax1xSparseHeadersDoesNotSkipAlignedDataRecord() throws Exception {
>     final byte[] sparseHeader = pax1xSparseHeaderAlignedToSingleRecord();
>     final byte[] fileData = new byte[513];
>     Arrays.fill(fileData, 0, 512, (byte) 'A');
>     fileData[512] = 'B';
>     final byte[] input = ArrayUtils.addAll(sparseHeader, fileData);
>     try (ByteArrayInputStream in = new ByteArrayInputStream(input)) {
>         final List<TarArchiveStructSparse> sparseHeaders = TarUtils.parsePAX1XSparseHeaders(in, 512);
>         assertEquals(100, sparseHeaders.size());
>         assertEquals('A', in.read());
>     }
> }
> {code}
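
The reproducer calls pax1xSparseHeaderAlignedToSingleRecord(), whose body is not included in the report. The following is a hypothetical stand-in, not the reporter's helper. It assumes the GNU sparse 1.x map layout of newline-terminated decimal numbers (an entry count followed by offset/numbytes pairs) and picks arbitrary values so that 100 entries occupy exactly 512 bytes. It has not been run against TarUtils.

```java
import java.nio.charset.StandardCharsets;

public class AlignedSparseHeaderSketch {

    // Hypothetical stand-in for the helper named in the reproducer.
    // Layout: "100\n" (4 bytes), then 100 pairs of "offset\nnumbytes\n".
    // 8 pairs of "10\n10\n" (6 bytes) + 92 pairs of "10\n0\n" (5 bytes) = 508 bytes.
    static byte[] pax1xSparseHeaderAlignedToSingleRecord() {
        final StringBuilder sb = new StringBuilder();
        sb.append("100\n"); // number of sparse entries
        for (int i = 0; i < 100; i++) {
            sb.append("10\n");                  // sparse offset (arbitrary value)
            sb.append(i < 8 ? "10\n" : "0\n");  // sparse numbytes (arbitrary value)
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(final String[] args) {
        System.out.println(pax1xSparseHeaderAlignedToSingleRecord().length);
    }
}
```

The values carry no meaning as a sparse map; the only property that matters for the reproducer is that the map ends exactly on a 512-byte boundary with no padding bytes.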
> Run:
> {code:bash}
> mvn -q -Dtest=org.apache.commons.compress.archivers.tar.TarUtilsTest test
> {code}
> Observed behavior:
> {noformat}
> TarUtilsTest.testParsePax1xSparseHeadersDoesNotSkipAlignedDataRecord:251 expected: <65> but was: <66>
> {noformat}
> Expected behavior:
> After parsing an already aligned sparse header block, the next byte read
> should be the first byte of file data ('A', 65). Instead the read returns
> 'B' (66), the byte one full record later.
> The current formula turns the exact-alignment case into a full-record skip.
> On the GNU sparse 1.x read path, the reader therefore moves one record too
> far before building the sparse-data streams, so real payload bytes are
> skipped and the extracted file content is misaligned.
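
One possible way to get the expected behavior is to apply a second modulo so that the aligned case maps to 0. This is only a sketch of that idea under a simplified stand-in for the parser. The report does not say which fix the project will adopt, and the names below are invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class AlignedSkipSketch {

    // Candidate padding formula: the outer modulo turns recordSize into 0
    // when bytesRead is already a multiple of recordSize.
    static long paddingToNextRecord(final long bytesRead, final int recordSize) {
        return (recordSize - bytesRead % recordSize) % recordSize;
    }

    // Simplified stand-in for header parsing: consumes headerLength bytes,
    // skips the padding, and returns the next byte in the stream.
    static int firstByteAfterHeader(final byte[] input, final int headerLength,
            final int recordSize) throws IOException {
        try (InputStream in = new ByteArrayInputStream(input)) {
            long bytesRead = 0;
            while (bytesRead < headerLength) {
                if (in.read() < 0) {
                    throw new IOException("Truncated header");
                }
                bytesRead++;
            }
            long toSkip = paddingToNextRecord(bytesRead, recordSize);
            while (toSkip > 0) {
                final long skipped = in.skip(toSkip);
                if (skipped <= 0) {
                    break; // end of stream
                }
                toSkip -= skipped;
            }
            return in.read();
        }
    }

    public static void main(final String[] args) throws IOException {
        // One 512-byte header record, then 512 bytes of 'A' and a single 'B'.
        final byte[] input = new byte[512 + 513];
        Arrays.fill(input, 512, 1024, (byte) 'A');
        input[1024] = 'B';
        System.out.println((char) firstByteAfterHeader(input, 512, 512));
    }
}
```

With the aligned 512-byte header nothing is skipped and the next byte is the first 'A'. A shorter header, for example 100 bytes, is still padded up to the 512-byte boundary.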
--
This message was sent by Atlassian Jira
(v8.20.10#820010)