[jira] [Created] (COMPRESS-724) TarUtils.parsePAX1XSparseHeaders() skips one extra full record when the sparse header is already aligned

Ruiqi Dong (Jira) Thu, 21 May 2026 01:46:17 -0700

Ruiqi Dong created COMPRESS-724:
-----------------------------------

             Summary: TarUtils.parsePAX1XSparseHeaders() skips one extra full 
record when the sparse header is already aligned
                 Key: COMPRESS-724
                 URL: https://issues.apache.org/jira/browse/COMPRESS-724
             Project: Commons Compress
          Issue Type: Bug
          Components: Archivers
    Affects Versions: 1.28.0
            Reporter: Ruiqi Dong



*Summary*
parsePAX1XSparseHeaders() computes the remaining padding as recordSize - 
bytesRead % recordSize. When bytesRead % recordSize == 0, the correct skip 
should be 0, but the current code skips an entire extra record. This helper is 
used on the normal GNU sparse 1.x read path, so an affected archive can advance 
the stream past real file data.
 
*Affected code*
File: src/main/java/org/apache/commons/compress/archivers/tar/TarUtils.java
{code:java}
static List<TarArchiveStructSparse> parsePAX1XSparseHeaders(
        final InputStream inputStream, final int recordSize) throws IOException 
{
    ...
    while (sparseHeadersCount-- > 0) {
        ...
        bytesRead += readResult[1];
        sparseHeaders.add(new TarArchiveStructSparse(sparseOffset, 
sparseNumbytes));
    }
    final long bytesToSkip = recordSize - bytesRead % recordSize;
    IOUtils.skip(inputStream, bytesToSkip);
    return sparseHeaders;
} {code}
**
This code path is used when reading GNU sparse 1.x entries from both 
TarArchiveInputStream and TarFile:
{code:java}
// TarArchiveInputStream
if (currEntry.isPaxGNU1XSparse()) {
    
currEntry.setSparseHeaders(TarUtils.parsePAX1XSparseHeaders(currentInputStream, 
getRecordSize()));
}
buildSparseInputStreams(); {code}
{code:java}
// TarFile
if (currEntry.isPaxGNU1XSparse()) {
    final long position = archive.position();
    currEntry.setSparseHeaders(TarUtils.parsePAX1XSparseHeaders(currentStream, 
recordSize));
    final long sparseHeadersSize = archive.position() - position;
    currEntry.setSize(currEntry.getSize() - sparseHeadersSize);
    currEntry.setDataOffset(currEntry.getDataOffset() + sparseHeadersSize);
} {code}
*Reproducer*
Add the following test to 
src/test/java/org/apache/commons/compress/archivers/tar/TarUtilsTest.java:
{code:java}
@Test
void testParsePax1xSparseHeadersDoesNotSkipAlignedDataRecord() throws Exception 
{
    final byte[] sparseHeader = pax1xSparseHeaderAlignedToSingleRecord();
    final byte[] fileData = new byte[513];
    Arrays.fill(fileData, 0, 512, (byte) 'A');
    fileData[512] = 'B';
    final byte[] input = ArrayUtils.addAll(sparseHeader, fileData);
    try (ByteArrayInputStream in = new ByteArrayInputStream(input)) {
        final List<TarArchiveStructSparse> sparseHeaders =
                TarUtils.parsePAX1XSparseHeaders(in, 512);
        assertEquals(100, sparseHeaders.size());
        assertEquals('A', in.read());
    }
} {code}
Run:
{code:java}
mvn -q -Dtest=org.apache.commons.compress.archivers.tar.TarUtilsTest test {code}
Observed behavior:
{code:java}
TarUtilsTest.testParsePax1xSparseHeadersDoesNotSkipAlignedDataRecord:251
expected: <65> but was: <66> {code}
Expected behavior:
After parsing an already aligned sparse header block, the next byte should be 
the first byte of file data.
The current formula converts an exact alignment case into a full-record skip. 
In the GNU sparse 1.x read path, that means the reader moves one record too far 
before building the sparse-data streams, so real payload bytes are skipped and 
the extracted/read file content becomes misaligned.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (COMPRESS-724) TarUtils.parsePAX1XSparseHeaders() skips one extra full record when the sparse header is already aligned

Reply via email to