ppkarwasz commented on code in PR #723:
URL: https://github.com/apache/commons-compress/pull/723#discussion_r2424886682


##########
src/main/java/org/apache/commons/compress/archivers/arj/ArjArchiveInputStream.java:
##########
@@ -225,164 +235,174 @@ public int read(final byte[] b, final int off, final int len) throws IOException
         return currentInputStream.read(b, off, len);
     }
 
-    private int read16(final DataInputStream dataIn) throws IOException {
-        final int value = dataIn.readUnsignedShort();
-        count(2);
-        return Integer.reverseBytes(value) >>> 16;
+    private static int readUnsignedByte(InputStream in) throws IOException {
+        final int value = in.read();
+        if (value == -1) {
+            throw new EOFException();
+        }
+        return value & 0xff;
     }
 
-    private int read32(final DataInputStream dataIn) throws IOException {
-        final int value = dataIn.readInt();
-        count(4);
-        return Integer.reverseBytes(value);
+    private int readSwappedUnsignedShort() throws IOException {
+        final int value = EndianUtils.readSwappedUnsignedShort(in);
+        count(2);
+        return value;
     }
 
-    private int read8(final DataInputStream dataIn) throws IOException {
-        final int value = dataIn.readUnsignedByte();
+    private int readUnsignedByte() throws IOException {
+        final int value = readUnsignedByte(in);
         count(1);
-        return value;
+        return value & 0xff;
     }
 
-    private void readExtraData(final int firstHeaderSize, final DataInputStream firstHeader, final LocalFileHeader localFileHeader) throws IOException {
+    private static void readExtraData(final int firstHeaderSize, final InputStream firstHeader, final LocalFileHeader localFileHeader) throws IOException {
         if (firstHeaderSize >= 33) {
-            localFileHeader.extendedFilePosition = read32(firstHeader);
+            localFileHeader.extendedFilePosition = readSwappedInteger(firstHeader);
             if (firstHeaderSize >= 45) {
-                localFileHeader.dateTimeAccessed = read32(firstHeader);
-                localFileHeader.dateTimeCreated = read32(firstHeader);
-                localFileHeader.originalSizeEvenForVolumes = read32(firstHeader);
-                pushedBackBytes(12);
+                localFileHeader.dateTimeAccessed = readSwappedInteger(firstHeader);
+                localFileHeader.dateTimeCreated = readSwappedInteger(firstHeader);
+                localFileHeader.originalSizeEvenForVolumes = readSwappedInteger(firstHeader);
             }
-            pushedBackBytes(4);
         }
     }
 
+    /**
+     * Scans for the next valid ARJ header.
+     *
+     * @return The header bytes, or {@code null} if end of archive.
+     * @throws EOFException If the end of the stream is reached before a valid header is found.
+     * @throws IOException If an I/O error occurs.
+     */
     private byte[] readHeader() throws IOException {
-        boolean found = false;
-        byte[] basicHeaderBytes = null;
-        do {
+        byte[] basicHeaderBytes;
+        // TODO: Explain why we are scanning for a valid ARJ header
+        //       and don't throw, when an invalid/corrupted header is found,
+        //       which might indicate a corrupted archive.
+        while (true) {

Review Comment:
   > `DataInputStream` (big-endian) forced ad-hoc helpers for ARJ’s little-endian fields.
   
   ARJ writes numbers in little-endian format, so I found that using `DataInputStream` is not very helpful, since we still need to reverse the byte order afterwards. Instead I used `EndianUtils` on a “normal” input stream, which in my opinion simplifies the class.
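
   A minimal sketch of the difference (the two-byte value here is just an illustrative example; `EndianUtils` comes from Commons IO):

   ```java
   import java.io.ByteArrayInputStream;
   import java.io.DataInputStream;
   import java.io.IOException;

   import org.apache.commons.io.EndianUtils;

   public class LittleEndianReadSketch {
       public static void main(final String[] args) throws IOException {
           // 0x1234 stored little-endian, low byte first, as ARJ writes it.
           final byte[] leBytes = { 0x34, 0x12 };

           // DataInputStream assumes big-endian, so the raw value comes back byte-swapped
           // and has to be reversed by hand (what the old read16() helper did).
           try (DataInputStream dataIn = new DataInputStream(new ByteArrayInputStream(leBytes))) {
               final int raw = dataIn.readUnsignedShort();         // 0x3412
               final int fixed = Integer.reverseBytes(raw) >>> 16; // 0x1234
               System.out.printf("DataInputStream: raw=0x%04X, fixed=0x%04X%n", raw, fixed);
           }

           // EndianUtils reads the little-endian value directly from a plain InputStream.
           final int swapped = EndianUtils.readSwappedUnsignedShort(new ByteArrayInputStream(leBytes));
           System.out.printf("EndianUtils: 0x%04X%n", swapped);    // 0x1234
       }
   }
   ```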
   
   > Yes, please do explain it, or perhaps this was meant to be a draft PR? 😉
   
   The `ArjArchiveInputStream` is the only input stream in Commons Compress that does not throw when an invalid entry is found (e.g. a mismatched CRC). The invalid entry is simply skipped.
   
   Since we don't have any tests that use this feature, my guess is that it was meant to handle self-extracting archives, which consist of an executable stub followed by a regular ARJ archive. An “invalid” entry found in the executable stub should be skipped, but an “invalid” entry in the ARJ data that follows should throw.
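
   For illustration, a minimal usage sketch (the file name is a placeholder); under the current scanning behavior a leading executable stub would simply be scanned past until the first valid header:

   ```java
   import java.io.InputStream;
   import java.nio.file.Files;
   import java.nio.file.Paths;

   import org.apache.commons.compress.archivers.arj.ArjArchiveEntry;
   import org.apache.commons.compress.archivers.arj.ArjArchiveInputStream;

   public class ListArjEntries {
       public static void main(final String[] args) throws Exception {
           // "archive.arj" is a placeholder path; a self-extracting archive would start
           // with an executable stub before the first valid ARJ header.
           try (InputStream fileIn = Files.newInputStream(Paths.get("archive.arj"));
                   ArjArchiveInputStream arjIn = new ArjArchiveInputStream(fileIn)) {
               ArjArchiveEntry entry;
               while ((entry = arjIn.getNextEntry()) != null) {
                   System.out.println(entry.getName() + " (" + entry.getSize() + " bytes)");
               }
           }
       }
   }
   ```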
   
   This PR does **not** modify the existing behavior. I opened #728 for that.


