Re: [PR] ARJ: correct byte accounting and truncation errors [commons-compress]

via GitHub Fri, 17 Oct 2025 23:25:54 -0700


garydgregory commented on code in PR #723:
URL: https://github.com/apache/commons-compress/pull/723#discussion_r2410298499



##########
src/main/java/org/apache/commons/compress/archivers/arj/ArjArchiveInputStream.java:
##########
@@ -128,6 +125,8 @@ private ArjArchiveInputStream(final InputStream 
inputStream, final Builder build
             if ((mainHeader.arjFlags & MainHeader.Flags.VOLUME) != 0) {
                 throw new ArchiveException("Multi-volume ARJ files are 
unsupported");
             }
+        } catch (final ArchiveException e) {
+            throw e;
         } catch (final IOException e) {
             throw new ArchiveException(e.getMessage(), (Throwable) e);

Review Comment:
   Since `ArchiveException` is an `IOException`, should we let this propagate 
as is, reflecting a real, low-level error? Then, there would be no need to 
rethrow the `ArchiveException` above.



##########
src/main/java/org/apache/commons/compress/archivers/arj/ArjArchiveInputStream.java:
##########
@@ -225,164 +235,174 @@ public int read(final byte[] b, final int off, final 
int len) throws IOException
         return currentInputStream.read(b, off, len);
     }
 
-    private int read16(final DataInputStream dataIn) throws IOException {
-        final int value = dataIn.readUnsignedShort();
-        count(2);
-        return Integer.reverseBytes(value) >>> 16;
+    private static int readUnsignedByte(InputStream in) throws IOException {
+        final int value = in.read();
+        if (value == -1) {
+            throw new EOFException();
+        }
+        return value & 0xff;
     }
 
-    private int read32(final DataInputStream dataIn) throws IOException {
-        final int value = dataIn.readInt();
-        count(4);
-        return Integer.reverseBytes(value);
+    private int readSwappedUnsignedShort() throws IOException {
+        final int value = EndianUtils.readSwappedUnsignedShort(in);
+        count(2);
+        return value;
     }
 
-    private int read8(final DataInputStream dataIn) throws IOException {
-        final int value = dataIn.readUnsignedByte();
+    private int readUnsignedByte() throws IOException {
+        final int value = readUnsignedByte(in);
         count(1);
-        return value;
+        return value & 0xff;
     }
 
-    private void readExtraData(final int firstHeaderSize, final 
DataInputStream firstHeader, final LocalFileHeader localFileHeader) throws 
IOException {
+    private static void readExtraData(final int firstHeaderSize, final 
InputStream firstHeader, final LocalFileHeader localFileHeader) throws 
IOException {
         if (firstHeaderSize >= 33) {
-            localFileHeader.extendedFilePosition = read32(firstHeader);
+            localFileHeader.extendedFilePosition = 
readSwappedInteger(firstHeader);
             if (firstHeaderSize >= 45) {
-                localFileHeader.dateTimeAccessed = read32(firstHeader);
-                localFileHeader.dateTimeCreated = read32(firstHeader);
-                localFileHeader.originalSizeEvenForVolumes = 
read32(firstHeader);
-                pushedBackBytes(12);
+                localFileHeader.dateTimeAccessed = 
readSwappedInteger(firstHeader);
+                localFileHeader.dateTimeCreated = 
readSwappedInteger(firstHeader);
+                localFileHeader.originalSizeEvenForVolumes = 
readSwappedInteger(firstHeader);
             }
-            pushedBackBytes(4);
         }
     }
 
+    /**
+     * Scans for the next valid ARJ header.
+     *
+     * @return The header bytes, or {@code null} if end of archive.
+     * @throws EOFException If the end of the stream is reached before a valid 
header is found.
+     * @throws IOException If an I/O error occurs.
+     */
     private byte[] readHeader() throws IOException {
-        boolean found = false;
-        byte[] basicHeaderBytes = null;
-        do {
+        byte[] basicHeaderBytes;
+        // TODO: Explain why we are scanning for a valid ARJ header
+        //       and don't throw, when an invalid/corrupted header is found,
+        //       which might indicate a corrupted archive.
+        while (true) {

Review Comment:
   Hi @ppkarwasz 
   
   > DataInputStream (big-endian) forced ad-hoc helpers for ARJ’s little-endian 
fields.
   
   I don't understand what the above means.
   
   ```java
           // TODO: Explain why we are scanning for a valid ARJ header
           //       and don't throw, when an invalid/corrupted header is found,
           //       which might indicate a corrupted archive.
   ```
   Yes, please do explain it, or perhaps this was meant to be a draft PR? 😉 
   
   I'm concerned that this is switching behavior around so much that we should 
instead introduce a `STRICT`/`LENIENT` `enum` option in the root builder that 
different classes can implement as they each best see fit. This would allow us 
to keep existing behavior or at least have users opt-in to whether they want 
strict or lenient behavior, no matter what the default is, where we can then 
decide when what defaults should be changed. Either way, if we do change the 
default behavior to strict and break some users' oddball files, they can ask 
for lenient processing.
   
   WDYT?
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] ARJ: correct byte accounting and truncation errors [commons-compress]

Reply via email to