ppkarwasz opened a new pull request, #710:
URL: https://github.com/apache/commons-compress/pull/710

   This change introduces a configurable limit on archive entry names.
   
   Although formats like **AR**, **CPIO**, and **TAR** permit arbitrarily long 
path names, real operating systems and file systems impose much stricter limits:
   
   * Individual path segments (file names) are typically limited to 255 bytes.
   * Full paths are usually capped at a few KiB (e.g. 1024 bytes on macOS via `PATH_MAX`).
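
To make these limits concrete, here is a minimal sketch that checks a name against them. It assumes UTF-8 encoding, `/` as the path separator, and the 255-byte and 1024-byte values quoted above. `PathLimits` and `fitsTypicalLimits` are hypothetical names used only for illustration; they are not part of Commons Compress.

```java
import java.nio.charset.StandardCharsets;

public class PathLimits {
    // Assumed typical values: 255 bytes per path segment (NAME_MAX on most file systems)
    // and 1024 bytes per full path (PATH_MAX on macOS). Not taken from any library.
    static final int TYPICAL_NAME_MAX = 255;
    static final int TYPICAL_PATH_MAX = 1024;

    /** Returns true if the UTF-8 encoded name fits both the full-path and per-segment limits. */
    static boolean fitsTypicalLimits(String name) {
        if (name.getBytes(StandardCharsets.UTF_8).length > TYPICAL_PATH_MAX) {
            return false;
        }
        // Assumes '/' as the separator between path segments.
        for (String segment : name.split("/")) {
            if (segment.getBytes(StandardCharsets.UTF_8).length > TYPICAL_NAME_MAX) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(fitsTypicalLimits("dir/file.txt"));       // true
        System.out.println(fitsTypicalLimits("a".repeat(256)));      // false: segment is 256 bytes
        System.out.println(fitsTypicalLimits("\u00e9".repeat(128))); // false: 128 chars, 256 UTF-8 bytes
    }
}
```

Measuring in bytes rather than characters matters because a name that looks short can still exceed a byte-based file system limit once non-ASCII characters are encoded.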
   
   #### What’s new
   
   * Added a common base builder, `AbstractArchiveBuilder`, between the archive stream builders and `AbstractStreamBuilder` in the class hierarchy.
   * Introduced a new configuration option, `setMaxEntryNameLength`.
   * The default value is `Short.MAX_VALUE` (32,767), which is higher than any realistic OS limit.
   * Enforced the limit across:
   
     * All `ArchiveInputStream` implementations
     * `SevenZFile`, `TarFile`, and `ZipFile`
   * Added a dedicated test suite to verify:
   
     * Entry names up to `Short.MAX_VALUE` are handled correctly.
     * Entries exceeding a lowered limit result in an exception.
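
The configuration side can be pictured with the following sketch. Only the setter name `setMaxEntryNameLength` and the `Short.MAX_VALUE` default come from this PR; the class names, the self-typed generic builder, and the rejection of non-positive values are assumptions for illustration, not the code in the pull request.

```java
public class EntryNameLimitSketch {

    /** Hypothetical stand-in for a shared archive builder that stores the limit. */
    abstract static class ArchiveBuilderSketch<B extends ArchiveBuilderSketch<B>> {
        // Default taken from the PR description: Short.MAX_VALUE (32767).
        private int maxEntryNameLength = Short.MAX_VALUE;

        /** Sets the limit; rejecting non-positive values is an assumption of this sketch. */
        @SuppressWarnings("unchecked")
        B setMaxEntryNameLength(int maxEntryNameLength) {
            if (maxEntryNameLength <= 0) {
                throw new IllegalArgumentException("maxEntryNameLength must be positive: " + maxEntryNameLength);
            }
            this.maxEntryNameLength = maxEntryNameLength;
            return (B) this;
        }

        int getMaxEntryNameLength() {
            return maxEntryNameLength;
        }
    }

    /** Hypothetical format-specific builder that inherits the option. */
    static final class TarBuilderSketch extends ArchiveBuilderSketch<TarBuilderSketch> {
    }

    public static void main(String[] args) {
        TarBuilderSketch builder = new TarBuilderSketch();
        System.out.println(builder.getMaxEntryNameLength()); // 32767
        // The setter returns TarBuilderSketch, so calls can be chained.
        System.out.println(builder.setMaxEntryNameLength(255).getMaxEntryNameLength()); // 255
    }
}
```

The self-type parameter `B` lets a setter declared once in the shared base class return the concrete builder type, so it can be chained with format-specific options.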
   
   #### Exception usage
   
   This PR applies a consistent strategy for which exceptions are thrown when parsing archive input; this also required updating the unit tests:
   
   * **`EOFException`**
     Thrown when the stream ends *unexpectedly* (e.g., while reading a 
structure with a declared length).
     This signals a likely truncated archive. Users may choose to re-fetch or 
regenerate the archive.
   
   * **`MemoryLimitException`**
     Thrown when the implementation determines it cannot handle the archive 
under the current JVM heap settings.
   
     * For large **and** truncated archives, either `EOFException` or 
`MemoryLimitException` may occur depending on available memory.
     * The test suite currently adapts to `Runtime.totalMemory()`, but this 
value fluctuates between `-Xms` and `-Xmx`.
     * To improve reproducibility, I propose:
   
       * Switching to `Runtime.maxMemory()` checks (directly tied to `-Xmx`).
       * Adding a Surefire execution with a constrained heap (e.g. `-Xmx256m`) to simulate low-memory environments; such a limit can be reproduced on most developer machines.
   
   * **`ArchiveException`**
     Thrown when archive data is *structurally invalid*.
     Examples:
   
     * A PAX header declares key/value pairs longer than the header itself.
     * An entry path exceeds `maxEntryNameLength`.
     
     These errors cannot be resolved by simply retrying with the same archive 
and options. However, it may be worth distinguishing cases where the archive 
itself is invalid from cases where the failure results from an incompatibility 
between the archive and the chosen options. The latter is conceptually similar 
to `MemoryLimitException`, which indicates that an archive is not compatible 
with the JVM’s heap settings.
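
A hypothetical reader for a length-prefixed entry name illustrates how the distinction between `EOFException` and `ArchiveException` maps onto parsing code. The record layout (a 4-byte big-endian length followed by UTF-8 bytes) and the exception class are invented for this sketch; `InvalidArchiveSketchException` merely stands in for `ArchiveException`. The sketch relies on `DataInputStream.readInt` and `DataInputStream.readFully`, which throw `EOFException` when the stream ends early.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class NameParsingSketch {

    /** Hypothetical stand-in for ArchiveException: the data is structurally invalid. */
    static class InvalidArchiveSketchException extends IOException {
        InvalidArchiveSketchException(String message) {
            super(message);
        }
    }

    /** Reads a 4-byte big-endian length followed by that many UTF-8 name bytes. */
    static String readEntryName(InputStream in, int maxEntryNameLength) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int declared = data.readInt(); // throws EOFException if fewer than 4 bytes remain
        if (declared < 0 || declared > maxEntryNameLength) {
            // Retrying with the same archive and options cannot fix this.
            throw new InvalidArchiveSketchException(
                    "Entry name length " + declared + " outside [0, " + maxEntryNameLength + "]");
        }
        byte[] name = new byte[declared];
        data.readFully(name); // throws EOFException if the archive is truncated mid-name
        return new String(name, StandardCharsets.UTF_8);
    }

    /** Builds a test record: the declared length, then the payload bytes. */
    static byte[] lengthPrefixed(int declaredLength, byte[] payload) {
        return ByteBuffer.allocate(4 + payload.length).putInt(declaredLength).put(payload).array();
    }

    public static void main(String[] args) throws IOException {
        byte[] name = "a.txt".getBytes(StandardCharsets.UTF_8);
        System.out.println(readEntryName(new ByteArrayInputStream(lengthPrefixed(5, name)), 255)); // a.txt
        try {
            // Declares 10 bytes but only 5 follow: a truncated archive.
            readEntryName(new ByteArrayInputStream(lengthPrefixed(10, name)), 255);
        } catch (EOFException e) {
            System.out.println("truncated");
        }
        try {
            // Declares 300 bytes with a limit of 255: rejected before reading the name.
            readEntryName(new ByteArrayInputStream(lengthPrefixed(300, name)), 255);
        } catch (InvalidArchiveSketchException e) {
            System.out.println("invalid");
        }
    }
}
```

Because the declared length is validated before the name buffer is allocated or read, an over-limit name is reported as a structural error even when the archive is also truncated, and the allocation in this sketch is bounded by `maxEntryNameLength`.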
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
