ppkarwasz commented on code in PR #710:
URL: https://github.com/apache/commons-compress/pull/710#discussion_r2380163633


##########
src/main/java/org/apache/commons/compress/archivers/tar/TarFile.java:
##########
@@ -65,7 +66,7 @@ private final class BoundedTarEntryInputStream extends 
BoundedArchiveInputStream
         BoundedTarEntryInputStream(final TarArchiveEntry entry, final 
SeekableByteChannel channel) throws IOException {
             super(entry.getDataOffset(), entry.getRealSize());
             if (channel.size() - entry.getSize() < entry.getDataOffset()) {
-                throw new ArchiveException("Entry size exceeds archive size");
+                throw new EOFException("Truncated TAR archive: entry size 
exceeds archive size");

Review Comment:
   I switched the exception to `EOFException` to make `TarFile` and 
`TarArchiveInputStream` behave consistently, since they share the same test 
suite in `MaxNameEntryLengthTest`.
   
   However, you are right: this is not a “real” EOF. With a genuine EOF you can 
usually still read the non-truncated part of the entry. Here the check is 
performed eagerly in the constructor and prevents reading anything at all.
   
   This raises the question: should we remove this upfront check and instead 
let truncation errors surface naturally while reading downstream? I generally 
favor eager validation, but only when we know the stream will be fully consumed 
before handing control back to the caller. In this case the check applies 
unconditionally: it triggers both when we consume entries internally (e.g. PAX 
headers) and when users obtain the stream themselves.
   
   One more caveat: `SeekableByteChannel#size()` is not stable by contract. It 
may change if the underlying file is modified, so even if this check fails here 
it might have succeeded if performed at a later stage.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to