aajisaka commented on code in PR #8526:
URL: https://github.com/apache/hadoop/pull/8526#discussion_r3339321914
##########
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/compress/zstd/TestZStandardCompressorDecompressor.java:
##########
@@ -557,6 +557,64 @@ public void testDecompressReturnsWhenNothingToDecompress()
throws Exception {
assertEquals(0, result);
}
+ /**
+ * Verify that {@code setInput()} does not throw {@code BufferOverflowException}
+ * after a previous {@code decompress()} call threw an exception.
+ *
+ * <p>When {@code decompress()} processes compressed data, it sets
+ * {@code compressedDirectBuf.limit(bytesInCompressedBuffer)} — a value that
+ * may be smaller than {@code directBufferSize}. If {@code decompressDirectByteBufferStream}
+ * throws (e.g. on corrupted input), the limit is never restored. A subsequent
+ * {@code reset()} also does not restore {@code compressedDirectBuf.limit}.
+ * So the next {@code setInput()} call will hit {@code BufferOverflowException}
+ * because {@code setInputFromSavedData()} tries to {@code put()} more bytes
+ * than the current limit allows.</p>
+ *
+ * <p>This scenario occurs in practice when reading multiple zstd-compressed
+ * files from a directory: a corrupted file causes an exception mid-decompress,
+ * the decompressor is returned to the pool and reset, but the limit stays
+ * small. The next file's {@code setInput()} then fails.</p>
+ */
+ @Test
+ public void testSetInputAfterDecompressThrowsOnCorruptedData() throws Exception {
Review Comment:
This is not directly related to this issue, but is there a similar test
case for any other compression format? It might be good to run the same scenario
against all the compression formats to make sure they are safe as well.
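
For reference, the failure mode described in the Javadoc above can be illustrated
with plain `java.nio`, without any Hadoop classes. This is only a sketch under
assumptions: the class name `StaleLimitDemo`, the helper `putOverflows`, the
buffer sizes, and the use of `rewind()` to stand in for a reset that leaves the
limit untouched are all hypothetical and are not taken from the decompressor
source.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

class StaleLimitDemo {
  /** Returns true if putting {@code len} bytes overflows the buffer. */
  static boolean putOverflows(ByteBuffer buf, int len) {
    try {
      buf.put(new byte[len]);
      return false;
    } catch (BufferOverflowException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    // Hypothetical stand-in for the compressed direct buffer.
    ByteBuffer buf = ByteBuffer.allocateDirect(1024);

    // First input: 100 bytes, then the limit is shrunk to the bytes present.
    buf.put(new byte[100]);
    buf.limit(100);

    // Simulated failure path: the position is rewound, but the limit stays at 100.
    buf.rewind();

    // Second input is larger than the stale limit, so put() overflows.
    System.out.println("stale limit overflows: " + putOverflows(buf, 400));

    // clear() restores limit == capacity, so the same put() now fits.
    buf.clear();
    System.out.println("after clear() overflows: " + putOverflows(buf, 400));
  }
}
```

If other decompressors manage their direct buffers in a similar way, the same
pattern (set a smaller limit, fail, reuse) would be the scenario to exercise for
each format.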
##########
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/compress/zstd/TestZStandardCompressorDecompressor.java:
##########
@@ -557,6 +557,64 @@ public void testDecompressReturnsWhenNothingToDecompress()
throws Exception {
assertEquals(0, result);
}
+ /**
+ * Verify that {@code setInput()} does not throw {@code BufferOverflowException}
+ * after a previous {@code decompress()} call threw an exception.
+ *
+ * <p>When {@code decompress()} processes compressed data, it sets
+ * {@code compressedDirectBuf.limit(bytesInCompressedBuffer)} — a value that
+ * may be smaller than {@code directBufferSize}. If {@code decompressDirectByteBufferStream}
+ * throws (e.g. on corrupted input), the limit is never restored. A subsequent
+ * {@code reset()} also does not restore {@code compressedDirectBuf.limit}.
+ * So the next {@code setInput()} call will hit {@code BufferOverflowException}
+ * because {@code setInputFromSavedData()} tries to {@code put()} more bytes
+ * than the current limit allows.</p>
+ *
+ * <p>This scenario occurs in practice when reading multiple zstd-compressed
+ * files from a directory: a corrupted file causes an exception mid-decompress,
+ * the decompressor is returned to the pool and reset, but the limit stays
+ * small. The next file's {@code setInput()} then fails.</p>
+ */
+ @Test
+ public void testSetInputAfterDecompressThrowsOnCorruptedData() throws Exception {
+ byte[] rawData = generate(400);
+ int bufSize = IO_FILE_BUFFER_SIZE_DEFAULT;
+
+ ByteArrayOutputStream baos = new ByteArrayOutputStream();
+ try (CompressionOutputStream cos = new CompressorStream(baos,
+ new ZStandardCompressor(), bufSize)) {
+ cos.write(rawData);
+ }
+ byte[] compressed = baos.toByteArray();
+
+ // Corrupt the compressed data by dropping the first 10 bytes.
+ byte[] corrupted = new byte[compressed.length - 10];
+ System.arraycopy(compressed, 10, corrupted, 0, corrupted.length);
+
+ ZStandardDecompressor decompressor = new ZStandardDecompressor(bufSize);
+ byte[] out = new byte[bufSize];
+
+ // Feed corrupted data — decompress() sets limit to corrupted.length, then throws.
+ decompressor.setInput(corrupted, 0, corrupted.length);
+ try {
+ decompressor.decompress(out, 0, out.length);
+ } catch (Exception e) {
Review Comment:
`Exception` is a bit too broad; can we narrow it down to a specific exception type?
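
A rough sketch of what narrowing could look like, written in plain Java so it
runs standalone. The helper `expectThrows` and the class `NarrowCatchDemo` are
hypothetical, and `BufferOverflowException` is only a stand-in here: which
exception type the zstd decompressor actually raises on corrupted input would
need to be confirmed against the implementation. In the real test, JUnit's
`assertThrows` would play the same role as this helper.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

class NarrowCatchDemo {
  /** Like Runnable, but allowed to throw checked exceptions. */
  interface ThrowingRunnable {
    void run() throws Exception;
  }

  /**
   * Runs the body and returns the thrown exception if it has the expected type.
   * Fails with AssertionError if nothing is thrown or the type does not match.
   */
  static <T extends Throwable> T expectThrows(Class<T> type, ThrowingRunnable body) {
    try {
      body.run();
    } catch (Throwable t) {
      if (type.isInstance(t)) {
        return type.cast(t);
      }
      throw new AssertionError("Expected " + type.getName() + " but got " + t, t);
    }
    throw new AssertionError("Expected " + type.getName() + " but nothing was thrown");
  }

  public static void main(String[] args) {
    ByteBuffer small = ByteBuffer.allocate(4);
    // Only the specific type passes; any other failure surfaces as AssertionError.
    BufferOverflowException e =
        expectThrows(BufferOverflowException.class, () -> small.put(new byte[8]));
    System.out.println("caught " + e.getClass().getSimpleName());
  }
}
```

Compared with `catch (Exception e)`, this keeps an unrelated failure (for example
a `NullPointerException` in test setup) from being silently treated as the
expected corruption error.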
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.