This is an automated email from the ASF dual-hosted git repository.
ggregory pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/commons-compress.git
The following commit(s) were added to refs/heads/master by this push:
new 97919fc60 feat: enforce 20-bit Huffman code-length limit consistently (#699)
97919fc60 is described below
commit 97919fc6010556e751e7a3aa3584f4cde058cf19
Author: Piotr P. Karwasz <[email protected]>
AuthorDate: Wed Aug 27 23:11:58 2025 +0200
feat: enforce 20-bit Huffman code-length limit consistently (#699)
* feat: enforce 20-bit Huffman code-length limit consistently
This PR improves `BZip2CompressorInputStream` by enforcing the mandated
Huffman code-length bound and right-sizing related tables. It aligns behavior
with the reference C implementation’s intent while avoiding its historical
over-allocation.
## Why
The reference C code deliberately **over-sizes** its Huffman structures as
a defensive programming technique: e.g., it defines `BZ_MAX_CODE_LEN = 23` and
sizes length-indexed tables by `BZ_MAX_ALPHA_SIZE` rather than the code-length
bound to provide safety margins. However, the bzip2 bitstream only permits code
lengths ≤ 20 (and since 1.0.6 the compressor emits codes of at most 17 bits), so
carrying those defensive constants forward in our code obscures the true limit
and keeps arrays [...]
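As a concrete illustration of the sizing difference, a minimal sketch (the class name is hypothetical, not the actual `Data` layout; the constants follow the bzip2 format):

```java
// Hypothetical sketch of right-sized, length-indexed tables once the true
// 20-bit limit is enforced, instead of MAX_ALPHA_SIZE (258) columns as in
// the defensive C layout.
final class TableSizes {
    static final int N_GROUPS = 6;      // Huffman coding groups in bzip2
    static final int MAX_CODE_LEN = 20; // format maximum code length

    // limit needs indexes 0..MAX_CODE_LEN; base needs 0..MAX_CODE_LEN + 1.
    final int[][] limit = new int[N_GROUPS][MAX_CODE_LEN + 1];
    final int[][] base = new int[N_GROUPS][MAX_CODE_LEN + 2];
}
```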
## What’s changed
* **Set the true limit**: `MAX_CODE_LEN` now equals **20** (the format
maximum).
* **Validate inputs**: code lengths are checked explicitly; any value
**outside `[1, 20]`** triggers a clear exception early in decoding.
* **Right-size arrays**: Huffman tables and auxiliary arrays are sized to
the minimum required for the 20-bit limit, reducing footprint and clarifying
invariants.
* **Tests**: add a unit test that exercises the maximum alphabet size
across boundary code lengths.
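The validation step can be sketched as follows (a simplified stand-in: the actual patch throws `CompressorException`, an `IOException` subclass, where this sketch uses an unchecked exception and an illustrative method name):

```java
// Sketch of the early range check on decoded Huffman code lengths.
final class CodeLengthCheck {
    static final int MAX_CODE_LEN = 20; // bzip2 bitstream maximum

    // Rejects any code length outside [1, MAX_CODE_LEN] before the value
    // can be used to index into the decoding tables.
    static int checkCodeLen(final int curr) {
        if (curr < 1 || curr > MAX_CODE_LEN) {
            throw new IllegalArgumentException(
                    "Corrupted input, code length value out of range [1, " + MAX_CODE_LEN + "]: " + curr);
        }
        return curr;
    }
}
```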
## Maintenance & performance
* Clearer invariants (the limit you see is the limit you enforce).
* Slightly lower memory usage due to smaller tables.
* Easier reasoning about bounds and indexing during review and future
changes.
## References
* [bzip2/huffman.c](https://github.com/libarchive/bzip2/blob/master/huffman.c)
* [bzip2/bzlib_private.h](https://github.com/libarchive/bzip2/blob/master/bzlib_private.h) (Reference implementation uses over-sized constants for safety but enforces an effective 20-bit maximum.)
* feat: add coverage for `recvDecodingTables` code length validation
Ensure that `recvDecodingTables` rejects Huffman code lengths outside the
valid range [1, 20]. This prevents invalid input from causing array overflows
in `Data` during `createHuffmanDecodingTables`.
The new test verifies both rejection of out-of-range values and correct
acceptance of boundary values.
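The tables this check protects are built by the canonical-Huffman construction in the patch; a condensed, self-contained sketch (names mirror the diff, but this is illustrative, not the library code — the caller must size `limit` to at least `maxLen + 1` and `base` to at least `maxLen + 2`):

```java
import java.util.Arrays;

final class HuffmanTables {
    // Canonical-Huffman decode tables: perm[] lists symbols sorted by code
    // length, base[] holds cumulative counts then biases, and limit[] the
    // last code value of each length.
    static void build(final int[] limit, final int[] base, final int[] perm,
                      final char[] length, final int minLen, final int maxLen,
                      final int alphaSize) {
        int pp = 0;
        for (int i = minLen; i <= maxLen; i++) {
            for (int j = 0; j < alphaSize; j++) {
                if (length[j] == i) {
                    perm[pp++] = j;
                }
            }
        }
        Arrays.fill(base, 0);
        Arrays.fill(limit, minLen, maxLen + 1, 0);
        // Histogram of code lengths, shifted by 1.
        for (int i = 0; i < alphaSize; i++) {
            base[length[i] + 1]++;
        }
        // Cumulative counts: base[len] = number of codes shorter than len,
        // i.e. the index of the first length-len code in perm.
        for (int len = 1; len < base.length; len++) {
            base[len] += base[len - 1];
        }
        // Last code value for each length.
        int vec = 0;
        for (int len = minLen; len <= maxLen; len++) {
            vec += base[len + 1] - base[len];
            limit[len] = vec - 1;
            vec <<= 1;
        }
        // Bias between code value and perm index.
        for (int len = minLen + 1; len <= maxLen; len++) {
            base[len] = ((limit[len - 1] + 1) << 1) - base[len];
        }
    }
}
```

For lengths `{1, 2, 2}`, for example, the resulting tables decode code `0` to symbol 0 and codes `10`/`11` to symbols 1 and 2.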
* fix: simplify computation of `minLen` and `maxLen`
* fix: comment the construction of Huffman tables
* fix: validate indexes against actual array length
Indexes were previously validated against theoretical maximum limits
instead of the actual array size. This made the code harder to reason about and
was likely inherited from C conventions, where array length is not directly
available. Using the real length simplifies validation and avoids potential
errors.
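The pattern amounts to a small helper; a sketch modeled on the private `checkBounds` method (the real code throws `CompressorException`; an unchecked exception is used here to keep the sketch minimal):

```java
// Sketch: validate an index against the actual array length rather than a
// theoretical constant such as N_GROUPS or MAX_ALPHA_SIZE.
final class Bounds {
    static void checkBounds(final int checkVal, final int limitExclusive, final String name) {
        if (checkVal < 0 || checkVal >= limitExclusive) {
            throw new IndexOutOfBoundsException(
                    "Corrupted input, " + name + " value out of range: " + checkVal);
        }
    }
}
```

Call sites then pass the real bound, e.g. `checkBounds(zt, base.length, "zt")` instead of `checkBounds(zt, N_GROUPS, "zt")`.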
* fix: failing tests
* fix: limit clean-up to used range
Co-authored-by: Copilot <[email protected]>
* fix: initialize `inUseCount` to minimal value
Initializes `inUseCount` to its minimal value to improve the clarity of the
code.
Co-authored-by: Copilot <[email protected]>
* fix: remove `assertAll`
* fix: improve changelog entries
---------
Co-authored-by: Copilot <[email protected]>
---
src/changes/changes.xml | 4 +-
.../bzip2/BZip2CompressorInputStream.java | 138 ++++++++++++---------
.../compress/compressors/bzip2/BZip2Constants.java | 14 ++-
.../bzip2/BZip2CompressorInputStreamTest.java | 107 ++++++++++++++++
4 files changed, 202 insertions(+), 61 deletions(-)
diff --git a/src/changes/changes.xml b/src/changes/changes.xml
index dca1fd89b..3039736e1 100644
--- a/src/changes/changes.xml
+++ b/src/changes/changes.xml
@@ -58,8 +58,8 @@ The <action> type attribute can be add,update,fix,remove.
<action type="fix" dev="ggregory" due-to="Gary Gregory">Don't lose precision while reading folders from a SevenZFile.</action>
<action type="fix" dev="ggregory" due-to="Roel van Dijk, Gary
Gregory">Improve some exception messages in TarUtils and
TarArchiveEntry.</action>
<!-- FIX bzip2 -->
- <action type="fix" dev="ggregory" due-to="Tyler Nighswander, Gary
Gregory">org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream
constructors now throws CompressorException instead of
ArrayIndexOutOfBoundsException.</action>
- <action type="fix" dev="ggregory" due-to="Gary
Gregory">org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream
throws CompressorException (an IOException subclass) instead of IOException
when it encounters formatting problems.</action>
+ <action type="fix" dev="ggregory" due-to="Tyler Nighswander, Gary
Gregory">BZip2 input streams now throw CompressorException (a subclass of
IOException) for invalid or corrupted data, providing more specific error
reporting.</action>
+ <action type="fix" dev="pkarwasz" due-to="Tyler Nighswander, Piotr P.
Karwasz">BZip2 input streams treat Huffman codes longer than 20 bits as
corrupted data, matching the behavior of the reference implementation.</action>
<!-- FIX dump -->
<action type="fix" dev="pkarwasz" due-to="Tyler Nighswander">Align DUMP
archive block size with Linux `dump` utility.</action>
<action type="fix" dev="ggregory" due-to="Tyler Nighswander, Gary
Gregory">org.apache.commons.compress.archivers.dump.DumpArchiveInputStream.getNextEntry()
throws an ArchiveException instead of ArrayIndexOutOfBoundsException.</action>
diff --git
a/src/main/java/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStream.java
b/src/main/java/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStream.java
index ea21b3f80..9ea56c588 100644
---
a/src/main/java/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStream.java
+++
b/src/main/java/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStream.java
@@ -42,10 +42,13 @@
*/
public class BZip2CompressorInputStream extends CompressorInputStream
implements BZip2Constants, InputStreamStatistics {
- private static final class Data {
+ // package private for testing
+ static final class Data {
// (with blockSize 900k)
final boolean[] inUse = new boolean[256]; // 256 byte
+ // Always equal to the number of true values in inUse[] plus 2.
+ private int inUseCount = 2;
final byte[] seqToUnseq = new byte[256]; // 256 byte
final byte[] selector = new byte[MAX_SELECTORS]; // 18002 byte
@@ -56,8 +59,10 @@ private static final class Data {
*/
final int[] unzftab = new int[256]; // 1024 byte
- final int[][] limit = new int[N_GROUPS][MAX_ALPHA_SIZE]; // 6192 byte
- final int[][] base = new int[N_GROUPS][MAX_ALPHA_SIZE]; // 6192 byte
+ // Needs indexes from 0 to MAX_CODE_LEN inclusive.
+ final int[][] limit = new int[N_GROUPS][MAX_CODE_LEN + 1];
+ // Needs indexes from 0 to MAX_CODE_LEN + 1 inclusive.
+ final int[][] base = new int[N_GROUPS][MAX_CODE_LEN + 2];
final int[][] perm = new int[N_GROUPS][MAX_ALPHA_SIZE]; // 6192 byte
final int[] minLens = new int[N_GROUPS]; // 24 byte
@@ -155,9 +160,12 @@ private static void checkBounds(final int checkVal, final
int limitExclusive, fi
/**
* Called by createHuffmanDecodingTables() exclusively.
+ *
+ * @param minLen minimum code length in the range [1, {@value
MAX_CODE_LEN}] guaranteed by the caller.
+ * @param maxLen maximum code length in the range [1, {@value
MAX_CODE_LEN}] guaranteed by the caller.
*/
private static void hbCreateDecodeTables(final int[] limit, final int[]
base, final int[] perm, final char[] length, final int minLen, final int maxLen,
- final int alphaSize) throws IOException {
+ final int alphaSize) {
for (int i = minLen, pp = 0; i <= maxLen; i++) {
for (int j = 0; j < alphaSize; j++) {
if (length[j] == i) {
@@ -165,28 +173,33 @@ private static void hbCreateDecodeTables(final int[]
limit, final int[] base, fi
}
}
}
- for (int i = MAX_CODE_LEN; --i > 0;) {
- base[i] = 0;
- limit[i] = 0;
- }
+ // Ensure the arrays were not reused.
+ Arrays.fill(base, 0);
+ Arrays.fill(limit, minLen, maxLen + 1, 0);
+ // Compute histogram of code lengths, shifted by 1.
for (int i = 0; i < alphaSize; i++) {
final int len = length[i] + 1;
- checkBounds(len, MAX_ALPHA_SIZE, "length");
base[len]++;
}
- for (int i = 1, b = base[0]; i < MAX_CODE_LEN; i++) {
- b += base[i];
- base[i] = b;
- }
- for (int i = minLen, vec = 0, b = base[i]; i <= maxLen; i++) {
- final int nb = base[i + 1];
- vec += nb - b;
- b = nb;
- limit[i] = vec - 1;
+ // Compute cumulative counts: base[len] = # of codes with length < len.
+ // In other terms: base[len] = index of the first code in the `perm`
table.
+ for (int len = 1; len < base.length; len++) {
+ base[len] += base[len - 1];
+ }
+ // Compute the last code for each length.
+ int vec = 0;
+ for (int len = minLen; len <= maxLen; len++) {
+ // increment by the number of length `len` codes
+ vec += base[len + 1] - base[len];
+ // vec is now the last code of length `len` + 1
+ limit[len] = vec - 1;
vec <<= 1;
}
- for (int i = minLen + 1; i <= maxLen; i++) {
- base[i] = (limit[i - 1] + 1 << 1) - base[i];
+ // Compute the bias between code value and table index.
+ // base[minLen] cannot be computed using this rule, since limit[minLen
- 1] does not exist,
+ // but has already the correct value 0.
+ for (int len = minLen + 1; len <= maxLen; len++) {
+ base[len] = (limit[len - 1] + 1 << 1) - base[len];
}
}
@@ -220,7 +233,6 @@ public static boolean matches(final byte[] signature, final
int length) {
private boolean blockRandomised;
private final CRC crc = new CRC();
- private int nInUse;
private BitInputStream bin;
private final boolean decompressConcatenated;
private int currentState = START_BLOCK_STATE;
@@ -296,10 +308,15 @@ private boolean complete() throws IOException {
}
/**
- * Called by recvDecodingTables() exclusively.
+ * Builds the Huffman decoding tables for use by {@code
recvDecodingTables()}.
+ *
+ * @param alphaSize the alphabet size, guaranteed by the caller to be in
the range [2, 258]
+ * (RUNA, RUNB, 255 byte values, and EOB).
+ * @param nGroups the number of Huffman coding groups, guaranteed by the
caller to be in the range [0, 6].
+ * @param dataShadow the data structure into which the tables are built;
requires
+ * {@code temp_charArray2d} to be initialized.
*/
- private void createHuffmanDecodingTables(final int alphaSize, final int
nGroups) throws IOException {
- final Data dataShadow = this.data;
+ static void createHuffmanDecodingTables(final int alphaSize, final int
nGroups, final Data dataShadow) {
final char[][] len = dataShadow.temp_charArray2d;
final int[] minLens = dataShadow.minLens;
final int[][] limit = dataShadow.limit;
@@ -307,10 +324,10 @@ private void createHuffmanDecodingTables(final int
alphaSize, final int nGroups)
final int[][] perm = dataShadow.perm;
for (int t = 0; t < nGroups; t++) {
- int minLen = 32;
- int maxLen = 0;
final char[] len_t = len[t];
- for (int i = alphaSize; --i >= 0;) {
+ int minLen = len_t[0];
+ int maxLen = len_t[0];
+ for (int i = 1; i < alphaSize; i++) {
final char lent = len_t[i];
if (lent > maxLen) {
maxLen = lent;
@@ -341,8 +358,8 @@ private void endBlock() throws IOException {
private void getAndMoveToFrontDecode() throws IOException {
final BitInputStream bin = this.bin;
this.origPtr = bsR(bin, 24);
- recvDecodingTables();
final Data dataShadow = this.data;
+ recvDecodingTables(bin, dataShadow);
final byte[] ll8 = dataShadow.ll8;
final int[] unzftab = dataShadow.unzftab;
final byte[] selector = dataShadow.selector;
@@ -363,11 +380,12 @@ private void getAndMoveToFrontDecode() throws IOException
{
}
int groupNo = 0;
int groupPos = G_SIZE - 1;
- final int eob = this.nInUse + 1;
+ final int eob = dataShadow.inUseCount + 1;
int nextSym = getAndMoveToFrontDecode0();
int lastShadow = -1;
int zt = selector[groupNo] & 0xff;
- checkBounds(zt, N_GROUPS, "zt");
+ // All arrays have the same length
+ checkBounds(zt, base.length, "zt");
int[] base_zt = base[zt];
int[] limit_zt = limit[zt];
int[] perm_zt = perm[zt];
@@ -385,9 +403,10 @@ private void getAndMoveToFrontDecode() throws IOException {
}
if (groupPos == 0) {
groupPos = G_SIZE - 1;
- checkBounds(++groupNo, MAX_SELECTORS, "groupNo");
+ checkBounds(++groupNo, selector.length, "groupNo");
zt = selector[groupNo] & 0xff;
- checkBounds(zt, N_GROUPS, "zt");
+ // All arrays have the same length
+ checkBounds(zt, base.length, "zt");
base_zt = base[zt];
limit_zt = limit[zt];
perm_zt = perm[zt];
@@ -396,19 +415,19 @@ private void getAndMoveToFrontDecode() throws IOException
{
groupPos--;
}
int zn = minLens_zt;
- checkBounds(zn, MAX_ALPHA_SIZE, "zn");
+ checkBounds(zn, limit_zt.length, "zn");
int zvec = bsR(bin, zn);
while (zvec > limit_zt[zn]) {
- checkBounds(++zn, MAX_ALPHA_SIZE, "zn");
+ checkBounds(++zn, limit_zt.length, "zn");
zvec = zvec << 1 | bsR(bin, 1);
}
final int tmp = zvec - base_zt[zn];
- checkBounds(tmp, MAX_ALPHA_SIZE, "zvec");
+ checkBounds(tmp, perm_zt.length, "zvec");
nextSym = perm_zt[tmp];
}
checkBounds(s, this.data.ll8.length, "s");
final int yy0 = yy[0];
- checkBounds(yy0, 256, "yy");
+ checkBounds(yy0, seqToUnseq.length, "yy");
final byte ch = seqToUnseq[yy0];
unzftab[ch & 0xff] += s + 1;
final int from = ++lastShadow;
@@ -422,9 +441,9 @@ private void getAndMoveToFrontDecode() throws IOException {
if (++lastShadow >= limitLast) {
throw new CompressorException("Block overrun in MTF, %,d
exceeds %,d", lastShadow, limitLast);
}
- checkBounds(nextSym, 256 + 1, "nextSym");
+ checkBounds(nextSym - 1, yy.length, "nextSym");
final char tmp = yy[nextSym - 1];
- checkBounds(tmp, 256, "yy");
+ checkBounds(tmp, seqToUnseq.length, "yy");
unzftab[seqToUnseq[tmp] & 0xff]++;
ll8[lastShadow] = seqToUnseq[tmp];
/*
@@ -440,9 +459,10 @@ private void getAndMoveToFrontDecode() throws IOException {
yy[0] = tmp;
if (groupPos == 0) {
groupPos = G_SIZE - 1;
- checkBounds(++groupNo, MAX_SELECTORS, "groupNo");
+ checkBounds(++groupNo, selector.length, "groupNo");
zt = selector[groupNo] & 0xff;
- checkBounds(zt, N_GROUPS, "zt");
+ // All arrays have the same length
+ checkBounds(zt, base.length, "zt");
base_zt = base[zt];
limit_zt = limit[zt];
perm_zt = perm[zt];
@@ -451,14 +471,14 @@ private void getAndMoveToFrontDecode() throws IOException
{
groupPos--;
}
int zn = minLens_zt;
- checkBounds(zn, MAX_ALPHA_SIZE, "zn");
+ checkBounds(zn, limit_zt.length, "zn");
int zvec = bsR(bin, zn);
while (zvec > limit_zt[zn]) {
- checkBounds(++zn, MAX_ALPHA_SIZE, "zn");
+ checkBounds(++zn, limit_zt.length, "zn");
zvec = zvec << 1 | bsR(bin, 1);
}
final int idx = zvec - base_zt[zn];
- checkBounds(idx, MAX_ALPHA_SIZE, "zvec");
+ checkBounds(idx, perm_zt.length, "zvec");
nextSym = perm_zt[idx];
}
}
@@ -468,17 +488,17 @@ private void getAndMoveToFrontDecode() throws IOException
{
private int getAndMoveToFrontDecode0() throws IOException {
final Data dataShadow = this.data;
final int zt = dataShadow.selector[0] & 0xff;
- checkBounds(zt, N_GROUPS, "zt");
+ checkBounds(zt, dataShadow.limit.length, "zt");
final int[] limit_zt = dataShadow.limit[zt];
int zn = dataShadow.minLens[zt];
- checkBounds(zn, MAX_ALPHA_SIZE, "zn");
+ checkBounds(zn, limit_zt.length, "zn");
int zvec = bsR(bin, zn);
while (zvec > limit_zt[zn]) {
- checkBounds(++zn, MAX_ALPHA_SIZE, "zn");
+ checkBounds(++zn, limit_zt.length, "zn");
zvec = zvec << 1 | bsR(bin, 1);
}
final int tmp = zvec - dataShadow.base[zt][zn];
- checkBounds(tmp, MAX_ALPHA_SIZE, "zvec");
+ checkBounds(tmp, dataShadow.perm[zt].length, "zvec");
return dataShadow.perm[zt][tmp];
}
@@ -573,9 +593,9 @@ private void initBlock() throws IOException {
this.currentState = START_BLOCK_STATE;
}
- private void makeMaps() {
- final boolean[] inUse = this.data.inUse;
- final byte[] seqToUnseq = this.data.seqToUnseq;
+ private static void makeMaps(final Data data) throws IOException {
+ final boolean[] inUse = data.inUse;
+ final byte[] seqToUnseq = data.seqToUnseq;
int nInUseShadow = 0;
@@ -585,7 +605,7 @@ private void makeMaps() {
}
}
- this.nInUse = nInUseShadow;
+ data.inUseCount = nInUseShadow;
}
@Override
@@ -668,9 +688,7 @@ private int readNextByte(final BitInputStream in) throws
IOException {
return (int) b;
}
- private void recvDecodingTables() throws IOException {
- final BitInputStream bin = this.bin;
- final Data dataShadow = this.data;
+ static void recvDecodingTables(final BitInputStream bin, final Data
dataShadow) throws IOException {
final boolean[] inUse = dataShadow.inUse;
final byte[] pos = dataShadow.recvDecodingTables_pos;
final byte[] selector = dataShadow.selector;
@@ -697,8 +715,8 @@ private void recvDecodingTables() throws IOException {
}
}
- makeMaps();
- final int alphaSize = this.nInUse + 2;
+ makeMaps(dataShadow);
+ final int alphaSize = dataShadow.inUseCount + 2;
/* Now the selectors */
final int nGroups = bsR(bin, 3);
final int selectors = bsR(bin, 15);
@@ -751,12 +769,17 @@ private void recvDecodingTables() throws IOException {
while (bsGetBit(bin)) {
curr += bsGetBit(bin) ? -1 : 1;
}
+ // Same condition as in bzip2
+ if (curr < 1 || curr > MAX_CODE_LEN) {
+ throw new CompressorException(
+ "Corrupted input, code length value out of range
[%d, %d]: %d", 1, MAX_CODE_LEN, curr);
+ }
len_t[i] = (char) curr;
}
}
// finally create the Huffman tables
- createHuffmanDecodingTables(alphaSize, nGroups);
+ createHuffmanDecodingTables(alphaSize, nGroups, dataShadow);
}
private int setupBlock() throws IOException {
@@ -766,6 +789,7 @@ private int setupBlock() throws IOException {
final int[] cftab = this.data.cftab;
final int ttLen = this.last + 1;
+ // tt has size at least ttLen
final int[] tt = this.data.initTT(ttLen);
final byte[] ll8 = this.data.ll8;
cftab[0] = 0;
diff --git
a/src/main/java/org/apache/commons/compress/compressors/bzip2/BZip2Constants.java
b/src/main/java/org/apache/commons/compress/compressors/bzip2/BZip2Constants.java
index 9350d2100..592b0b527 100644
---
a/src/main/java/org/apache/commons/compress/compressors/bzip2/BZip2Constants.java
+++
b/src/main/java/org/apache/commons/compress/compressors/bzip2/BZip2Constants.java
@@ -35,9 +35,19 @@ interface BZip2Constants {
int MAX_ALPHA_SIZE = 258;
/**
- * Constant {@value}.
+ * Maximum allowed length of a Huffman code in BZip2.
+ * <p>
+ * In the original bzip2 C implementation, a defensive limit of 23 is
set to prevent buffer overflows
+ * (see <a
href="https://github.com/libarchive/bzip2/blob/master/bzlib_private.h">{@code
bzip_private.h}</a>).
+ * However, the actual enforced maximum during decompression is 20
+ * (see <a
href="https://github.com/libarchive/bzip2/blob/master/decompress.c">BZ2_decompress
in {@code decompress.c}</a>),
+ * and since version 1.0.6, compression uses a maximum of 17.
+ * </p>
+ * <p>
+ * The Java implementation does not require the higher defensive
limit, so this constant is set to the true maximum of 20.
+ * </p>
*/
- int MAX_CODE_LEN = 23;
+ int MAX_CODE_LEN = 20;
/**
* Constant {@value}.
diff --git
a/src/test/java/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStreamTest.java
b/src/test/java/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStreamTest.java
index 778832668..28fb2a831 100644
---
a/src/test/java/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStreamTest.java
+++
b/src/test/java/org/apache/commons/compress/compressors/bzip2/BZip2CompressorInputStreamTest.java
@@ -18,7 +18,9 @@
*/
package org.apache.commons.compress.compressors.bzip2;
+import static org.junit.jupiter.api.Assertions.assertDoesNotThrow;
import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertNotNull;
import static org.junit.jupiter.api.Assertions.assertThrows;
import static org.junit.jupiter.api.Assertions.assertTrue;
@@ -27,6 +29,7 @@
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
+import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;
@@ -35,11 +38,18 @@
import org.apache.commons.compress.archivers.ArchiveInputStream;
import org.apache.commons.compress.archivers.ArchiveStreamFactory;
import org.apache.commons.compress.compressors.CompressorException;
+import
org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream.Data;
+import org.apache.commons.compress.utils.BitInputStream;
import org.apache.commons.io.IOUtils;
import org.junit.jupiter.api.Test;
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.ValueSource;
class BZip2CompressorInputStreamTest extends AbstractTest {
+ private static final int MIN_CODE_LEN = 1;
+ private static final int MAX_CODE_LEN = 20;
+
private void fuzzingTest(final int[] bytes) throws IOException,
ArchiveException {
final int len = bytes.length;
final byte[] input = new byte[len];
@@ -149,4 +159,101 @@ void testSingleByteReadConsistentlyReturnsMinusOneAtEof()
throws IOException {
assertEquals(-1, in.read());
}
}
+
+ @Test
+ void testCreateHuffmanDecodingTablesWithLargeAlphaSize() {
+ final Data data = new Data(1);
+ // Use a codeLengths array with length equal to MAX_ALPHA_SIZE (258)
to test array bounds.
+ final char[] codeLengths = new char[258];
+ for (int i = 0; i < codeLengths.length; i++) {
+ // Use all code lengths within valid range [1, 20]
+ codeLengths[i] = (char) ((i % MAX_CODE_LEN) + 1);
+ }
+ data.temp_charArray2d[0] = codeLengths;
+ assertDoesNotThrow(
+ () ->
BZip2CompressorInputStream.createHuffmanDecodingTables(codeLengths.length, 1,
data),
+ "createHuffmanDecodingTables should not throw for valid
codeLengths array of MAX_ALPHA_SIZE");
+ assertEquals(data.minLens[0], 1, "Minimum code length should be 1");
+ }
+
+ @ParameterizedTest(name = "code length {0} -> must be rejected")
+ @ValueSource(ints = {MIN_CODE_LEN - 1, MAX_CODE_LEN + 1})
+ void testRecvDecodingTablesWithOutOfRangeCodeLength(final int codeLength)
throws IOException {
+ try (BitInputStream tables = prepareDecodingTables(codeLength)) {
+ final Data data = new Data(1);
+
+ final CompressorException ex = assertThrows(
+ CompressorException.class,
+ () ->
BZip2CompressorInputStream.recvDecodingTables(tables, data),
+ "Expected CompressorException for invalid code length " +
codeLength);
+
+ final String msg = ex.getMessage();
+ assertNotNull(msg, "Exception message must not be null");
+ assertTrue(msg.toLowerCase().contains("code length"), "Message
should mention 'code length'");
+ assertTrue(
+ msg.contains("[" + MIN_CODE_LEN + ", " + MAX_CODE_LEN +
"]"),
+ "Message should mention valid range [" + MIN_CODE_LEN + ",
" + MAX_CODE_LEN + "]");
+ assertTrue(
+ msg.contains(Integer.toString(codeLength)),
+ "Message should include the offending value " +
codeLength);
+ }
+ }
+
+ @ParameterizedTest(name = "code length {0} -> accepted and stored")
+ @ValueSource(ints = {MIN_CODE_LEN, MAX_CODE_LEN})
+ void testRecvDecodingTablesWithValidCodeLength(final int codeLength)
throws IOException {
+ try (BitInputStream tables = prepareDecodingTables(codeLength)) {
+ final Data data = new Data(1);
+
+ assertDoesNotThrow(
+ () ->
BZip2CompressorInputStream.recvDecodingTables(tables, data),
+ "Should accept code length " + codeLength + " within [" +
MIN_CODE_LEN + ", " + MAX_CODE_LEN + "]");
+
+ // We encoded 2 Huffman groups; both minLens should equal the
encoded codeLength
+ assertEquals(codeLength, data.minLens[0], "Group 0 min code length
mismatch");
+ assertEquals(codeLength, data.minLens[1], "Group 1 min code length
mismatch");
+ }
+ }
+
+ /**
+ * Builds a minimal bitstream for recvDecodingTables():
+ * <ul>
+ * <li>Uses only one symbol 'A' (0x41).</li>
+ * <li>Number of groups: 2 (minimum).</li>
+ * <li>Number of selectors: 3.</li>
+ * <li>Selectors: all three encode j=1 (unary "10").</li>
+ * <li>Huffman code lengths for 2 groups over alphabet size 3 (RUNA,
RUNB, EOB) are all equal to {@code codeLength}.</li>
+ * </ul>
+ * <p>
+ * <strong>Note:</strong> The values are chosen to keep everything
byte-aligned.
+ * </p>
+ * @param codeLength the code length to use for each symbol in each group;
must be in [0, 31]
+ */
+ private BitInputStream prepareDecodingTables(final int codeLength) {
+ assertTrue(0 <= codeLength && codeLength <= 31, "codeLength must be
between 0 and 31");
+
+ final ByteArrayOutputStream stream = new ByteArrayOutputStream();
+
+ // Upper nibbles in use: set only upper nibble 4
+ stream.write(0b0000_1000);
+ stream.write(0b0000_0000);
+
+ // Lower nibbles of upper nibble 4: set only lower nibble 1
+ stream.write(0b0100_0000);
+ stream.write(0b0000_0000);
+
+ // Groups/selectors (24 bits total, byte-aligned)
+ // nGroups = 2 (3 bits), nSelectors = 3 (15 bits), selectors: 10 10 10
+ stream.write(0b010_00000); // nGroups (3 bits) + high 5 bits of
nSelectors
+ stream.write(0b00000000); // middle 8 bits of nSelectors
+ stream.write(0b11_10_10_10); // low 2 bits of nSelectors + selectors
(3 x 2 bits)
+
+ // Huffman tables: two groups, three symbols each
+ // startLen (5 bits) followed by 3x '0' (done) => one byte: codeLength
<< 3
+ stream.write(codeLength << 3);
+ stream.write(codeLength << 3);
+
+ return new BitInputStream(new
ByteArrayInputStream(stream.toByteArray()), ByteOrder.BIG_ENDIAN);
+ }
+
}