Ruiqi Dong created CODEC-343:
--------------------------------
Summary: Base32.Builder#setHexDecodeTable(boolean) sets the encode
table to the decode table, corrupting encoding
Key: CODEC-343
URL: https://issues.apache.org/jira/browse/CODEC-343
Project: Commons Codec
Issue Type: Bug
Reporter: Ruiqi Dong
*Summary*
`Base32.Builder#setHexDecodeTable(boolean)` is implemented as
`setEncodeTable(decodeTable(useHex))` — it passes the **decode** lookup table
to `setEncodeTable(...)`. Used on its own, the resulting `Base32` therefore
encodes with the decode lookup array (whose low entries are the `-1` sentinel)
instead of the Base32-Hex alphabet, so it emits bytes outside the alphabet and
cannot decode its own output.
*Affected code*
File: `src/main/java/org/apache/commons/codec/binary/Base32.java`
{code:java}
public Builder setHexDecodeTable(final boolean useHex) {
return setEncodeTable(decodeTable(useHex)); // passes the DECODE table to setEncodeTable
} {code}
`decodeTable(useHex)` returns `HEX_DECODE_TABLE` / `DECODE_TABLE` (the lookup
arrays used for decoding). Passing one of those to `setEncodeTable(...)` makes
it the encode table, so encoding reads `-1` sentinels and emits invalid bytes.
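A minimal sketch of that mechanism, assuming a simplified one-byte encoder rather than the real `Base32` code. `HEX_ALPHABET` is the RFC 4648 base32hex alphabet. `HEX_LOOKUP` is a hypothetical decode-style array (index = ASCII byte, value = 5-bit digit, `-1` = not in the alphabet) built only to mimic the shape of a decode table. Both tables are indexed with the same 5-bit digits, so for the input byte `0` the alphabet yields `'0'` while the lookup array yields the `-1` sentinel.
{code:java}
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative model of the crossed-table bug; NOT the Commons Codec implementation.
public class CrossedTablesSketch {
    // RFC 4648 "base32hex" alphabet: what an encode table should contain.
    static final byte[] HEX_ALPHABET =
            "0123456789ABCDEFGHIJKLMNOPQRSTUV".getBytes(StandardCharsets.US_ASCII);
    // Hypothetical decode-style lookup: index = ASCII byte, value = digit, -1 = invalid.
    static final byte[] HEX_LOOKUP = new byte[128];

    static {
        Arrays.fill(HEX_LOOKUP, (byte) -1);
        for (int i = 0; i < HEX_ALPHABET.length; i++) {
            HEX_LOOKUP[HEX_ALPHABET[i]] = (byte) i;
        }
    }

    // Encodes one byte as a Base32 quantum: two 5-bit digits followed by six '=' pads.
    // Each digit is used as an index into whatever array is passed as the encode table.
    static byte[] encodeOneByte(byte b, byte[] encodeTable) {
        int v = b & 0xFF;
        byte[] out = new byte[8];
        out[0] = encodeTable[v >>> 3];          // top 5 bits
        out[1] = encodeTable[(v << 2) & 0x1F];  // low 3 bits plus two zero padding bits
        Arrays.fill(out, 2, 8, (byte) '=');
        return out;
    }

    public static void main(String[] args) {
        byte[] good = encodeOneByte((byte) 0, HEX_ALPHABET); // alphabet as encode table
        byte[] bad = encodeOneByte((byte) 0, HEX_LOOKUP);    // lookup array as encode table
        System.out.println(new String(good, StandardCharsets.US_ASCII));
        System.out.println(Arrays.toString(bad));
    }
}
{code}
Running `main` prints `00======` and then `[-1, -1, 61, 61, 61, 61, 61, 61]`, the same byte pattern listed under *Observed behavior*.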
The only test that touches it chains another setter right after:
{code:java}
Base32.builder()
.setHexDecodeTable(false)
.setHexDecodeTable(true)
.setHexEncodeTable(false)
.setHexEncodeTable(true) // "last set wins" overwrites the broken encode table
... {code}
The trailing `setHexEncodeTable(true)` restores a correct encode table and masks
the defect, so the test never exercises `setHexDecodeTable` in isolation, which
is exactly where the bug surfaces.
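A hypothetical stand-in builder makes the masking visible. `LastSetWinsDemo.Builder` is not `Base32.Builder`; it only assumes, as the test comment says, that each setter overwrites a single encode-table field, and its `setHexDecodeTable` copies the reported defect on purpose.
{code:java}
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for a table-selecting builder; NOT Base32.Builder.
public final class LastSetWinsDemo {
    static final byte[] STD_ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567".getBytes(StandardCharsets.US_ASCII);
    static final byte[] HEX_ALPHABET =
            "0123456789ABCDEFGHIJKLMNOPQRSTUV".getBytes(StandardCharsets.US_ASCII);

    // Decode-style lookup: index = ASCII byte, value = digit, -1 = not in the alphabet.
    static byte[] lookupFor(byte[] alphabet) {
        byte[] lookup = new byte[128];
        Arrays.fill(lookup, (byte) -1);
        for (int i = 0; i < alphabet.length; i++) {
            lookup[alphabet[i]] = (byte) i;
        }
        return lookup;
    }

    static final class Builder {
        // Single field: whichever setter runs last decides the encode table.
        private byte[] encodeTable = STD_ALPHABET;

        Builder setHexEncodeTable(boolean useHex) {
            encodeTable = useHex ? HEX_ALPHABET : STD_ALPHABET;
            return this;
        }

        // Deliberately mirrors the reported defect: a decode lookup becomes the encode table.
        Builder setHexDecodeTable(boolean useHex) {
            encodeTable = lookupFor(useHex ? HEX_ALPHABET : STD_ALPHABET);
            return this;
        }

        byte[] getEncodeTable() {
            return encodeTable;
        }
    }

    public static void main(String[] args) {
        // Chained as in the existing test: the final setter overwrites the bad table.
        byte[] masked = new Builder().setHexDecodeTable(true).setHexEncodeTable(true).getEncodeTable();
        // Used on its own: the lookup array survives as the encode table.
        byte[] exposed = new Builder().setHexDecodeTable(true).getEncodeTable();
        System.out.println((char) masked[0]);
        System.out.println(exposed[0]);
    }
}
{code}
`main` prints `0` for the chained builder but `-1` for the isolated call, which is why the chained test passes while the reproducer below fails.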
*Reproducer*
Add the following test to
`src/test/java/org/apache/commons/codec/binary/Base32Test.java`:
{code:java}
@Test
void testBuilderSetHexDecodeTableEncodesWithHexAlphabet() {
final Base32 base32 =
Base32.builder().setHexDecodeTable(true).setLineLength(0).get();
final byte[] data = { 0 };
final byte[] encoded = base32.encode(data);
assertEquals("00======", new String(encoded, StandardCharsets.US_ASCII),
"setHexDecodeTable(true) should encode with the Base32-Hex alphabet");
assertArrayEquals(data, base32.decode(encoded),
"the instance should decode its own output");
} {code}
Run:
{code:bash}
mvn -q -Dtest=org.apache.commons.codec.binary.Base32Test#testBuilderSetHexDecodeTableEncodesWithHexAlphabet test
{code}
*Observed behavior*
Encoding `{ 0 }` does not produce the Base32-Hex form `"00======"`. The
encoder emits the `-1` sentinel from the decode table as `0xFF`:
{code:java}
encode({0}) -> bytes [-1, -1, 61, 61, 61, 61, 61, 61] // 0xFF 0xFF ======
decode(...) -> [] // round-trip lost
{code}
So the encoding assertion fails and the instance cannot decode its own output.
*Expected behavior*
`setHexDecodeTable(true)` should configure a `Base32` that encodes with the
Base32-Hex alphabet and decodes its own output. It must pass the encode table,
not the decode table, to `setEncodeTable(...)`:
{code:java}
public Builder setHexDecodeTable(final boolean useHex) {
return setEncodeTable(encodeTable(useHex));
} {code}
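To illustrate the intended pairing, here is a sketch of a self-contained Base32-Hex codec (not the Commons Codec implementation; `HexBase32Sketch`, `encode` and `decode` are hypothetical names). Its encode table is the RFC 4648 base32hex alphabet and its decode table is the inverse lookup derived from it, so `{ 0 }` encodes to `00======` and decodes back to `{ 0 }`.
{code:java}
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch of a correctly paired Base32-Hex codec; NOT the Commons Codec implementation.
public final class HexBase32Sketch {
    // Encode table: the alphabet itself.
    static final byte[] ENCODE =
            "0123456789ABCDEFGHIJKLMNOPQRSTUV".getBytes(StandardCharsets.US_ASCII);
    // Decode table: inverse lookup of ENCODE, -1 for bytes outside the alphabet.
    static final byte[] DECODE = new byte[128];

    static {
        Arrays.fill(DECODE, (byte) -1);
        for (int i = 0; i < ENCODE.length; i++) {
            DECODE[ENCODE[i]] = (byte) i;
        }
    }

    static byte[] encode(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int off = 0; off < data.length; off += 5) {
            int n = Math.min(5, data.length - off);   // input bytes in this 40-bit block
            long block = 0;
            for (int i = 0; i < 5; i++) {
                block = (block << 8) | (i < n ? (data[off + i] & 0xFF) : 0);
            }
            int digits = (n * 8 + 4) / 5;              // significant output characters
            for (int i = 0; i < 8; i++) {
                if (i < digits) {
                    out.write(ENCODE[(int) (block >>> (35 - 5 * i)) & 0x1F]);
                } else {
                    out.write('=');                    // pad the quantum to 8 characters
                }
            }
        }
        return out.toByteArray();
    }

    static byte[] decode(byte[] text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long buffer = 0;
        int bits = 0;
        for (byte b : text) {
            if (b == '=') {
                break;                                 // padding ends the data
            }
            int digit = b >= 0 ? DECODE[b] : -1;
            if (digit < 0) {
                throw new IllegalArgumentException("byte not in alphabet: " + b);
            }
            buffer = (buffer << 5) | digit;
            bits += 5;
            if (bits >= 8) {                           // emit each completed byte
                bits -= 8;
                out.write((int) (buffer >>> bits) & 0xFF);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = {0};
        byte[] encoded = encode(data);
        System.out.println(new String(encoded, StandardCharsets.US_ASCII));
        System.out.println(Arrays.equals(data, decode(encoded)));
    }
}
{code}
`main` prints `00======` and `true`, which is the behavior the reproducer test expects from `setHexDecodeTable(true)`.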
`setHexDecodeTable(...)` is a public builder API (`@since 1.18.0`). When used
on its own — the natural way to select the Base32-Hex variant — it produces an
instance that emits non-alphabet bytes and corrupts data, because the encode
and decode tables are crossed.
This is in the same family as the custom-alphabet decode mismatches in
`Base16.Builder` (CODEC-341 [https://issues.apache.org/jira/browse/CODEC-341])
and `Base32.Builder` (CODEC-342 [https://issues.apache.org/jira/browse/CODEC-342]).