Ruiqi Dong created CODEC-341:
--------------------------------
Summary: Base16.Builder#setEncodeTable(...) can create an instance
that cannot decode its own output
Key: CODEC-341
URL: https://issues.apache.org/jira/browse/CODEC-341
Project: Commons Codec
Issue Type: Bug
Reporter: Ruiqi Dong
*Summary*
Base16.Builder exposes setEncodeTable(...), which suggests callers can provide
a custom Base16 alphabet. Encoding does honor the custom table, but the builder
only switches the decode table between the built-in upper-case and lower-case
variants. As a result, a Base16 instance created with an arbitrary custom
alphabet can emit encoded data that the same instance decodes incorrectly.
*This issue also affects Base32.* Is it acceptable to report Base32 in this
ticket, or should I create a separate one?
*Affected code*
File: src/main/java/org/apache/commons/codec/binary/Base16.java
File: src/main/java/org/apache/commons/codec/binary/Base32.java
{code:java}
// Base16
@Override
public Builder setEncodeTable(final byte... encodeTable) {
    super.setDecodeTableRaw(Arrays.equals(encodeTable, LOWER_CASE_ENCODE_TABLE)
            ? LOWER_CASE_DECODE_TABLE : UPPER_CASE_DECODE_TABLE);
    return super.setEncodeTable(encodeTable);
}
{code}
{code:java}
// Base32
@Override
public Builder setEncodeTable(final byte... encodeTable) {
    super.setDecodeTableRaw(Arrays.equals(encodeTable, HEX_ENCODE_TABLE)
            ? HEX_DECODE_TABLE : DECODE_TABLE);
    return super.setEncodeTable(encodeTable);
}
{code}
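One possible direction for a fix is to derive the decode table from whatever encode table the caller supplies, instead of choosing between the two built-in tables. The following is only a minimal sketch of that idea: the class DecodeTableSketch, the method invert, the INVALID marker, and the 256-entry table size are all hypothetical and are not taken from Commons Codec, whose internal decode tables may be sized and populated differently.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of Commons Codec. */
final class DecodeTableSketch {

    /** Marks bytes that do not belong to the alphabet (assumed convention). */
    static final byte INVALID = -1;

    /**
     * Builds a 256-entry lookup table that maps each alphabet byte back to
     * its index in the encode table. Rejects alphabets with repeated symbols,
     * because those cannot be decoded unambiguously.
     */
    static byte[] invert(final byte... encodeTable) {
        final byte[] decodeTable = new byte[256];
        Arrays.fill(decodeTable, INVALID);
        for (int i = 0; i < encodeTable.length; i++) {
            final int symbol = encodeTable[i] & 0xff;
            if (decodeTable[symbol] != INVALID) {
                throw new IllegalArgumentException(
                        "Duplicate symbol in encode table: " + (char) symbol);
            }
            decodeTable[symbol] = (byte) i;
        }
        return decodeTable;
    }
}
```

With the swapped alphabet "1023456789ABCDEF", this sketch maps '1' to index 0 and '0' to index 1, which is the mapping the decoder would need in order to read back its own output.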
*Reproducer*
Add the following test to
src/test/java/org/apache/commons/codec/binary/Base16Test.java:
{code:java}
@Test
void testBuilderCustomEncodeTableAffectsDecodeTable() {
    final byte[] encodeTable =
            "0123456789ABCDEF".getBytes(StandardCharsets.US_ASCII);
    // Swap the first two symbols so the alphabet matches neither built-in table.
    final byte tmp = encodeTable[0];
    encodeTable[0] = encodeTable[1];
    encodeTable[1] = tmp;
    final Base16 base16 = Base16.builder().setEncodeTable(encodeTable).get();
    final byte[] encoded = base16.encode(new byte[] { 1 });
    assertEquals("10", new String(encoded, StandardCharsets.US_ASCII),
            "A custom Base16 alphabet should affect encoding");
    assertArrayEquals(new byte[] { 1 }, base16.decode(encoded),
            "A custom Base16 alphabet should decode its own encoded output");
}
{code}
Run:
{code:bash}
mvn -q -Dtest=org.apache.commons.codec.binary.Base16Test#testBuilderCustomEncodeTableAffectsDecodeTable test
{code}
Observed behavior:
The encoding assertion passes, showing that the custom alphabet is used. The
encoded output is:
{code:java}
10{code}
But the decode assertion fails because 10 is interpreted with the default
decode table:
{code:java}
array contents differ at index [0], expected: <1> but was: <16> {code}
Expected behavior:
If setEncodeTable(...) is part of the public builder API, the resulting Base16
instance should use a matching decode table so that it can decode its own
output consistently. If arbitrary custom alphabets are not supported, the
builder should reject them instead of silently pairing them with an
incompatible decode table.
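The "reject" alternative could look roughly like the sketch below. It is an illustration only: Base16AlphabetGuard and requireBuiltIn are hypothetical names, and the two alphabets are redeclared locally so the example is self-contained rather than relying on Commons Codec internals.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical guard for illustration; not part of Commons Codec. */
final class Base16AlphabetGuard {

    // Local copies of the two standard hexadecimal alphabets.
    private static final byte[] UPPER =
            "0123456789ABCDEF".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] LOWER =
            "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);

    /**
     * Accepts only the built-in upper-case and lower-case alphabets and
     * fails fast for anything else, instead of silently pairing a custom
     * encode table with an unrelated decode table.
     */
    static void requireBuiltIn(final byte... encodeTable) {
        if (Arrays.equals(encodeTable, UPPER) || Arrays.equals(encodeTable, LOWER)) {
            return;
        }
        throw new IllegalArgumentException(
                "Unsupported Base16 encode table: only the built-in "
                        + "upper-case and lower-case alphabets are accepted");
    }
}
```

Under this approach the swapped alphabet from the reproducer would be refused at build time, so the caller learns about the limitation before any data is encoded.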
For Base32, add the following test to
src/test/java/org/apache/commons/codec/binary/Base32Test.java:
{code:java}
@Test
void testBuilderCustomEncodeTableAffectsDecodeTable() {
    final byte[] encodeTable =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567".getBytes(StandardCharsets.US_ASCII);
    // Swap the first two symbols so the alphabet matches neither built-in table.
    final byte tmp = encodeTable[0];
    encodeTable[0] = encodeTable[1];
    encodeTable[1] = tmp;
    final Base32 base32 =
            Base32.builder().setEncodeTable(encodeTable).setLineLength(0).get();
    final byte[] encoded = base32.encode(new byte[] { 0 });
    assertEquals("BB======", new String(encoded, StandardCharsets.US_ASCII),
            "A custom Base32 alphabet should affect encoding");
    assertArrayEquals(new byte[] { 0 }, base32.decode(encoded),
            "A custom Base32 alphabet should decode its own encoded output");
}
{code}
Run:
{code:bash}
mvn -q -Dtest=org.apache.commons.codec.binary.Base32Test#testBuilderCustomEncodeTableAffectsDecodeTable test
{code}
Observed behavior:
The encoding assertion passes, showing that the custom alphabet is used. The
encoded output is:
{code:java}
BB====== {code}
But the decode assertion fails because "BB======" is interpreted with the
default decode table:
{code:java}
array contents differ at index [0], expected: <0> but was: <8> {code}
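The value 8 follows from the Base32 bit layout: each symbol carries 5 bits, so the first decoded byte is the upper 8 bits of the first two symbol indices. The sketch below reproduces that arithmetic for both alphabets. The class Base32FirstByteSketch and its method are hypothetical and only model the first output byte; this is not how Commons Codec implements decoding.

```java
/** Hypothetical arithmetic demo; not part of Commons Codec. */
final class Base32FirstByteSketch {

    /**
     * Looks up the first two encoded symbols in the given alphabet and
     * returns the first decoded byte. Two 5-bit indices form 10 bits;
     * the first byte is the upper 8 of those bits.
     */
    static int firstDecodedByte(final String alphabet, final String encoded) {
        final int first = alphabet.indexOf(encoded.charAt(0));
        final int second = alphabet.indexOf(encoded.charAt(1));
        return ((first << 5) | second) >> 2;
    }
}
```

With the default alphabet, 'B' has index 1, so "BB" yields the bits 00001 00001, whose upper 8 bits are 00001000, that is 8. With the swapped alphabet, 'B' has index 0 and the same input yields 0, which is the byte that was originally encoded.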
Expected behavior:
The resulting Base32 instance should use a matching decode table so that it can
decode its own output consistently. If arbitrary custom alphabets are not
supported, the builder should reject them instead of silently pairing them with
an incompatible decode table.
This is a configuration/state inconsistency in a public API. The builder
accepts a custom alphabet and encoding follows that configuration, but decoding
silently continues to interpret characters under a different alphabet. That
makes the configured Base16 and Base32 instances internally inconsistent.