Ruiqi Dong created CODEC-341:
--------------------------------

             Summary: Base16.Builder#setEncodeTable(...) can create an instance 
that cannot decode its own output
                 Key: CODEC-341
                 URL: https://issues.apache.org/jira/browse/CODEC-341
             Project: Commons Codec
          Issue Type: Bug
            Reporter: Ruiqi Dong


*Summary*
Base16.Builder exposes setEncodeTable(...), which suggests callers can provide 
a custom Base16 alphabet. Encoding does honor the custom table, but the builder 
only switches the decode table between the built-in upper-case and lower-case 
variants. As a result, a Base16 instance created with an arbitrary custom 
alphabet can emit encoded data that the same instance decodes incorrectly. 
*This issue also affects Base32.* Is it acceptable to cover Base32 in this 
ticket, or should I open a separate one?
 
*Affected code*
File: src/main/java/org/apache/commons/codec/binary/Base16.java
File: src/main/java/org/apache/commons/codec/binary/Base32.java
{code:java}
// Base16
@Override
public Builder setEncodeTable(final byte... encodeTable) {
    super.setDecodeTableRaw(Arrays.equals(encodeTable, LOWER_CASE_ENCODE_TABLE)
            ? LOWER_CASE_DECODE_TABLE : UPPER_CASE_DECODE_TABLE);
    return super.setEncodeTable(encodeTable);
}{code}
{code:java}
// Base32
@Override
public Builder setEncodeTable(final byte... encodeTable) {
    super.setDecodeTableRaw(Arrays.equals(encodeTable, HEX_ENCODE_TABLE)
            ? HEX_DECODE_TABLE : DECODE_TABLE);
    return super.setEncodeTable(encodeTable);
} {code}
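One possible direction, sketched here under assumptions rather than taken from the Commons Codec source, is to derive the decode table by inverting whatever encode table the caller supplies. The class name DecodeTableSketch and the helper deriveDecodeTable are hypothetical, as are the 256-entry table size and the use of -1 to mark bytes outside the alphabet.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch, not the Commons Codec implementation.
final class DecodeTableSketch {

    /**
     * Inverts an encode table so that decodeTable[character] == value.
     * Assumption: a 256-entry table where -1 marks bytes outside the alphabet.
     */
    static byte[] deriveDecodeTable(final byte[] encodeTable) {
        final byte[] decodeTable = new byte[256];
        Arrays.fill(decodeTable, (byte) -1);
        for (int value = 0; value < encodeTable.length; value++) {
            decodeTable[encodeTable[value] & 0xff] = (byte) value;
        }
        return decodeTable;
    }

    public static void main(final String[] args) {
        // Base16 alphabet with the first two characters swapped:
        // '1' encodes the value 0 and '0' encodes the value 1.
        final byte[] swapped = "1023456789ABCDEF".getBytes(StandardCharsets.US_ASCII);
        final byte[] decodeTable = deriveDecodeTable(swapped);
        System.out.println(decodeTable['1']); // prints 0
        System.out.println(decodeTable['0']); // prints 1
    }
}
```

A table derived this way always matches the encode table it came from, so the builder would not need to compare against the built-in alphabets at all. A real fix would also have to decide how to handle duplicate characters and case-insensitive decoding, which this sketch ignores.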
 
*Reproducer* 
Add the following test to 
src/test/java/org/apache/commons/codec/binary/Base16Test.java:
{code:java}
@Test
void testBuilderCustomEncodeTableAffectsDecodeTable() {
    final byte[] encodeTable = "0123456789ABCDEF".getBytes(StandardCharsets.US_ASCII);
    final byte tmp = encodeTable[0];
    encodeTable[0] = encodeTable[1];
    encodeTable[1] = tmp;

    final Base16 base16 = Base16.builder().setEncodeTable(encodeTable).get();
    final byte[] encoded = base16.encode(new byte[] { 1 });
    assertEquals("10", new String(encoded, StandardCharsets.US_ASCII),
            "A custom Base16 alphabet should affect encoding");
    assertArrayEquals(new byte[] { 1 }, base16.decode(encoded),
            "A custom Base16 alphabet should decode its own encoded output");
}{code}
Run:
{code:java}
mvn -q -Dtest=org.apache.commons.codec.binary.Base16Test#testBuilderCustomEncodeTableAffectsDecodeTable test
{code}
Observed behavior:
The encoding assertion passes, showing that the custom alphabet is used. The 
encoded output is:
{code:java}
10{code}
But the decode assertion fails because "10" is interpreted with the default 
decode table, where the character 1 decodes to 1 and the character 0 decodes 
to 0, giving 0x10 = 16:
{code:java}
array contents differ at index [0], expected: <1> but was: <16> {code}
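The value 16 can be reproduced without Commons Codec. The following is a minimal stand-alone sketch: the class Base16MismatchSketch and its encode and decode helpers are hypothetical and only mimic a table-driven Base16 codec, with the encode and decode alphabets passed separately so the mismatch is explicit.

```java
// Hypothetical stand-alone sketch; it does not call Commons Codec.
final class Base16MismatchSketch {

    static final String DEFAULT_ALPHABET = "0123456789ABCDEF";
    static final String SWAPPED_ALPHABET = "1023456789ABCDEF";

    /** Encodes each byte as two characters taken from the given alphabet. */
    static String encode(final byte[] data, final String alphabet) {
        final StringBuilder sb = new StringBuilder(data.length * 2);
        for (final byte b : data) {
            sb.append(alphabet.charAt((b >> 4) & 0x0f));
            sb.append(alphabet.charAt(b & 0x0f));
        }
        return sb.toString();
    }

    /** Decodes character pairs by looking up each character's index in the given alphabet. */
    static byte[] decode(final String encoded, final String alphabet) {
        final byte[] out = new byte[encoded.length() / 2];
        for (int i = 0; i < out.length; i++) {
            final int high = alphabet.indexOf(encoded.charAt(2 * i));
            final int low = alphabet.indexOf(encoded.charAt(2 * i + 1));
            out[i] = (byte) ((high << 4) | low);
        }
        return out;
    }

    public static void main(final String[] args) {
        final String encoded = encode(new byte[] { 1 }, SWAPPED_ALPHABET);
        System.out.println(encoded);                              // prints 10
        System.out.println(decode(encoded, DEFAULT_ALPHABET)[0]); // prints 16
        System.out.println(decode(encoded, SWAPPED_ALPHABET)[0]); // prints 1
    }
}
```

Decoding with the same alphabet that was used for encoding round-trips correctly. Only the pairing of a custom encode alphabet with the default decode alphabet produces 16.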
Expected behavior:
If setEncodeTable(...) is part of the public builder API, the resulting Base16 
instance should use a matching decode table so that it can decode its own 
output consistently. If arbitrary custom alphabets are not supported, the 
builder should reject them instead of silently pairing them with an 
incompatible decode table.
For Base32, add the following test to 
src/test/java/org/apache/commons/codec/binary/Base32Test.java:
{code:java}
@Test
void testBuilderCustomEncodeTableAffectsDecodeTable() {
    final byte[] encodeTable = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567".getBytes(StandardCharsets.US_ASCII);
    final byte tmp = encodeTable[0];
    encodeTable[0] = encodeTable[1];
    encodeTable[1] = tmp;

    final Base32 base32 = Base32.builder().setEncodeTable(encodeTable).setLineLength(0).get();
    final byte[] encoded = base32.encode(new byte[] { 0 });
    assertEquals("BB======", new String(encoded, StandardCharsets.US_ASCII),
            "A custom Base32 alphabet should affect encoding");
    assertArrayEquals(new byte[] { 0 }, base32.decode(encoded),
            "A custom Base32 alphabet should decode its own encoded output");
} {code}
Run:
{code:java}
mvn -q -Dtest=org.apache.commons.codec.binary.Base32Test#testBuilderCustomEncodeTableAffectsDecodeTable test
{code}
Observed behavior:
The encoding assertion passes, showing that the custom alphabet is used. The 
encoded output is:
{code:java}
BB====== {code}
But the decode assertion fails because "BB======" is interpreted with the 
default decode table, where B decodes to 1, so the first decoded byte becomes 
binary 00001000 = 8:
{code:java}
array contents differ at index [0], expected: <0> but was: <8> {code}
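The value 8 follows from how Base32 packs 5-bit values into bytes. As a rough stand-alone illustration, the hypothetical helper firstByte below reconstructs only the first decoded byte from the first two characters. It is not the Commons Codec decoder and ignores padding and the remaining characters.

```java
// Hypothetical stand-alone sketch; it does not call Commons Codec.
final class Base32FirstByteSketch {

    static final String DEFAULT_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";
    static final String SWAPPED_ALPHABET = "BACDEFGHIJKLMNOPQRSTUVWXYZ234567";

    /**
     * Rebuilds the first decoded byte from the first two Base32 characters:
     * all 5 bits of the first value followed by the top 3 bits of the second.
     */
    static int firstByte(final String encoded, final String alphabet) {
        final int first = alphabet.indexOf(encoded.charAt(0));
        final int second = alphabet.indexOf(encoded.charAt(1));
        return ((first << 3) | (second >> 2)) & 0xff;
    }

    public static void main(final String[] args) {
        System.out.println(firstByte("BB======", DEFAULT_ALPHABET)); // prints 8
        System.out.println(firstByte("BB======", SWAPPED_ALPHABET)); // prints 0
    }
}
```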
Expected behavior:
The resulting Base32 instance should use a matching decode table so that it can 
decode its own output consistently. If arbitrary custom alphabets are not 
supported, the builder should reject them instead of silently pairing them with 
an incompatible decode table.
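If rejecting unsupported alphabets is the preferred option, the check could look roughly like the following. This is a hedged sketch for Base16 only: the class EncodeTableGuardSketch, the helper requireBuiltIn, and the exception message are hypothetical, and the two constants merely stand in for the built-in upper-case and lower-case tables.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of the "reject" option; not the Commons Codec implementation.
final class EncodeTableGuardSketch {

    // Stand-ins for the built-in Base16 upper-case and lower-case encode tables.
    static final byte[] UPPER = "0123456789ABCDEF".getBytes(StandardCharsets.US_ASCII);
    static final byte[] LOWER = "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);

    /** Returns the table unchanged if it is a built-in Base16 alphabet, otherwise throws. */
    static byte[] requireBuiltIn(final byte[] encodeTable) {
        if (Arrays.equals(encodeTable, UPPER) || Arrays.equals(encodeTable, LOWER)) {
            return encodeTable;
        }
        throw new IllegalArgumentException("Unsupported Base16 encode table");
    }

    public static void main(final String[] args) {
        requireBuiltIn(LOWER.clone()); // accepted: matches a built-in table
        try {
            requireBuiltIn("1023456789ABCDEF".getBytes(StandardCharsets.US_ASCII));
        } catch (final IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing fast in the builder would turn the silent mismatch into an immediate, visible configuration error.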
 
 
This is a configuration/state inconsistency in a public API. The builder 
accepts a custom alphabet and encoding follows that configuration, but decoding 
silently continues to interpret characters under a different alphabet. That 
makes the configured Base16 and Base32 instances internally inconsistent.
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
