Ruiqi Dong created CODEC-344:
--------------------------------
Summary: Base64.Builder#setEncodeTable(...) accepts invalid custom
alphabets
Key: CODEC-344
URL: https://issues.apache.org/jira/browse/CODEC-344
Project: Commons Codec
Issue Type: Bug
Reporter: Ruiqi Dong
*Summary*
`Base64.Builder#setEncodeTable(...)` accepts invalid custom 64-byte encode
tables without complete validation. This is not the same issue as the Base58
custom alphabet bug: Base64 does use its configured encode/decode tables for
instance operations.
However, invalid Base64 alphabets can still create broken codecs or throw the
wrong exception type:
1. Duplicate alphabet bytes are accepted silently, and the codec can no longer
decode its own output correctly.
2. A non-ASCII byte (`>= 0x80`) throws `ArrayIndexOutOfBoundsException` while
building the decode table.
3. An alphabet containing the configured padding byte is accepted silently, and
encoded data can be interpreted as padding during decode.
*Affected code*
`org.apache.commons.codec.binary.Base64`
`calculateDecodeTable(...)` builds the custom decode table without duplicate
checking or unsigned byte indexing:
{code:java}
private static byte[] calculateDecodeTable(final byte[] encodeTable) {
final byte[] decodeTable = new byte[DECODING_TABLE_LENGTH];
Arrays.fill(decodeTable, (byte) -1);
for (int i = 0; i < encodeTable.length; i++) {
decodeTable[encodeTable[i]] = (byte) i;
}
return decodeTable;
} {code}
`setEncodeTable(...)` derives a decode table, but it does not reject duplicate
entries or entries that conflict with the configured padding byte:
{code:java}
@Override
public Builder setEncodeTable(final byte... encodeTable) {
final boolean isStandardEncodeTable = Arrays.equals(encodeTable,
STANDARD_ENCODE_TABLE);
final boolean isUrlSafe = Arrays.equals(encodeTable, URL_SAFE_ENCODE_TABLE);
setDecodeTableRaw(isStandardEncodeTable || isUrlSafe ? DECODE_TABLE :
calculateDecodeTable(encodeTable));
return super.setEncodeTable(encodeTable);
} {code}
The padding-byte check needs to happen when the final `Base64` instance is
built, because padding is configurable and the builder setter order can vary.
*Reproducer*
The following regression tests describe the expected behavior:
{code:java}
@Test
void testBuilderCustomEncodeTableRejectsDuplicateEntries() {
final byte[] encodeTable =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
.getBytes(StandardCharsets.US_ASCII);
encodeTable[1] = encodeTable[0];
assertThrows(IllegalArgumentException.class, () ->
Base64.builder().setEncodeTable(encodeTable));
}
@Test
void testBuilderCustomEncodeTableRejectsNonAsciiEntries() {
final byte[] encodeTable =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
.getBytes(StandardCharsets.US_ASCII);
encodeTable[0] = (byte) 0x80;
// Assumes the "reject non-ASCII" direction. If non-ASCII support is chosen
instead,
// assert a successful round-trip rather than IllegalArgumentException.
assertThrows(IllegalArgumentException.class, () ->
Base64.builder().setEncodeTable(encodeTable));
}
@Test
void testBuilderCustomEncodeTableRejectsPaddingByte() {
final byte[] encodeTable =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
.getBytes(StandardCharsets.US_ASCII);
encodeTable[0] = '=';
assertThrows(IllegalArgumentException.class, () ->
Base64.builder().setEncodeTable(encodeTable).get());
} {code}
*Observed behavior*
Duplicate alphabet byte:
{code:java}
duplicate[1] = duplicate[0]
encode({0}) -> "AA"
decode("AA") -> [4]
round-trip -> false {code}
Non-ASCII alphabet byte:
{code:java}
nonAscii[0] = (byte) 0x80
Base64.builder().setEncodeTable(nonAscii).get()
-> java.lang.ArrayIndexOutOfBoundsException: Index -128 out of bounds for
length 256 {code}
Padding byte in alphabet:
{code:java}
pad[0] = '='
encode({0}) -> "=="
decode("==") -> []
round-trip -> false {code}
*Expected behavior*
`Base64.Builder#setEncodeTable(...)` should reject invalid custom alphabets
with a clear `IllegalArgumentException`.
At minimum, the encode table should:
1. Have exactly 64 entries.
2. Contain no duplicate byte values.
3. Not contain the final configured padding byte.
4. Either reject non-ASCII byte values clearly, or fully support them using
unsigned lookup consistently in decode-table construction, decode, and
`isInAlphabet`.
For consistency with the sibling Base classes, rejecting invalid alphabets
during builder configuration or final instance construction is preferable to
silently building a codec that corrupts round-trips.
Base64 exposes a public builder API for custom alphabets. The resulting codec
should either be valid and self-consistent, or reject invalid configuration. It
should not silently produce a codec that cannot decode its own output, and it
should not fail with an internal `ArrayIndexOutOfBoundsException` for invalid
alphabet input.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)