Alex Herbert created CODEC-280:
----------------------------------
Summary: Base32/64 to allow optional strict/lenient decoding
Key: CODEC-280
URL: https://issues.apache.org/jira/browse/CODEC-280
Project: Commons Codec
Issue Type: Improvement
Affects Versions: 1.14
Reporter: Alex Herbert
Assignee: Alex Herbert
Base32 decodes blocks of 8 characters.
Base64 decodes blocks of 4 characters.
At the end of decoding some extra characters may be left. They are decoded
using the appropriate bits. Any trailing bits that do not make up a whole byte
(i.e. fewer than 8 bits) are discarded.
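To make the arithmetic concrete, here is a minimal sketch of how many whole bytes and spare bits a final partial block carries. The class and method names are hypothetical and are not part of Commons Codec; the only facts used are that a Base64 character carries 6 bits and a Base32 character carries 5 bits.

```java
// Illustrative only: TrailingBits is a hypothetical helper, not a Commons Codec class.
final class TrailingBits {

    /** Whole bytes available from the trailing characters of a partial block. */
    static int wholeBytes(int trailingChars, int bitsPerChar) {
        return (trailingChars * bitsPerChar) / 8;
    }

    /** Bits left over once the whole bytes have been extracted. */
    static int leftoverBits(int trailingChars, int bitsPerChar) {
        return (trailingChars * bitsPerChar) % 8;
    }

    public static void main(String[] args) {
        // Base64: 6 bits per character, at most 3 trailing characters.
        for (int n = 1; n <= 3; n++) {
            System.out.printf("Base64 %d chars: %d bytes, %d spare bits%n",
                    n, wholeBytes(n, 6), leftoverBits(n, 6));
        }
        // Base32: 5 bits per character, at most 7 trailing characters.
        for (int n = 1; n <= 7; n++) {
            System.out.printf("Base32 %d chars: %d bytes, %d spare bits%n",
                    n, wholeBytes(n, 5), leftoverBits(n, 5));
        }
    }
}
```

A single trailing character never yields a whole byte in either scheme (6 or 5 bits), which is the case where nothing can be extracted and everything is discarded.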
Currently if there are more than 8 bits left then the available bytes are
extracted and the left over bits are validated to check they are zeros. If they
are not zeros then an exception is raised. This functionality was added to
ensure that a byte array that is decoded will be re-encoded to the exact same
byte array (ignoring input padding).
There are two issues:
# If the leftover bits are less than 8 then no attempt can be made to obtain
the last bytes. However an exception is not raised indicating that the encoding
was invalid (no left-over bits should be unaccounted for).
# Raising exceptions for leftover bits is causing reports from users that
codec is not working as it used to. This is true, but only because the user
has some badly encoded bytes they want to decode. Since other libraries allow
this, it seems that two options for decoding are required.
I suggest fixing the decoding so that it operates in two modes: strict and
lenient.
* Strict will throw an exception whenever there are unaccounted for bits.
* Lenient will just discard the extra bits that cannot be used.
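The two modes can be sketched for the final partial block of a Base64 input. This is an illustration under stated assumptions, not the proposed implementation: the class, the {{decodeTail}} method and its {{strict}} flag are hypothetical, only the standard Base64 alphabet is handled, and a lone trailing character is assumed to always fail in strict mode because no valid encoding produces one.

```java
// Illustrative only: TailDecodeSketch is a hypothetical helper, not Commons Codec code.
final class TailDecodeSketch {

    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    /** Decodes the 1 to 3 trailing characters of an unpadded Base64 input. */
    static byte[] decodeTail(String tail, boolean strict) {
        if (tail.isEmpty() || tail.length() > 3) {
            throw new IllegalArgumentException("Expected 1 to 3 trailing characters");
        }
        // Accumulate 6 bits per character.
        int bits = 0;
        for (char c : tail.toCharArray()) {
            int value = ALPHABET.indexOf(c);
            if (value < 0) {
                throw new IllegalArgumentException("Not a Base64 character: " + c);
            }
            bits = (bits << 6) | value;
        }
        int totalBits = tail.length() * 6;
        int spare = totalBits % 8;                    // bits that cannot form a byte
        int spareValue = bits & ((1 << spare) - 1);   // value held in those bits

        // Strict: reject a lone character (assumption) and any non-zero spare bits.
        if (strict && (tail.length() == 1 || spareValue != 0)) {
            throw new IllegalArgumentException("Unaccounted for bits in final block");
        }

        // Lenient (or valid strict input): drop the spare bits, keep whole bytes.
        int wholeBytes = totalBits / 8;
        byte[] out = new byte[wholeBytes];
        int data = bits >>> spare;
        for (int i = wholeBytes - 1; i >= 0; i--) {
            out[i] = (byte) data;
            data >>>= 8;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(decodeTail("E", false).length);  // lenient: nothing decoded
        System.out.println((char) decodeTail("QR", false)[0]); // lenient: spare bits dropped
    }
}
```

In this sketch {{"QQ"}} and {{"QR"}} both decode to the byte {{'A'}} in lenient mode, but only {{"QQ"}} is accepted in strict mode since the 4 spare bits of {{"QR"}} are not zero.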
Lenient is the default for backward compatibility, restoring the behaviour of
the class to that of versions prior to 1.13.
Strict is enabled using a method:
{code:java}
Base64 codec = new Base64();
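byte[] bytes = new byte[] { 'E' };
// Lenient (default): the 6 leftover bits are discarded and nothing is decoded
Assertions.assertArrayEquals(new byte[0], codec.decode(bytes));
codec.setStrictDecoding(true);
// Strict: the unaccounted for bits raise an exception
Assertions.assertThrows(IllegalArgumentException.class, () -> codec.decode(bytes));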
{code}
Using strict decoding should ensure that a round trip returns the same bytes:
{code:java}
byte[] bytes = ...; // Some valid encoding with no padding characters
Base64 codec = new Base64();
codec.setStrictDecoding(true);
Assertions.assertArrayEquals(bytes, codec.encode(codec.decode(bytes)));
{code}