[jira] [Created] (CODEC-280) Base32/64 to allow optional strict/lenient decoding

Alex Herbert (Jira) Tue, 21 Jan 2020 05:26:36 -0800

Alex Herbert created CODEC-280:
----------------------------------

             Summary: Base32/64 to allow optional strict/lenient decoding
                 Key: CODEC-280
                 URL: https://issues.apache.org/jira/browse/CODEC-280
             Project: Commons Codec
          Issue Type: Improvement
    Affects Versions: 1.14
            Reporter: Alex Herbert
            Assignee: Alex Herbert



Base32 decodes blocks of 8 characters.

Base64 decodes blocks of 4 characters.

After the last complete block some extra characters may be left over. They are
decoded using the bits that are available. Any trailing bits that do not make
up a whole byte (i.e. fewer than 8 bits) are discarded.
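
As a rough illustration of this arithmetic (not the Commons Codec implementation), the sketch below counts how many whole bytes a run of trailing characters yields and how many bits are discarded. The class and method names are hypothetical. Base64 carries 6 bits per character and Base32 carries 5.

```java
/** Hypothetical helper, for illustration only. */
final class TrailingBits {

    /** Whole bytes obtainable from the trailing characters. */
    static int wholeBytes(int trailingChars, int bitsPerChar) {
        return (trailingChars * bitsPerChar) / 8;
    }

    /** Bits left over after the whole bytes have been extracted. */
    static int discardedBits(int trailingChars, int bitsPerChar) {
        return (trailingChars * bitsPerChar) % 8;
    }
}
```

For example, two trailing Base64 characters hold 12 bits: one byte plus 4 leftover bits. A single trailing Base64 character holds only 6 bits, so it yields no byte at all.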

Currently, if there are more than 8 bits left, the available bytes are
extracted and the leftover bits are validated to check that they are zeros. If
they are not zeros, an exception is raised. This functionality was added to
ensure that a byte array that is decoded will be re-encoded to the exact same
byte array (ignoring input padding).
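
A small sketch of why non-zero leftover bits break the round trip, using only the JDK's `java.util.Base64` encoder and a hand-written bit extraction. The class and method names are hypothetical and this is not the Commons Codec code.

```java
/** Hypothetical illustration of the round-trip problem. */
final class RoundTripSketch {

    private static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    /** The byte formed by the first 8 of the 12 bits in a two-character tail. */
    static int leadingByte(String twoChars) {
        int bits = (ALPHABET.indexOf(twoChars.charAt(0)) << 6)
                 | ALPHABET.indexOf(twoChars.charAt(1));
        return bits >>> 4; // drop the 4 trailing bits
    }

    /** Unpadded encoding of a single byte, via the JDK encoder. */
    static String encodeUnpadded(int value) {
        return java.util.Base64.getEncoder().withoutPadding()
            .encodeToString(new byte[] { (byte) value });
    }
}
```

Both "QQ" and "QR" carry the same leading byte, but only "QQ" has zero trailing bits. Re-encoding that byte gives "QQ", so an input of "QR" cannot survive a decode and re-encode unchanged.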

There are two issues:
 # If fewer than 8 bits are left over then no attempt can be made to obtain
the last bytes. However, no exception is raised to indicate that the encoding
was invalid (no leftover bits should be unaccounted for).
 # Raising exceptions for leftover bits is causing reports from users that
codec is not working as it used to. This is true, but only because the user
has some badly encoded bytes they want to decode. Since other libraries allow
this, it seems that two options for decoding are required.

I suggest fixing the decoding so that it operates in two modes: strict and
lenient.
 * Strict will throw an exception whenever there are unaccounted-for bits.
 * Lenient will just discard the extra bits that cannot be used.
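
The two modes could be sketched as follows for the trailing characters of a Base64 input. This is a minimal illustration using only the JDK, not the proposed Commons Codec change: `decodeTail` and its `strict` flag are hypothetical, and the strict rule is this sketch's reading of "unaccounted-for bits" (leftover bits that form no byte, or leftover bits that are not all zero).

```java
/** Hypothetical sketch of strict versus lenient handling of a Base64 tail. */
final class TailDecodeSketch {

    private static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    /** Decodes up to 3 trailing characters (fewer than one 4-character block). */
    static byte[] decodeTail(String tail, boolean strict) {
        if (tail.length() > 3) {
            throw new IllegalArgumentException("Tail must be shorter than one block");
        }
        int bits = 0;     // accumulated 6-bit values
        int bitCount = 0; // number of bits accumulated
        for (char c : tail.toCharArray()) {
            int value = ALPHABET.indexOf(c);
            if (value < 0) {
                throw new IllegalArgumentException("Not a Base64 character: " + c);
            }
            bits = (bits << 6) | value;
            bitCount += 6;
        }
        int byteCount = bitCount / 8;
        int leftover = bitCount % 8;
        int leftoverValue = bits & ((1 << leftover) - 1);
        // Assumption: strict rejects leftover bits that yield no byte at all,
        // and leftover bits that are not all zero.
        if (strict && leftover != 0 && (byteCount == 0 || leftoverValue != 0)) {
            throw new IllegalArgumentException("Unaccounted-for trailing bits");
        }
        // Lenient (or valid strict input): discard the leftover bits.
        bits >>>= leftover;
        byte[] out = new byte[byteCount];
        for (int i = byteCount - 1; i >= 0; i--) {
            out[i] = (byte) bits;
            bits >>>= 8;
        }
        return out;
    }
}
```

With this sketch, `decodeTail("E", false)` returns an empty array while `decodeTail("E", true)` throws. `decodeTail("QR", false)` returns the single byte 65, and the strict call rejects it because the 4 trailing bits are not zero.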

Lenient is the default for backward compatibility, restoring the functionality
of the class to that of versions prior to 1.13.

Strict is enabled using a method:
{code:java}
Base64 codec = new Base64();
byte[] bytes = new byte[] { 'E' };
// Lenient (default): the 6 leftover bits cannot form a byte and are discarded
Assertions.assertArrayEquals(new byte[0], codec.decode(bytes));
codec.setStrictDecoding(true);
// Strict: the unaccounted-for bits raise an exception
Assertions.assertThrows(IllegalArgumentException.class, () -> codec.decode(bytes));
{code}
Using strict decoding should ensure that a round trip returns the same bytes:
{code:java}
byte[] bytes = ...; // Some valid encoding with no padding characters
Base64 codec = new Base64();
codec.setStrictDecoding(true);
Assertions.assertArrayEquals(bytes, codec.encode(codec.decode(bytes)));
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
