Re: RFR 8025003: Base64 should be less strict with padding

Xueming Shen Wed, 13 Nov 2013 12:36:52 -0800

On 11/13/2013 11:37 AM, Bill Shannon wrote:

Xueming Shen wrote on 11/13/13 11:11:

On 11/13/2013 10:41 AM, Bill Shannon wrote:

The other thought is the charset API where a charset decoder can be configured
to ignore, replace or report malformed or unmappable input. Having support
for all these actions is important for charset encoding/decoding but seems way
too much for Base64 where I think the API should be simple for the majority of
usages.
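
For reference, the three charset decoder actions mentioned above can be sketched as follows. This is only an illustration of the existing java.nio.charset API; the class name CharsetActionsDemo, the helper decode and the sample bytes are made up for the example and are not part of any proposal in this thread.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows IGNORE, REPLACE and REPORT on the same input.
class CharsetActionsDemo {
    // Decodes bytes as UTF-8, applying the given action to both
    // malformed input and unmappable characters.
    static String decode(byte[] input, CodingErrorAction action)
            throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action)
                .decode(ByteBuffer.wrap(input))
                .toString();
    }

    public static void main(String[] args) {
        // 0xFF can never appear in well-formed UTF-8.
        byte[] input = { 'A', (byte) 0xFF, 'B' };
        try {
            System.out.println(decode(input, CodingErrorAction.IGNORE));  // bad byte dropped
            System.out.println(decode(input, CodingErrorAction.REPLACE)); // bad byte becomes U+FFFD
            decode(input, CodingErrorAction.REPORT);                      // throws
        } catch (CharacterCodingException e) {
            System.out.println("REPORT: " + e);
        }
    }
}
```

A Base64 equivalent would need a similar set of knobs, which is the extra API surface being weighed here against a simple strict/lenient switch.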

We started this with a request for a strict/lenient option.  That may still be
simpler than figuring out how to do strict decoding and report the error in a
way that users of the API can ignore the error and provide as much data as
possible.

In any case, it's not clear what we can do this late in the schedule. It might
be prudent to just fix the MIME decoder to throw IAE consistently and re-visit
the API support for a lenient decoder in JDK 9.

When we started this conversation there was plenty of time to fix this.  :-(

The issue here is that we disagree on the specification of what lenient should
be and what the API should look like.

Here is the proposed change to undo the "lenient padding handling for mime"
change we did earlier. This leaves the option open for a complete "lenient
base64" in a future release, when we have a consensus.

What other implementors of base64 MIME decoding software have you consulted,
or do you intend to consult in the future?


Yes, the plan is to see what other implementations do.

So far:

(1) Google's Guava [1] just throws the exception

    com.google.common.io.BaseEncoding.base64().decode("QUJDA");

==> java.lang.IllegalArgumentException: 
com.google.common.io.BaseEncoding$DecodingException: Invalid input length 5

I don't think any of the configuration options provided can make this
exception go away.

(2) Apache's commons-codec [2] silently drops the dangling 6 bits
    new String(org.apache.commons.codec.binary.Base64.decodeBase64("QUJDA"))

==> ABC

Its source code [3] at line 465 marks this case as a "TODO":
    ...
    case 1: // 6 bits - ignore entirely
        // TODO not currently tested; perhaps it is impossible?
        break;
    ...
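
To make the "dangling 6 bits" case concrete: "QUJDA" is 5 characters, so 30 bits, which is 3 full bytes ("ABC") plus 6 leftover bits that cannot form a byte. The sketch below contrasts the two behaviours seen above using java.util.Base64. It assumes the basic decoder rejects a one-character final unit with an IllegalArgumentException; decodeLenient is a hypothetical helper written for this example, not an existing or proposed API, and it only handles unpadded input without whitespace.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class contrasting strict and lenient handling
// of a single dangling Base64 character.
class DanglingBitsDemo {
    // Strict: hand the input to the JDK basic decoder unchanged.
    static byte[] decodeStrict(String s) {
        return Base64.getDecoder().decode(s);
    }

    // Lenient (illustrative only): drop one dangling character, i.e. the
    // 6 bits that cannot contribute to a full byte, then decode the rest.
    // Assumes unpadded input with no whitespace or line separators.
    static byte[] decodeLenient(String s) {
        if (s.length() % 4 == 1) {
            s = s.substring(0, s.length() - 1);
        }
        return Base64.getDecoder().decode(s);
    }

    public static void main(String[] args) {
        String input = "QUJDA";
        try {
            decodeStrict(input);
        } catch (IllegalArgumentException e) {
            System.out.println("strict: " + e.getMessage());
        }
        byte[] bytes = decodeLenient(input);
        System.out.println("lenient: " + new String(bytes, StandardCharsets.US_ASCII));
    }
}
```

The lenient branch mirrors what the commons-codec "case 1" above does: the leftover bits are ignored and the caller gets the 3 bytes that did decode.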

-Sherman

[1] 
http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/io/BaseEncoding.html
[2] 
http://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/binary/Base64.html
[3] 
http://svn.apache.org/viewvc/commons/proper/codec/trunk/src/main/java/org/apache/commons/codec/binary/Base64.java?revision=1447577&view=markup

What experiments have you done with other base64 MIME decoding software or
applications to determine how they handle these cases?

I'm trying to determine how we're going to reach consensus in the future.

My base64 MIME decoding software has evolved over time based on customer
requirements.  I'm trying to give you the benefit of that experience so
that you don't need to waste years getting to the same point I got to.
I started in a similar place as you, believing that applications would
want to know about improperly encoded data.  I learned that many do not,
and that most end-user applications simply want to be as lenient as possible
to provide the best data possible to the user.
