[ 
https://issues.apache.org/jira/browse/CODEC-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874811#action_12874811
 ] 

Julius Davies commented on CODEC-91:
------------------------------------

I carefully analyzed the codec-1.3 logic.  It doesn't quite handle this 
scenario correctly.  There's a bug.  Result of running the test case originally 
supplied by Chris Pimlott against codec-1.3 is the following:

{code}
byte[] result = b64.decode("Y29tbW9ucwo=Y29tbW9ucwo=".getBytes());

// result now contains:
{'c', 'o', 'm', 'm', 'o', 'n', 's', NULL, 'c', 'o', 'm', 'm', 'o', 'n', 's'}

That NULL shouldn't be there (actually a zero since it's a primitive, but you 
know what I mean), so the decode in the commons-codec-1.3 logic is actually 
causing a corruption of the data in this scenario.

----------------------
So what should we do for 1.4.X?  Here's what's on the table at the moment:

Option 0:  Treat first PAD as EOF and stop all processing (current 1.4 
behaviour).

Option 1:  Restart decode state when PAD is encountered in any location.

Option 2:  Only restart decode state if PAD's are correctly placed in the 
stream (ie. AB==, ABC=).  This is closest to the codec-1.3.jar behaviour, but 
without the NULL bug.


Personally I'm +1 to option 1, for the following reasons:

- Probably easier to program.  The implementation doesn't really need to know 
much about the PAD when it encounters one (ie. no need to ask is this a good 
PAD or a bad PAD?).

- Correctly decodes all Base64 inputs where PADs are in the correct place, so 
in a way we're already implementing option 2 here.

- Also properly decodes the invalid input "AB=AB=" if stream corruption 
happened because of: s/==/=/g 

- For all other inputs it's garbage-in garbage-out





> Handling of embedded padding in base64 encoded data changed in 1.4
> ------------------------------------------------------------------
>
>                 Key: CODEC-91
>                 URL: https://issues.apache.org/jira/browse/CODEC-91
>             Project: Commons Codec
>          Issue Type: Bug
>    Affects Versions: 1.4
>            Reporter: Chris Pimlott
>         Attachments: codec-91-actually-works-and-tests-yay.patch
>
>
> 1.4 changed the way that embedded padding characters (i.e. "=") were handled 
> when decoding base64 data.  Previously, the decoder ignored them and decoded 
> all the data.  Now it stops upon encountering the first padding byte.  This 
> breaks compatibility with previous versions.
> For example, in 1.4,
> b64.decode("Y29tbW9ucwo=".getBytes());
> gives the same result as
> b64.decode("Y29tbW9ucwo=Y29tbW9ucwo=".getBytes());

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to