[
https://issues.apache.org/jira/browse/CODEC-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874811#action_12874811
]
Julius Davies commented on CODEC-91:
------------------------------------
I carefully analyzed the codec-1.3 logic, and it doesn't handle this scenario
correctly: there's a bug. Running the test case originally supplied by
Chris Pimlott against codec-1.3 gives the following result:
{code}
byte[] result = b64.decode("Y29tbW9ucwo=Y29tbW9ucwo=".getBytes());
// result now contains:
// {'c', 'o', 'm', 'm', 'o', 'n', 's', NULL, 'c', 'o', 'm', 'm', 'o', 'n', 's'}
{code}
That NULL shouldn't be there (actually a zero since it's a primitive, but you
know what I mean), so the commons-codec-1.3 decode logic actually corrupts the
data in this scenario.
----------------------
So what should we do for 1.4.X? Here's what's on the table at the moment:
Option 0: Treat first PAD as EOF and stop all processing (current 1.4
behaviour).
Option 1: Restart decode state when PAD is encountered in any location.
Option 2: Only restart decode state if PADs are correctly placed in the
stream (i.e. AB==, ABC=). This is closest to the codec-1.3.jar behaviour, but
without the NULL bug.
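To make Option 0 concrete, here is a rough sketch of "first PAD is EOF". It is not the commons-codec code: the class and method names are made up for illustration, and it assumes java.util.Base64 from Java 8+ as a stand-in decoder.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical illustration only; not the commons-codec implementation.
final class FirstPadIsEofSketch {

    // Option 0: everything from the first '=' onwards is ignored.
    static byte[] decode(String input) {
        int firstPad = input.indexOf('=');
        String head = firstPad < 0 ? input : input.substring(0, firstPad);
        // java.util.Base64 accepts a final unit without its padding.
        return Base64.getDecoder().decode(head);
    }

    public static void main(String[] args) {
        byte[] single = decode("Y29tbW9ucwo=");
        byte[] doubled = decode("Y29tbW9ucwo=Y29tbW9ucwo=");
        // The second copy of the data is silently dropped.
        System.out.println(Arrays.equals(single, doubled)); // prints true
        System.out.println(new String(single, StandardCharsets.US_ASCII).trim()); // prints commons
    }
}
```

Under these assumptions the concatenated input decodes to exactly the same bytes as the single one, which is the compatibility break reported in this issue.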
Personally I'm +1 to option 1, for the following reasons:
- Probably easier to program. The implementation doesn't really need to know
much about the PAD when it encounters one (i.e. no need to ask "is this a good
PAD or a bad PAD?").
- Correctly decodes all Base64 inputs where PADs are in the correct place, so
in a way we're already implementing option 2 here.
- Also properly decodes the invalid input "AB=AB=" if stream corruption
happened because of: s/==/=/g
- For all other inputs it's garbage-in garbage-out
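A rough sketch of what I mean by option 1, to show how little the decoder needs to know about the PAD. This is a toy decoder written for illustration, not a patch: the class name, the helper, and the choice to silently skip non-alphabet characters are all my assumptions.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration of option 1; not the commons-codec implementation.
final class PadRestartSketch {
    private static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    static byte[] decode(String input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int buffer = 0; // accumulated 6-bit groups
        int count = 0;  // number of groups currently in buffer (0..3)
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '=') {
                // Any PAD: emit what is pending, then restart the decode state.
                flush(out, buffer, count);
                buffer = 0;
                count = 0;
                continue;
            }
            int value = ALPHABET.indexOf(c);
            if (value < 0) {
                continue; // assumption: skip characters outside the alphabet
            }
            buffer = (buffer << 6) | value;
            if (++count == 4) {
                // Four groups make three full bytes.
                out.write(buffer >> 16);
                out.write(buffer >> 8);
                out.write(buffer);
                buffer = 0;
                count = 0;
            }
        }
        flush(out, buffer, count); // input may end without a PAD
        return out.toByteArray();
    }

    // Emits the bytes held by a partial group; 0 or 1 pending groups hold none.
    private static void flush(ByteArrayOutputStream out, int buffer, int count) {
        if (count == 2) {          // 12 bits: one byte
            out.write(buffer >> 4);
        } else if (count == 3) {   // 18 bits: two bytes
            out.write(buffer >> 10);
            out.write(buffer >> 2);
        }
    }
}
```

With this sketch, "Y29tbW9ucwo=Y29tbW9ucwo=" decodes to both copies of the data with no stray zero byte, and "AB=AB=" decodes to the same bytes as "AB==AB==", because a second PAD arriving with nothing pending is simply a no-op.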
> Handling of embedded padding in base64 encoded data changed in 1.4
> ------------------------------------------------------------------
>
> Key: CODEC-91
> URL: https://issues.apache.org/jira/browse/CODEC-91
> Project: Commons Codec
> Issue Type: Bug
> Affects Versions: 1.4
> Reporter: Chris Pimlott
> Attachments: codec-91-actually-works-and-tests-yay.patch
>
>
> 1.4 changed the way that embedded padding characters (i.e. "=") were handled
> when decoding base64 data. Previously, the decoder ignored them and decoded
> all the data. Now it stops upon encountering the first padding byte. This
> breaks compatibility with previous versions.
> For example, in 1.4,
> b64.decode("Y29tbW9ucwo=".getBytes());
> gives the same result as
> b64.decode("Y29tbW9ucwo=Y29tbW9ucwo=".getBytes());