[ 
https://issues.apache.org/jira/browse/CODEC-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16981194#comment-16981194
 ] 

Alex Herbert commented on CODEC-263:
------------------------------------

When encoding 8-bit bytes as 6-bit characters, every 3 bytes are encoded as
4 characters. Any bytes left over after dividing the byte length by 3 (i.e. 1
or 2 trailing bytes) are encoded as 2 or 3 characters respectively, and the
group is padded to 4 characters with the '=' character.

If you have 1 trailing byte, it is encoded using one full 6-bit character and
a second 6-bit character that holds the remaining 2 bits in its most
significant positions. The final 4 bits of that second character should be
zero.

If you have 2 trailing bytes, they are encoded using two full 6-bit characters
and a third 6-bit character that holds the remaining 4 bits in its most
significant positions. The final 2 bits of that third character should be
zero.
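
As an illustration of the two trailing-byte cases, here is a minimal sketch in plain Java. The class name {{TailEncodingSketch}}, the helper {{encodeTail}} and the sample bytes are made up for this example; this is not the Commons Codec implementation, only the bit layout described above written out by hand.

```java
public class TailEncodingSketch {
    // Standard Base64 alphabet: index = 6-bit value.
    private static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    /** Hypothetical helper: encodes 1 or 2 trailing bytes as a padded 4-character group. */
    static String encodeTail(byte[] tail) {
        if (tail.length == 1) {
            int b0 = tail[0] & 0xFF;
            // First character: top 6 bits. Second: low 2 bits followed by 4 zero bits.
            return "" + ALPHABET.charAt(b0 >> 2)
                      + ALPHABET.charAt((b0 & 0x03) << 4)
                      + "==";
        }
        if (tail.length == 2) {
            int bits = ((tail[0] & 0xFF) << 8) | (tail[1] & 0xFF); // 16 bits
            // Two full characters, then the low 4 bits followed by 2 zero bits.
            return "" + ALPHABET.charAt(bits >> 10)
                      + ALPHABET.charAt((bits >> 4) & 0x3F)
                      + ALPHABET.charAt((bits & 0x0F) << 2)
                      + "=";
        }
        throw new IllegalArgumentException("Expected 1 or 2 trailing bytes");
    }

    public static void main(String[] args) {
        System.out.println(encodeTail(new byte[] {(byte) 0x81}));              // gQ==
        System.out.println(encodeTail(new byte[] {(byte) 0xFF, (byte) 0xD9})); // /9k=
    }
}
```

Because the unused low bits are filled by a left shift, an encoder written this way can only ever emit a final character whose discarded bits are zero.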

When decoding you collate the bits from 4 6-bit characters to make 3 8-bit
bytes. If we remove from your text file every complete group of 4 characters
that can be collated this way, the following remains at the end, and this is
where the decoder throws an exception:
{noformat}
final String message = "/9n=";
Base64.decodeBase64(message);
{noformat}
The 6-bit values (character = decimal = binary) are:
{noformat}
/ = 63 = 111111
9 = 61 = 111101
n = 39 = 100111
{noformat}
This is packed to make the following 2 bytes with 2 bits to discard:
{noformat}
11111111    11011001    11
{noformat}
Since 1.13 the decoder checks that the trailing bits to be discarded are
zero. In the case above the bits are not zero, so decoding produces an error
because the encoding is illegal.
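
To make the check concrete, here is a minimal sketch in plain Java that extracts the bits a decoder would discard from a final padded group. The class {{TrailingBitsSketch}} and the helper {{discardedBits}} are hypothetical names for this example and are not part of Commons Codec; the sketch assumes '=' appears only as trailing padding.

```java
public class TrailingBitsSketch {
    // Standard Base64 alphabet: index = 6-bit value.
    private static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    /**
     * Hypothetical helper: returns the value of the bits that are discarded
     * when decoding a final group of the form "xx==" or "xxx=".
     * A result of 0 means the group could have come from an encoder.
     */
    static int discardedBits(String group) {
        String data = group.replace("=", ""); // assumes '=' is trailing padding only
        if (group.length() != 4 || data.length() < 2 || data.length() > 3) {
            throw new IllegalArgumentException("Expected a group like \"xx==\" or \"xxx=\"");
        }
        int last = ALPHABET.indexOf(data.charAt(data.length() - 1));
        if (last < 0) {
            throw new IllegalArgumentException("Not in the Base64 alphabet");
        }
        // 2 characters = 12 bits -> 1 byte, 4 bits discarded.
        // 3 characters = 18 bits -> 2 bytes, 2 bits discarded.
        int mask = (data.length() == 2) ? 0x0F : 0x03;
        return last & mask;
    }

    public static void main(String[] args) {
        // 'n' = 39 = 100111, low 2 bits are 11
        System.out.println(Integer.toBinaryString(discardedBits("/9n="))); // 11
        // 'k' = 36 = 100100, low 2 bits are 00
        System.out.println(discardedBits("/9k="));                         // 0
    }
}
```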

The same can be done for the final characters of "publishMessage":
{noformat}
final String message = "ge==";
Base64.decodeBase64(message);
{noformat}
The 6-bit values (character = decimal = binary) are:
{noformat}
g = 32 = 100000
e = 30 = 011110
{noformat}
This is packed to make the following 1 byte with 4 bits to discard:
{noformat}
10000001    1110
{noformat}
Again the bits to be discarded are not zero, so decoding produces an error
because the encoding is illegal.

 

The question is: should {{Base64.isBase64(String)}} perform the same
validation on the final character of the string?
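
For discussion, here is a rough sketch of what such a stricter check could look like for a whole string, with or without padding. The class {{StrictBase64CheckSketch}} and the method {{isStrictBase64}} are hypothetical and are not an existing or proposed Commons Codec API. As a simplification, the sketch does not skip whitespace and does not validate the number of '=' characters.

```java
public class StrictBase64CheckSketch {
    // Standard Base64 alphabet: index = 6-bit value.
    private static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    /** Hypothetical check: alphabet membership plus zero trailing bits in the final group. */
    static boolean isStrictBase64(String s) {
        // Ignore trailing padding (simplification: its count is not validated).
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == '=') {
            end--;
        }
        for (int i = 0; i < end; i++) {
            if (ALPHABET.indexOf(s.charAt(i)) < 0) {
                return false;
            }
        }
        int rem = end % 4;
        if (rem == 0) {
            return true;  // only complete groups, nothing is discarded
        }
        if (rem == 1) {
            return false; // a single character carries 6 bits, less than one byte
        }
        int last = ALPHABET.indexOf(s.charAt(end - 1));
        int mask = (rem == 2) ? 0x0F : 0x03; // 4 or 2 bits are discarded
        return (last & mask) == 0;
    }

    public static void main(String[] args) {
        System.out.println(isStrictBase64("publishMessage")); // false: ends in "ge"
        System.out.println(isStrictBase64("publishMessagQ")); // true: 'Q' = 010000
    }
}
```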

> Base64.decodeBase64 throw exception
> -----------------------------------
>
>                 Key: CODEC-263
>                 URL: https://issues.apache.org/jira/browse/CODEC-263
>             Project: Commons Codec
>          Issue Type: Bug
>    Affects Versions: 1.13
>         Environment: JDK 7/JDK 8 
> commons-codec 1.13
>            Reporter: xie tao
>            Priority: Critical
>         Attachments: image-jpg-01-big.base64.txt
>
>
> After upgrading Codec to 1.13, the following code throws an exception:
> {code:java}
>   @Test
>   public  void test(){
>     Base64.decodeBase64("publishMessage");
>   }
> {code}
> The exception is:
> {code:java}
> java.lang.IllegalArgumentException: Last encoded character (before the 
> paddings if any) is a valid base 64 alphabet but not a possible value
>       at 
> org.apache.commons.codec.binary.Base64.validateCharacter(Base64.java:798)
>       at org.apache.commons.codec.binary.Base64.decode(Base64.java:472)
>       at 
> org.apache.commons.codec.binary.BaseNCodec.decode(BaseNCodec.java:412)
>       at 
> org.apache.commons.codec.binary.BaseNCodec.decode(BaseNCodec.java:395)
>       at org.apache.commons.codec.binary.Base64.decodeBase64(Base64.java:694)
> {code}



