Aleksandr Beliakov created CODEC-333:
----------------------------------------

             Summary: Separate Base64 decoding to "real" base64 and base64Url
                 Key: CODEC-333
                 URL: https://issues.apache.org/jira/browse/CODEC-333
             Project: Commons Codec
          Issue Type: Improvement
    Affects Versions: 1.18.0
            Reporter: Aleksandr Beliakov


The behavior applied in CODEC-75 does not distinguish between Base64 and 
Base64Url implementations within 
[isBase64|https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/binary/Base64.html#isBase64(java.lang.String)]
 and 
[decodeBase64|https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/binary/Base64.html#decodeBase64(java.lang.String)]
 methods. Nor, it does change the behavior, even if instantiating a class 
object using a Builder like below, explicitly stating that url-safe processing 
should not be used:
{code:java}
Base64 base64 = Base64.builder().setUrlSafe(false).get(); {code}
with the method 
[isInAlphabet|https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/binary/Base64.html#isInAlphabet(byte)]
 still using the base64Url-safe decoding table.

 

This causes some issues in the implementation, for instance inability to 
distinguish between Base64 and PEM encoded certificates, with Apache Codec 
accepting the latest as Base64. See a (chunked and dummy) example below (with 
_Base64.isBase64()_ method returning true):
{code:java}
-----BEGIN CERTIFICATE-----
AQIDBAU=
-----END CERTIFICATE-----{code}
 

According to the [RFC 4648|https://datatracker.ietf.org/doc/html/rfc4648], 
handling of Base64 and Base64Url together should not be used:
{panel:title=5. Base 64 Encoding with URL and Filename Safe Alphabet}
This encoding may be referred to as "base64url".  *This encoding should not be 
regarded as the same as the "base64" encoding* and should not be referred to as 
only "base64".  Unless clarified otherwise, "base64" refers to the base 64 in 
the previous section.
{panel}
While same handling may be acceptable in some particular cases, we think a 
generic library like Apache Commons Codes shall make a clear separation between 
the two encodings and unless implementors explicitly state the support of 
url-safe characters, it should not be applied by default.

 

This should also avoid potential errors within the implementations, when a 
system shall treat base64 and base64url encodings as two separate ones (for 
instance like in 
[standards|https://www.etsi.org/deliver/etsi_ts/119100_119199/11918201/01.02.01_60/ts_11918201v010201p.pdf]
 explicitly requiring one or another implementation, based on the usage).

 

_FYI, Google Guava have two separate implementations 
[BaseEncoding.base64()|https://guava.dev/releases/17.0/api/docs/com/google/common/io/BaseEncoding.html#base64()]
 and 
[BaseEncoding.base64Url()|https://guava.dev/releases/17.0/api/docs/com/google/common/io/BaseEncoding.html#base64Url()]
 allowing implementors to explicitly choose one according to the project needs._



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to