Re: [PHP-DEV] [RFC][DISCUSSION] Add RFC 4648 compliant data encoding API

Andrey Andreev Tue, 15 Jul 2025 14:02:01 -0700

Hello Ignace,

$input or $data instead of $decoded (could actually do the same
instead of $encoded, but that one doesn't feel as wrong)
>>
> Usage of `$encoded` and `$decoded` as parameter names is done to emphasize 
> the *state of the data**,* rather than its format. This is helpful as it 
> avoids ambiguity ( `$data` is generic) and makes data flow more explicit.
>
>
Yes, I know where you're coming from, but I don't see the ambiguity when
calling a *_decode() function, while the name $decoded is not semantically
correct. Admittedly, this is a bit of bikeshedding, but ...
For something to be "decoded", it has to have been encoded first. There's
no reason to think that this would be the case, and arguably more often
than not it won't be.
Similarly, there's no guarantee that the parameter isn't already encoded in
some other format, or even the same format (i.e. would be performing double
encoding).



> Not strictly about naming, but it similarly feels wrong that 
> UnableToDecodeException extends EncodingException (which seems to have no 
> purpose)
>
>
> This follows the RFC guidelines regarding the introduction of new exceptions 
> to the language, particularly within extensions. Each exception should 
> reference its own exception marker (in this proposal, `EncodingException`). 
> Additionally, we introduce  a more specific exception to handle errors that 
> occur during the decoding of encoded data.
>
>
Sorry, I've been out of the loop for quite awhile and may've missed
something. Can you point me to the guideline in question?


> On a semi-related note, I'm not sure if including the IMAP variant isn't 
> complicating things for no good reason (it is extra-niche, and we have 
> imap_binary/base64() already).
>
>
> The `ext/imap` extension from which those functions are coming from [has been 
> unbundled from PHP](https://wiki.php.net/rfc/unbundle_imap_pspell_oci8)
>
>
Fair enough. I do still believe it is too niche though.

I chose this example because base64 and base64url also have arguably
different desirable defaults for padding; almost all pad-stripping
I've seen in the wild has been for the purposes of converting to
base64url.
>
>
> Base64 and Base64url vary on their alphabet and on the presence or absence of 
> the padding string. With the proposed API it would mean doing the following
>
> ```php
> \Encoding::base64_encode('Hello world!'); //base64 standard encoding
> \Encoding::base64_encode('Hello world!', variant: \Encoding\Base64::UrlSafe); 
> //base64 URL Safe encoding
> ```
>
> Padding is by default controlled by the variant. Since UrlSafe does not need 
> padding no padding will be used. You should not even need to specify the 
> presence or
> absence of padding. Unless you want to do something really specific for your 
> use case. In which case being explicit in what you want to achieve is always 
> a good design choice.
>
> The default values for the options are chosen to cover the most common use 
> cases, so in many situations you won’t need to specify them at all—making the 
> API easier to use than it might initially appear.
>
>
Is it though? Sure it is easy for the single most common use case, but it
creates other subtle problems and violates the Principle Of Least
Astonishment:
- To use base64url, one needs to write a line of code twice as long (just
the enum name itself is longer than the function name)
- The API encourages that the Variant parameter be the default judge of
padding behavior, despite the function having a Padding behavior parameter.
- Variant-dependent behavior is harder to both document and explain to users
- RFC 4648 section 5 actually makes a big deal out of the base64 vs
base64url naming, they are not the same thing, yet the proposed API tries
to put them under a single "base64" function umbrella

API design is hard. :)

Also, the RFC doesn't specify whether DecodingMode::Strict would cause
an error in case of missing padding?
>
>
> Strict decoding behavior depends on the variant. For example, in the case of 
> Base64url, padding is considered optional. Therefore, under 
> `DecodingMode::Strict`, the absence of `=` padding characters will not 
> trigger an exception, as this behavior is compliant with the relevant RFC.
>
> In contrast, for `Base64::Standard`, omitting the padding character *in 
> strict mode *will result in an exception, since padding is mandatory where 
> applicable with such a variant. For clarity, I will revise the RFC to 
> explicitly state the behavior of each encoding variant during strict mode 
> decoding.
>
> Yes, please! Padding in the default base64 variant often has security
implications, that's why I asked.

Cheers,
Andrey.

Re: [PHP-DEV] [RFC][DISCUSSION] Add RFC 4648 compliant data encoding API

Reply via email to