Re: [PHP-DEV] [RFC][DISCUSSION] Add RFC 4648 compliant data encoding API

ignace nyamagana butera Tue, 15 Jul 2025 06:22:48 -0700

On Mon, Jul 14, 2025 at 11:26 PM Andrey Andreev <n...@devilix.net> wrote:


> Hi all,
>
> I have a few suggestions, starting with naming improvements:
> - Forgiving instead of Lenient (align with
> https://infra.spec.whatwg.org/#forgiving-base64)
> - Shorten the option names; one example would be Variable/Constant instead
> of Unprotected/ConstantTime, but I think most could be rethinked
> - $input or $data instead of $decoded (could actually do the same instead
> of $encoded, but that one doesn't feel as wrong)
> - Not strictly about naming, but it similarly feels wrong that
> UnableToDecodeException extends EncodingException (which seems to have no
> purpose)
>
> However, I'm not a fan of how these simple functions have so many option
> flags ... it feels forced, trying to accomodate too much at once. I'd
> rather have discrete functions, like base64_*() and base64url_*() - I chose
> this example because base64 and base64url also have arguably different
> desirable defaults for padding; almost all pad-stripping I've seen in the
> wild has been for the purposes of converting to base64url.
> On a semi-related note, I'm not sure if including the IMAP variant isn't
> complicating things for no good reason (it is extra-niche, and we have
> imap_binary/base64() already).
>
> Also, the RFC doesn't specify whether DecodingMode::Strict would cause an
> error in case of missing padding?
>
> That being said, I'm very glad to see this!
>
> Cheers,
> Andrey.
>

Hi Andrey,

Forgiving instead of Lenient (align with
https://infra.spec.whatwg.org/#forgiving-base64)
>
I will adapt the text and use `Forgiving` instead

Shorten the option names; one example would be Variable/Constant
instead of Unprotected/ConstantTime, but I think most could be
rethinked


I will adapt the text and use `Variable/Constant` instead, thanks for
the suggestions,

$input or $data instead of $decoded (could actually do the same
instead of $encoded, but that one doesn't feel as wrong)
>
Usage of `$encoded` and `$decoded` as parameter names is done to
emphasize the *state of the data**,* rather than its format. This is
helpful as it avoids ambiguity ( `$data` is generic) and makes data
flow more explicit.

Not strictly about naming, but it similarly feels wrong that
UnableToDecodeException extends EncodingException (which seems to have
no purpose)


This follows the RFC guidelines regarding the introduction of new
exceptions to the language, particularly within extensions. Each
exception should reference its own exception marker (in this proposal,
`EncodingException`). Additionally, we introduce  a more specific
exception to handle errors that occur during the decoding of encoded
data.

On a semi-related note, I'm not sure if including the IMAP variant
isn't complicating things for no good reason (it is extra-niche, and
we have imap_binary/base64() already).


The `ext/imap` extension from which those functions are coming from
[has been unbundled from
PHP](https://wiki.php.net/rfc/unbundle_imap_pspell_oci8)

I chose this example because base64 and base64url also have arguably
different desirable defaults for padding; almost all pad-stripping
I've seen in the wild has been for the purposes of converting to
base64url.


Base64 and Base64url vary on their alphabet and on the presence or
absence of the padding string. With the proposed API it would mean
doing the following

```php
\Encoding::base64_encode('Hello world!'); //base64 standard encoding
\Encoding::base64_encode('Hello world!', variant:
\Encoding\Base64::UrlSafe); //base64 URL Safe encoding
```

Padding is by default controlled by the variant. Since UrlSafe does
not need padding no padding will be used. You should not even need to
specify the presence or
absence of padding. Unless you want to do something really specific
for your use case. In which case being explicit in what you want to
achieve is always a good design choice.

The default values for the options are chosen to cover the most common
use cases, so in many situations you won’t need to specify them at
all—making the API easier to use than it might initially appear.

Also, the RFC doesn't specify whether DecodingMode::Strict would cause
an error in case of missing padding?


Strict decoding behavior depends on the variant. For example, in the
case of Base64url, padding is considered optional. Therefore, under
`DecodingMode::Strict`, the absence of `=` padding characters will not
trigger an exception, as this behavior is compliant with the relevant
RFC.

In contrast, for `Base64::Standard`, omitting the padding character
*in strict mode *will result in an exception, since padding is
mandatory where applicable with such a variant. For clarity, I will
revise the RFC to explicitly state the behavior of each encoding
variant during strict mode decoding.

Best regards,

Ignace Nyamagana Butera

Re: [PHP-DEV] [RFC][DISCUSSION] Add RFC 4648 compliant data encoding API

Reply via email to