Re: [PHP-DEV] [RFC] Decoding HTML and the Ambiguous Ampersand

Jakob Givoni Sat, 24 Aug 2024 12:57:32 -0700

Hi Dennis,

Overall it sounds like a reasonable RFC.


> Dennis:
>
> > Niels:
> >
> > I'm not so sure that the name "decode_html" is self-descriptive enough,
> > it sounds very generic.
>
> The name is not very important to me. For the sake of history, the reason
> I have chosen “decode HTML” is because, unlike an HTML parser, this is
> focused on taking a snippet of HTML “text” content and decoding it into a
> “plain PHP string.”

Why not make it two functions called "decode_html_text" and
"decode_html_attribute"?
Consider the following reasons:
1. The function doesn't actually decode HTML as such; it decodes either an
HTML text node string or an HTML attribute string.
2. Saves the $context parameter and the constants/enums, making the call
significantly shorter.
3. Decoding text and decoding attributes feel like two significantly
different operations. I admit I could be wrong if code like
decode_html($e->isAttribute() ? HtmlContext::Attribute :
HtmlContext::Text, $e->getContent()) is likely to be seen. But I somehow
don't foresee many situations where text and attribute strings end up
in the same code path.
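
To make the comparison concrete, here is a rough sketch of the two call
styles. Everything in it is hypothetical: neither the functions nor the
HtmlContext enum exist in PHP today, and html_entity_decode() is used only
as a stand-in, so it does not implement the context-specific rules the RFC
proposes.

```php
<?php
// Hypothetical sketch only: these names are assumptions taken from the
// discussion, not an existing PHP API.
enum HtmlContext
{
    case Attribute;
    case Text;
}

// Single entry point with a context argument (the style under discussion).
// Stand-in body: html_entity_decode() ignores $context, whereas the RFC's
// decoder would treat attribute values and text nodes differently.
function decode_html(HtmlContext $context, string $html): string
{
    return html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Two dedicated functions: no context argument at the call site.
function decode_html_text(string $html): string
{
    return decode_html(HtmlContext::Text, $html);
}

function decode_html_attribute(string $html): string
{
    return decode_html(HtmlContext::Attribute, $html);
}

// Call-site comparison.
$viaContext = decode_html(HtmlContext::Text, 'Fish &amp; chips');
$viaText    = decode_html_text('Fish &amp; chips');
$viaAttr    = decode_html_attribute('a &lt; b');
```

With the dedicated functions, the context is visible in the function name
and the caller never has to reference the enum.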

A couple of other options that would silence anyone opposed to implicitly
favouring UTF-8:
html_text_to_utf8 and html_attribute_to_utf8.

Best,
Jakob