FILTER_SANITIZE_EMAIL should burn. If you have a bad email address, I can't
imagine the correct solution is to remove characters until it becomes
valid, short of a trim().
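
A minimal sketch of the difference, not a definitive implementation. The
helper name normalize_email() and the sample address are made up for
illustration; only trim(), filter_var() and the two filter constants are
real PHP:

```php
<?php
// Hypothetical helper (not part of PHP or of this thread): trim the input,
// validate it, and reject it rather than rewrite it.
function normalize_email(string $raw): ?string
{
    $candidate = trim($raw);
    $valid = filter_var($candidate, FILTER_VALIDATE_EMAIL);

    // filter_var() returns false when validation fails.
    return $valid === false ? null : $valid;
}

// Made-up input containing characters the sanitize filter does not allow.
$raw = ' john(.doe)@exa//mple.com ';

// Sanitizing silently drops the disallowed characters, so the result is a
// different address from the one that was submitted.
$sanitized = filter_var($raw, FILTER_SANITIZE_EMAIL);

// Trimming and validating rejects the malformed input instead.
$checked = normalize_email($raw);

var_dump($sanitized, $checked);
```

The point of the sketch is that the sanitized value can look deliverable
even though nobody ever typed it, while the validate-only path forces the
caller to handle the rejection.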

On Sun, Oct 2, 2022, 17:10 Larry Garfield <la...@garfieldtech.com> wrote:

> On Sat, Oct 1, 2022, at 10:39 AM, Kamil Tekiela wrote:
> > Hi Internals,
> >
> > For quite some time now, PHP's sanitize filters have "Rustled My Jimmies".
> > These filters bother me because I can't really justify their existence. I
> > can understand that a few of them are sensible and may come in handy, but I
> > would like to talk about some of these in particular.
> >
> > In PHP 8.1, we have deprecated FILTER_SANITIZE_STRING, which I deemed to be
> > a priority due to its confusing name and behaviour. The rest are slightly
> > less dangerous, but as was pointed out to me in a recent conversation with
> > a PHP developer, these filters are all very confusing.
> >
> > I would like to have some opinions on the following filters. What do you
> > think we should do with them? Deprecate? Fix? Provide better documentation?
> >
> > ---
> >
> > *FILTER_SANITIZE_ENCODED* - "URL-encode string, optionally strip or encode
> > special characters."
> > Now, what does that mean? PHP has two functions for URL encoding: urlencode
> > used for encoding query-string parts, and rawurlencode used for encoding
> > any other URL part (two different RFCs are followed by these functions).
> > Which of these RFCs is applied in this filter? Furthermore, the description
> > says that "special characters" can be stripped or encoded. Is one of these
> > actions the default and the other can be selected by a flag or are both
> > optional? What are these special characters? Are they special in the
> > context of URL? If so, why did we encode them first? If these are HTML
> > special characters (there's no single definition of special HTML chars),
> > then why does this filter encode them if the filter is for URL
> > sanitization? What does backtick have to do with any of this
> > (FILTER_FLAG_STRIP_BACKTICK)?
> >
> > *FILTER_SANITIZE_ADD_SLASHES* - "Apply addslashes(). (Available as of PHP
> > 7.3.0)"
> > This filter was added as a replacement for the magic_quotes filter.
> > According to the PHP documentation, addslashes is supposed to be used when
> > injecting PHP variables into an eval'd string. Real life has shown that
> > this function is used in a lot of places that have nothing to do with
> > PHP's eval. I am not sure if the sanitize filter is misused in a similar
> > fashion, but judging from the fact that it was meant as a replacement for
> > magic_quotes, my guess is that it's very likely still abused.
> >
> > *FILTER_SANITIZE_EMAIL* - "Remove all characters except letters, digits and
> > !#$%&'*+-=?^_`{|}~@.[]."
> > Which RFC does this adhere to? It strips slashes and quoted parts, doesn't
> > allow IPv6 addresses and doesn't accept RFC 6530 email addresses. This
> > filter is ok for simple usage, but it isn't true to any known specification
> > AFAIK.
> >
> > *FILTER_SANITIZE_SPECIAL_CHARS* - "HTML-encode '"<>& and characters with
> > ASCII value less than 32, optionally strip or encode other special
> > characters."
> > What's the intended purpose of this filter? "Special characters" are still
> > not clearly defined, but at least it's more clear than
> > the FILTER_SANITIZE_ENCODED description. Same question about backticks
> > though: why? Why encode ASCII <32 chars?
> >
> > *FILTER_SANITIZE_FULL_SPECIAL_CHARS* - "Equivalent to calling
> > htmlspecialchars() with ENT_QUOTES set. Encoding quotes can be disabled by
> > setting FILTER_FLAG_NO_ENCODE_QUOTES. Like htmlspecialchars(), this filter
> > is aware of the default_charset and if a sequence of bytes is detected that
> > makes up an invalid character in the current character set then the entire
> > string is rejected resulting in a 0-length string. When using this filter
> > as a default filter, see the warning below about setting the default flags
> > to 0."
> > Not to be mistaken with FILTER_SANITIZE_SPECIAL_CHARS. As long as it's not
> > used with filter_input(), it's the least problematic. We
> > have htmlspecialchars() though, so how useful is this filter?
> >
> > *FILTER_UNSAFE_RAW* - What makes it unsafe? Why isn't this just
> > called FILTER_RAW_STRING? If the value being filtered is something other
> > than a string, what will this filter return? Integers, floats, booleans and
> > nulls are converted to a string; arrays and objects make the filter fail.
> >
> > ---
> >
> > Let's quickly mention the filter flags.
> >
> > The FILTER_FLAG_STRIP_LOW flag will also remove tabs, carriage returns and
> > newlines, as these all have ASCII codes below 32. When is this useful and
> > expected?
> >
> > The FILTER_FLAG_ENCODE_LOW flag "encodes" ASCII <32 codes presumably into
> > HTML entities, although that's not specified anywhere in the PHP manual.
> > The word HTML does not appear on the
> > https://www.php.net/manual/en/filter.filters.flags.php page. What do these
> > characters look like when presented by HTML? When is it ever useful to use
> > this flag?
> >
> > FILTER_FLAG_ENCODE_AMP & FILTER_FLAG_STRIP_BACKTICK - why is this even a
> > thing?
> >
> > Due to flags, FILTER_VALIDATE_EMAIL will happily validate email addresses
> > that would be otherwise mangled by FILTER_SANITIZE_EMAIL.
> >
> > These are just the things I found confusing and strange about the sanitize
> > filters. Let's try to put ourselves in the shoes of an average PHP
> > developer trying to comprehend these filters. It's quite easy to shoot
> > yourself in the foot if you try to use them. The PHP manual doesn't do a
> > good job of explaining them, but that's probably because they are not easy
> > to explain. I can't come up with good examples of when they should be used.
> >
> > Regards,
> > Kamil
>
> The filter extension has always been a stillborn mess.  Its API is an
> absolute disaster and, as you note, its functionality is unclear at best,
> misleading at worst.  Frankly it's worse than SPL.
>
> I'd be entirely on board with jettisoning the entire thing, but barring
> that, ripping out large swaths of it that are misleading suits me fine.
>
> --Larry Garfield
