On Mon, Nov 15, 2021 at 12:33:54PM +0400, Abdur-Rahmaan Janhangeer wrote:
> Yet another issue is adding vulnerabilities in plain sight.
> Human code reviewers will see this:
>
> if user.admin == "something":
>
> Static analysers will see
>
> if user.admin == "something<hidden chars>":

Okay, you have a string literal with hidden characters. That assumes
your editor actually renders them as invisible characters, rather than
as "something???" or "something□□□" or "something���" or equivalent.
Now what happens? Where do you go from there to a vulnerability or
backdoor? I think it might be a bit obvious that there is something
funny going on if I see:

    if (user.admin == "root" and check_password_securely()
            or user.admin == "root"
            # Second string has hidden characters, do not remove it.
            ):
        elevate_privileges()

even without the comment :-)
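
For anyone wondering what actually happens at runtime, here is a minimal sketch. It assumes the hidden character is a ZERO WIDTH SPACE (U+200B), and the names plain, hidden and hidden_chars are hypothetical. The two literals can look identical on screen, but they are different strings, and repr() already escapes the invisible character.

```python
import unicodedata

plain = "root"
hidden = "root\u200b"  # hypothetical literal with a ZERO WIDTH SPACE appended


def hidden_chars(s: str) -> list[tuple[str, str]]:
    """List the non-printable code points in s as ('U+XXXX', name) pairs."""
    return [
        (f"U+{ord(c):04X}", unicodedata.name(c, "<unnamed>"))
        for c in s
        if not c.isprintable()
    ]


print(plain == hidden)       # False: the strings differ even if they render alike
print(repr(hidden))          # 'root\u200b': repr() escapes non-printable characters
print(hidden_chars(hidden))  # [('U+200B', 'ZERO WIDTH SPACE')]
```

So the "hidden" comparison simply never matches an ordinary "root". Whether that can be turned into a backdoor depends entirely on the surrounding code.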
In another thread, Serhiy already suggested we ban invisible control
characters (other than whitespace) in comments and strings.
https://mail.python.org/archives/list/python-dev@python.org/message/DN24FK3A2DSO4HBGEDGJXERSAUYK6VK6/
I think that is a good idea.
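
Here is a rough sketch of what such a check might look like. It assumes that "control characters" means Unicode category Cc, which covers C0, DEL and C1. It also assumes that tab, newline, carriage return, form feed and vertical tab are the whitespace to keep allowing. The function name control_chars is hypothetical. A real implementation would run it only over string and comment tokens.

```python
import unicodedata

# Assumption: the whitespace control characters that stay permitted.
ALLOWED_WHITESPACE = frozenset("\t\n\r\f\v")


def control_chars(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for every control character (category Cc,
    i.e. C0, DEL or C1) in text that is not allowed whitespace."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cc" and ch not in ALLOWED_WHITESPACE
    ]


print(control_chars("tab\tis fine"))          # []
print(control_chars("bell\x07 and esc\x1b"))  # [(4, 'U+0007'), (13, 'U+001B')]
print(control_chars("zero\u200bwidth"))       # []: U+200B is category Cf, not Cc
```

The last line shows that a Cc-only rule does not touch invisible *format* characters such as U+200B. Those are a separate question.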
But beyond the C0 and C1 control characters, we should be conservative
about banning "hidden characters" without a *concrete* threat. For
example, variation selectors are "hidden", but they change the visual
look of emoji and other characters (for instance, selecting text or
emoji presentation of a symbol). Even if you think that styling emoji
this way is pure frippery, variation selectors are also necessary for
Mongolian and some CJK ideographs.
http://unicode.org/reports/tr28/tr28-3.html#13_7_variation_selectors
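
To illustrate, the sketch below shows that a variation selector is an invisible code point that is part of the text's meaning. It also includes a hypothetical helper, is_variation_selector, that a checker could use to exempt variation selectors. Matching on the Unicode name is an assumption chosen for brevity, not how any particular tool does it.

```python
import unicodedata

heart_text = "\u2764"         # HEAVY BLACK HEART (text presentation by default)
heart_emoji = "\u2764\ufe0f"  # same base character + VARIATION SELECTOR-16 (emoji style)


def is_variation_selector(ch: str) -> bool:
    """Heuristic (assumption): match by Unicode name, which covers VS1-VS256
    and the Mongolian free variation selectors."""
    return "VARIATION SELECTOR" in unicodedata.name(ch, "")


print(heart_text == heart_emoji)  # False: different code point sequences
for ch in heart_emoji:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
# U+2764 So HEAVY BLACK HEART
# U+FE0F Mn VARIATION SELECTOR-16

# VS16, VS17 (ideographic variation), Mongolian FVS1, and ZERO WIDTH SPACE
print([is_variation_selector(c) for c in ("\ufe0f", "\U000e0100", "\u180b", "\u200b")])
# [True, True, True, False]
```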
I'm not sure about bidirectional controls; I have to leave that to
people with more experience in bidirectional text than I do. I think
that many editors in common use don't support bidirectional text, or at
least the ones I use don't seem to support it fully or correctly. But
for what little it is worth, my feeling is that people who use RTL or
bidirectional strings and have editors that support them will be annoyed
if we ban them from strings for the comfort of people who may never in
their life come across a string containing such bidirectional text.
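
The following sketch is for illustration only. The explicit bidirectional formatting characters (Unicode's Bidi_Control property) are a small, fixed set of code points, distinct from right-to-left letters. A checker could therefore report them without touching ordinary RTL text. The set below is transcribed from Unicode's PropList.txt, and bidi_controls is a hypothetical helper name.

```python
import unicodedata

# Code points with the Unicode Bidi_Control property (per PropList.txt).
BIDI_CONTROLS = frozenset(
    ["\u061c", "\u200e", "\u200f"]                # ALM, LRM, RLM
    + [chr(cp) for cp in range(0x202A, 0x202F)]   # LRE, RLE, PDF, LRO, RLO
    + [chr(cp) for cp in range(0x2066, 0x206A)]   # LRI, RLI, FSI, PDI
)


def bidi_controls(text: str) -> list[tuple[int, str]]:
    """Return (index, Unicode name) for each explicit bidi control in text.
    Ordinary right-to-left letters are not reported."""
    return [(i, unicodedata.name(ch)) for i, ch in enumerate(text) if ch in BIDI_CONTROLS]


print(bidi_controls("abc\u202edef"))               # [(3, 'RIGHT-TO-LEFT OVERRIDE')]
print(bidi_controls("\u05e9\u05dc\u05d5\u05dd"))   # []: Hebrew "shalom", letters only
```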
But if there is a concrete threat beyond "it looks weird", that is
another issue.
> but will not flag it as it's up to the user to verify the logic of
> things
There is no reason why linters and code checkers shouldn't check for
invisible characters, Unicode confusables or mixed script identifiers
and flag them. The interpreter shouldn't concern itself with such purely
stylistic issues unless there is a concrete threat that can only be
handled by the interpreter itself.
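
Here is a rough sketch of the kind of check a linter, rather than the interpreter, could do. The hypothetical lint() walks the token stream with the standard tokenize module. It flags (a) identifiers whose letters come from more than one script, and (b) format characters (category Cf, which includes zero-width characters and bidi controls) in strings and comments. Script detection here is a crude heuristic based on the first word of each character's Unicode name, not the real Unicode Script property. Confusable detection, which needs Unicode's confusables data, is left out.

```python
import io
import tokenize
import unicodedata


def rough_script(ch: str) -> str:
    """Heuristic (assumption): first word of the Unicode name, e.g. 'LATIN',
    'CYRILLIC', 'GREEK'. Not a real Script property lookup."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def lint(source: str) -> list[str]:
    """Return 'line:col: message' warnings; col is the token's start column."""
    warnings = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        line, col = tok.start
        if tok.type == tokenize.NAME:
            scripts = {rough_script(c) for c in tok.string if c.isalpha()}
            if len(scripts) > 1:
                warnings.append(
                    f"{line}:{col}: mixed-script identifier {tok.string!r}: "
                    + ", ".join(sorted(scripts))
                )
        # Note: on Python 3.12+ f-strings arrive as FSTRING_* tokens and are not checked here.
        elif tok.type in (tokenize.STRING, tokenize.COMMENT):
            for c in tok.string:
                if unicodedata.category(c) == "Cf":
                    warnings.append(
                        f"{line}:{col}: invisible U+{ord(c):04X} "
                        f"{unicodedata.name(c, '?')} in {tokenize.tok_name[tok.type]}"
                    )
    return warnings


# Cyrillic 'a' (U+0430) in the identifier, ZERO WIDTH SPACE in the string.
for w in lint('p\u0430ssword = "root\u200b"  # looks innocent\n'):
    print(w)
```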
--
Steve
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/KSIBL3KMONIETBKXSBPPMA27MACWIH33/
Code of Conduct: http://python.org/psf/codeofconduct/