On Mon, Nov 15, 2021 at 12:33:54PM +0400, Abdur-Rahmaan Janhangeer wrote:

> Yet another issue is adding vulnerabilities in plain sight.
> Human code reviewers will see this:
> 
> if user.admin == "something":
> 
> Static analysers will see
> 
> if user.admin == "something<hidden chars>":

Okay, so you have a string literal with hidden characters, assuming 
that your editor actually renders them as invisible characters rather 
than as "something???" or "something□□□" or "something���" or equivalent.

Now what happens? Where do you go from there to a vulnerability or 
backdoor? I think it might be a bit obvious that there is something 
funny going on if I see:

    if (user.admin == "root" and check_password_securely() 
            or user.admin == "root"
            # Second string has hidden characters, do not remove it.
            ):
        elevate_privileges()

even without the comment :-)

In another thread, Serhiy already suggested we ban invisible control 
characters (other than whitespace) in comments and strings.

https://mail.python.org/archives/list/python-dev@python.org/message/DN24FK3A2DSO4HBGEDGJXERSAUYK6VK6/

I think that is a good idea.
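For concreteness, here is a rough sketch of the sort of check that suggestion implies. It assumes "control characters" means Unicode category Cc (C0, DEL and C1) and that tab, newline, carriage return, form feed and vertical tab count as whitespace. For simplicity it scans the whole source rather than only comments and strings. The name find_control_chars is made up for illustration; it is not an existing API.

```python
from __future__ import annotations

import unicodedata

# Assumption: these whitespace controls stay legal; everything else in
# category Cc (C0 controls, DEL and C1 controls) gets reported.
ALLOWED_CONTROLS = {"\t", "\n", "\r", "\f", "\v"}


def find_control_chars(source: str) -> list[tuple[int, int, str]]:
    """Return (line, column, code point) for each disallowed control character.

    Lines are split on newline characters only, because str.splitlines()
    would also break on some control characters such as U+0085 (NEL).
    Columns are 0-based code point offsets.
    """
    hits = []
    for lineno, line in enumerate(source.split("\n"), start=1):
        for col, ch in enumerate(line):
            if unicodedata.category(ch) == "Cc" and ch not in ALLOWED_CONTROLS:
                hits.append((lineno, col, f"U+{ord(ch):04X}"))
    return hits


if __name__ == "__main__":
    # A string literal hiding an ESC character, plus a tab-indented line.
    print(find_control_chars('x = "a\x1bb"  # ok\n\tpass\n'))  # → [(1, 6, 'U+001B')]
```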

But beyond the C0 and C1 control characters, we should be conservative 
about banning "hidden characters" without a *concrete* threat. For 
example, variation selectors are "hidden", but they change the visual 
look of emoji and other characters. Even if you think that being able to 
choose between the text-style and emoji-style display of a symbol using 
variation selectors is pure frippery, they are also necessary for 
Mongolian and some CJK ideographs.
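To make that distinction concrete: the C0 and C1 controls are Unicode category Cc, whereas variation selectors (including the Mongolian free variation selectors and the ideographic variation selectors used for CJK) are combining marks, category Mn, so a ban worded in terms of control characters would not touch them. Zero-width spaces and bidi overrides fall into yet another category, Cf. A quick check with the stdlib unicodedata module; the sample characters are just my own picks:

```python
import unicodedata

# Sample characters chosen for illustration: two controls, two Cf
# "invisible" characters, and three kinds of variation selector.
samples = {
    "ESC (C0 control)": "\x1b",
    "NEL (C1 control)": "\x85",
    "ZERO WIDTH SPACE": "\u200b",
    "RIGHT-TO-LEFT OVERRIDE": "\u202e",
    "VARIATION SELECTOR-16 (emoji presentation)": "\ufe0f",
    "MONGOLIAN FREE VARIATION SELECTOR ONE": "\u180b",
    "VARIATION SELECTOR-17 (CJK ideographic variants)": "\U000e0100",
}

for label, ch in samples.items():
    # Print the code point, its general category, and the label.
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)}  {label}")
```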

http://unicode.org/reports/tr28/tr28-3.html#13_7_variation_selectors

I'm not sure about bidirectional controls; I have to leave that to 
people with more experience of bidirectional text than I have. I think 
that many editors in common use don't support bidirectional text; at 
least, the ones I use don't seem to support it fully or correctly. But 
for what little it is worth, my feeling is that people who use RTL or 
bidirectional strings, and have editors that support them, will be 
annoyed if we ban them from strings for the comfort of people who may 
never in their lives come across a string containing such text.

But if there is a concrete threat beyond "it looks weird", that is 
another issue.


> but will not flag it as it's up to the user to verify the logic of 
> things

There is no reason why linters and code checkers shouldn't check for 
invisible characters, Unicode confusables or mixed script identifiers 
and flag them. The interpreter shouldn't concern itself with such purely 
stylistic issues unless there is a concrete threat that can only be 
handled by the interpreter itself.
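As a sketch of what such a linter check could look like using only the standard library: find_invisible flags format characters (category Cf), which covers zero-width spaces and bidi controls but deliberately not variation selectors, and mixed_script_identifiers flags identifiers whose letters come from more than one script. The stdlib has no Unicode Script property or confusables table, so the script test here is a crude approximation based on the first word of each character's Unicode name; it will, for example, also flag legitimate Japanese identifiers that mix kana and kanji. Both function names are hypothetical, and a real tool would want the proper Unicode security data (UTS #39).

```python
import io
import tokenize
import unicodedata


def find_invisible(source):
    """Report Unicode format characters (category Cf) as (line, column, name).

    This catches zero-width spaces and bidi controls such as
    RIGHT-TO-LEFT OVERRIDE; variation selectors are category Mn and are
    intentionally left alone.
    """
    hits = []
    for lineno, line in enumerate(source.split("\n"), start=1):
        for col, ch in enumerate(line):
            if unicodedata.category(ch) == "Cf":
                hits.append((lineno, col, unicodedata.name(ch, f"U+{ord(ch):04X}")))
    return hits


def rough_script(ch):
    # Crude stand-in for the Unicode Script property, which the stdlib
    # lacks: the first word of the character name, e.g. "LATIN", "CYRILLIC".
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def mixed_script_identifiers(source):
    """Report (line, identifier, scripts) for names mixing several scripts."""
    flagged = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.NAME:
            continue
        # Only letters carry script information; digits and "_" are skipped.
        scripts = {rough_script(ch) for ch in tok.string if ch.isalpha()}
        if len(scripts) > 1:
            flagged.append((tok.start[0], tok.string, sorted(scripts)))
    return flagged


if __name__ == "__main__":
    # "root" followed by a zero-width space inside the string literal.
    print(find_invisible('if user.admin == "root\u200b":\n    pass\n'))
    # "password" spelled with CYRILLIC SMALL LETTER A (U+0430).
    print(mixed_script_identifiers("p\u0430ssword = 1\nadmin = 2\n"))
```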


-- 
Steve
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/KSIBL3KMONIETBKXSBPPMA27MACWIH33/
Code of Conduct: http://python.org/psf/codeofconduct/
