On Mon, Nov 15, 2021 at 12:33:54PM +0400, Abdur-Rahmaan Janhangeer wrote:

> Yet another issue is adding vulnerabilities in plain sight.
> Human code reviewers will see this:
>
> if user.admin == "something":
>
> Static analysers will see
>
> if user.admin == "something<hidden chars>":

Okay, you have a string literal with hidden characters, assuming that
your editor actually renders them as invisible characters rather than
"something???" or "something□□□" or "something���" or equivalent.

Now what happens? Where do you go from there to a vulnerability or
backdoor? I think it might be a bit obvious that there is something
funny going on if I see:

    if (user.admin == "root" and check_password_securely()
            or user.admin == "root"
            # Second string has hidden characters, do not remove it.
            ):
        elevate_privileges()

even without the comment :-)

In another thread, Serhiy already suggested we ban invisible control
characters (other than whitespace) in comments and strings.

https://mail.python.org/archives/list/python-dev@python.org/message/DN24FK3A2DSO4HBGEDGJXERSAUYK6VK6/

I think that is a good idea.

But beyond the C0 and C1 control characters, we should be conservative
about banning "hidden characters" without a *concrete* threat. For
example, variation selectors are "hidden", but they change the visual
look of emoji and other characters. Even if you think that being able
to set the skin tone of your emoji or choose different national flags
using variation selectors is pure frippery, they are also necessary for
Mongolian and some CJK ideographs.

http://unicode.org/reports/tr28/tr28-3.html#13_7_variation_selectors

I'm not sure about bidirectional controls; I have to leave that to
people with more experience in bidirectional text than I have. I think
that many editors in common use don't support bidirectional text, or at
least the ones I use don't seem to support it fully or correctly.

But for what little it is worth, my feeling is that people who use RTL
or bidirectional strings, and have editors that support them, will be
annoyed if we ban them from strings for the comfort of people who may
never in their life come across a string containing such bidirectional
text.

But if there is a concrete threat beyond "it looks weird", that is
another issue.

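
To make the three groups discussed above concrete, here is a minimal sketch of how a checker might tell them apart using the standard `unicodedata` module. The names `classify`, `scan`, `ALLOWED_CONTROLS` and `BIDI_CLASSES` are hypothetical, and the set of whitespace controls left alone is an assumption for illustration, not something the proposal spells out.

```python
import unicodedata

# Assumption: these whitespace controls stay allowed; the exact set is
# illustrative and is not taken from the proposal.
ALLOWED_CONTROLS = {"\t", "\n", "\r", "\f"}

# Bidi classes of the explicit embedding, override and isolate controls.
BIDI_CLASSES = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}


def is_variation_selector(ch):
    """True for the variation selector code point ranges."""
    cp = ord(ch)
    return (
        0xFE00 <= cp <= 0xFE0F        # VARIATION SELECTOR-1..16
        or 0xE0100 <= cp <= 0xE01EF   # VARIATION SELECTOR-17..256
        or 0x180B <= cp <= 0x180D     # Mongolian free variation selectors
    )


def classify(ch):
    """Return a label for a hidden character, or None if it is ordinary."""
    # All C0 and C1 control characters have general category "Cc".
    if unicodedata.category(ch) == "Cc" and ch not in ALLOWED_CONTROLS:
        return "control"
    if is_variation_selector(ch):
        return "variation selector"
    if unicodedata.bidirectional(ch) in BIDI_CLASSES:
        return "bidi control"
    return None


def scan(text):
    """Yield (index, code point, kind) for each hidden character in text."""
    for i, ch in enumerate(text):
        kind = classify(ch)
        if kind is not None:
            yield i, f"U+{ord(ch):04X}", kind
```

A tool built along these lines could treat "control" as an error while only reporting the other two kinds, which matches the conservative position argued above.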
> but will not flag it as it's up to the user to verify the logic of
> things

There is no reason why linters and code checkers shouldn't check for
invisible characters, Unicode confusables or mixed script identifiers
and flag them.

The interpreter shouldn't concern itself with such purely stylistic
issues unless there is a concrete threat that can only be handled by
the interpreter itself.

-- 
Steve

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/KSIBL3KMONIETBKXSBPPMA27MACWIH33/