[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-01 Thread Steven D'Aprano
On Mon, Nov 01, 2021 at 11:41:06AM -0700, Toshio Kuratomi wrote: > Unicode specifies the mapping of glyphs to code points. Then a second > mapping from code points to sequences of bytes is what is actually > recorded by the computer. The second mapping is what programmers > using Python will
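
The two mappings described in the quoted text can be illustrated with a minimal Python sketch. The sample character and variable names are chosen here purely for illustration and do not come from the thread.

```python
# A character is first identified by a code point (an integer),
# and an encoding then maps that code point to a sequence of bytes.
text = "é"  # illustrative sample character

code_point = ord(text)                # the Unicode code point, U+00E9
utf8_bytes = text.encode("utf-8")     # one possible byte sequence
utf16_bytes = text.encode("utf-16-le")  # a different byte sequence, same code point

print(f"U+{code_point:04X}")  # U+00E9
print(utf8_bytes)             # b'\xc3\xa9'
print(utf16_bytes)            # b'\xe9\x00'
```

The same code point yields different bytes under different encodings, which is why the two mappings are worth keeping apart.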

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-01 Thread Terry Reedy
On 11/1/2021 8:17 AM, Petr Viktorin wrote: Nevertheless, I did do a bit of research about similar gotchas in Python, and I'd like to publish a summary as an informational PEP, pasted below. Very helpful. Bidirectional Text -- Some scripts, such as Hebrew or Arabic, are

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-01 Thread Jim J. Jewett
"The East Asian symbol for *ten* looks like a plus sign, so ``十= 10`` is a complete Python statement." Normally, an identifier must begin with a letter, and numbers can only be used in the second and subsequent positions. (XID_CONTINUE instead of XID_START) The fact that some characters with

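The quoted claim about ``十= 10`` can be checked with a short Python sketch. This is only an illustration using the standard `unicodedata` module; the variable names are hypothetical.

```python
import unicodedata

ten = "\u5341"  # 十, the ideograph quoted in the message above

# The character is classified as a letter ("Lo"), so it may start an
# identifier, even though it also carries a numeric meaning.
print(unicodedata.category(ten))  # Lo
print(ten.isidentifier())         # True
print(ten.isnumeric())            # True

# An ASCII digit, by contrast, may continue an identifier but not start one.
print("1x".isidentifier())        # False
print("x1".isidentifier())        # True

# Executing the quoted statement binds the name 十 to the integer 10.
namespace = {}
exec("\u5341= 10", namespace)
print(namespace[ten])             # 10
```
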
[Python-Dev] Re: Python multithreading without the GIL

2021-11-01 Thread Skip Montanaro
Sam> I think the performance difference is because of different versions of NumPy. Thanks all for the help/input/advice. It never occurred to me that two relatively recent versions of numpy would differ so much for the simple tasks in my script (array creation & transform). I confirmed this by

[Python-Dev] Re: Python multithreading without the GIL

2021-11-01 Thread Christopher Barker
> I think the performance difference is because of different versions of > NumPy. > Good reason to leave numpy completely out of it. Unless you want to test nogil’s performance effects on numpy code — an interesting exercise in itself. Also — sorry I didn’t look at your code before, but you

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-01 Thread Toshio Kuratomi
This is an excellent enumeration of some of the concerns! One minor comment about the introductory material: On Mon, Nov 1, 2021 at 5:21 AM Petr Viktorin wrote: > > Introduction > > Python code is written in `Unicode`_ – a system for encoding and > > handling all kinds

[Python-Dev] Re: Python multithreading without the GIL

2021-11-01 Thread Sam Gross
Hi Skip, I think the performance difference is because of different versions of NumPy. Python 3.9 installs NumPy 1.21.3 by default for "pip install numpy". I've only built and packaged NumPy 1.19.4 for "nogil" Python. There are substantial performance differences between the two NumPy builds for

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-01 Thread Serhiy Storchaka
This is excellent! 01.11.21 14:17, Petr Viktorin wrote: >> CPython treats the control character NUL (``\0``) as end of input, >> but many editors simply skip it, possibly showing code that Python >> will not >> run as a regular part of a file. It is an implementation detail and we will get rid of

[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

2021-11-01 Thread Steven D'Aprano
Thanks for writing this, Petr! A few comments below. On Mon, Nov 01, 2021 at 01:17:02PM +0100, Petr Viktorin wrote: > >ASCII-only Considerations > >ASCII is a subset of Unicode > >While issues with the ASCII character set are generally well understood, >

[Python-Dev] pre-PEP: Unicode Security Considerations for Python

2021-11-01 Thread Petr Viktorin
Hello, Today, an attack called "Trojan source" was revealed, where a malicious contributor can use Unicode features (left-to-right text and homoglyphs) to write code that, when shown in an editor, will look different from how a computer language parser will process it. See
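
One way a reviewer might look for the text-direction tricks mentioned above is to scan source text for Unicode bidirectional formatting characters. The following is a minimal sketch, not a method proposed in the thread: the helper `find_bidi_controls` and the sample string are hypothetical, and the scan does not address homoglyphs.

```python
# Unicode bidirectional formatting characters (embeddings, overrides, isolates).
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
}


def find_bidi_controls(source):
    """Return (index, code point label) for each bidi control in source."""
    return [
        (index, f"U+{ord(char):04X}")
        for index, char in enumerate(source)
        if char in BIDI_CONTROLS
    ]


# Hypothetical sample: a right-to-left override hidden inside a string literal.
sample = "access = 'user\u202e' # admin"
hits = find_bidi_controls(sample)
print(hits)  # [(14, 'U+202E')]
```

A check like this only flags the presence of such characters; whether a given occurrence is legitimate (for example in right-to-left text inside a string or comment) still needs human judgment.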