[Python-Dev] Re: Moving away from _Py_IDENTIFIER().

2022-02-08 Thread Stefan Behnel

Inada Naoki schrieb am 08.02.22 um 06:15:

On Tue, Feb 8, 2022 at 1:47 PM Guido van Rossum wrote:


Thanks for trying it! I'm curious why it would be slower (perhaps less 
locality? perhaps the ...Id... APIs have some other trick up their sleeve?) but 
since it's also messier and less backwards compatible than just leaving 
_Py_IDENTIFIER alone and just not using it, I'd say let's not spend more time 
on that alternative and just focus on the two other horses still in the race: 
immortal objects or what you have now.



I think it's because statically allocated strings are not interned.


That would explain such a difference.



I think deepfreeze should stop using statically allocated strings for
interned strings too.


… or consider the statically allocated strings the interned string value. 
Unless another one already exists, but that shouldn't be the case for 
CPython internal strings.


Stefan

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/5NE7EI3TVW4C3ZZI6LO5HNPIZRQNPMHG/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Moving away from _Py_IDENTIFIER().

2022-02-08 Thread Sebastian Rittau

Am 03.02.22 um 22:41 schrieb Gregory P. Smith:
datapoint: an internal code search turns up blender, multidict, and 
typed_ast as open source users of _Py_IDENTIFIER .  Easy to clean up 
as PRs.  There are a couple of internal uses as well, all of which are 
similarly easy to address and are only in code that is expected to 
need API cleanup tweaks between CPython versions.


typed_ast is mostly legacy, to be used for parsing Python code < Python 
3.8. It's end of life is scheduled when Python 3.7 end of lifes in June 
2023. I believe we can easily get away with just copying the 
_Py_IDENTIFIER macros over to its source code in typed_ast.


 - Sebastian


___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/NYRNJRDSRPSABNSSQCU7NGJLUTZPENGM/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Moving away from _Py_IDENTIFIER().

2022-02-07 Thread Inada Naoki
On Tue, Feb 8, 2022 at 1:47 PM Guido van Rossum  wrote:
>
> Thanks for trying it! I'm curious why it would be slower (perhaps less 
> locality? perhaps the ...Id... APIs have some other trick up their sleeve?) 
> but since it's also messier and less backwards compatible than just leaving 
> _Py_IDENTIFIER alone and just not using it, I'd say let's not spend more time 
> on that alternative and just focus on the two other horses still in the race: 
> immortal objects or what you have now.
>

I think it's because statically allocated strings are not interned.
I think deepfreeze should stop using statically allocated strings for
interned strings too.

--
Inada Naoki  
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/7OILLUEZSQY6OTY5WY543JTUGRVSFNPS/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Moving away from _Py_IDENTIFIER().

2022-02-07 Thread Guido van Rossum
Thanks for trying it! I'm curious why it would be slower (perhaps less
locality? perhaps the ...Id... APIs have some other trick up their sleeve?)
but since it's also messier and less backwards compatible than just leaving
_Py_IDENTIFIER alone and just not using it, I'd say let's not spend more
time on that alternative and just focus on the two other horses still in
the race: immortal objects or what you have now.

On Mon, Feb 7, 2022 at 5:41 PM Eric Snow 
wrote:

> On Fri, Feb 4, 2022 at 8:25 PM Eric Snow 
> wrote:
> > On Fri, Feb 4, 2022, 16:03 Guido van Rossum  wrote:
> >> I wonder if a better solution than that PR wouldn't be to somehow
> change the implementation of _Py_IDENTIFIER() to do that,
> >
> > Yeah, I had the same realization today.  I'm going to try it out.
>
> I updated _Py_IDENTIFIER() to use a statically initialized string
> object and it isn't too bad.  The tricky thing is that PyASCIIObject
> expects to the data to be an array after the object.  So the field
> must be a pre-sized array (like I did in gh-30928).  That makes things
> messier.  The alternative is to do what Steve is suggesting.
>
> I ran the benchmarks and making _Py_IDENTIFIER() a statically
> initialized object makes things 2% slower (instead of 1% faster).
> There are a few things I could do to speed that up a little, but at
> best we'd get back to performance-neutral.
>
> -eric
>


-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/M47DIA75SZJO3ZCTJPLRTOUBYFIIZH4N/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Moving away from _Py_IDENTIFIER().

2022-02-07 Thread Eric Snow
On Fri, Feb 4, 2022 at 8:25 PM Eric Snow  wrote:
> On Fri, Feb 4, 2022, 16:03 Guido van Rossum  wrote:
>> I wonder if a better solution than that PR wouldn't be to somehow change the 
>> implementation of _Py_IDENTIFIER() to do that,
>
> Yeah, I had the same realization today.  I'm going to try it out.

I updated _Py_IDENTIFIER() to use a statically initialized string
object and it isn't too bad.  The tricky thing is that PyASCIIObject
expects to the data to be an array after the object.  So the field
must be a pre-sized array (like I did in gh-30928).  That makes things
messier.  The alternative is to do what Steve is suggesting.

I ran the benchmarks and making _Py_IDENTIFIER() a statically
initialized object makes things 2% slower (instead of 1% faster).
There are a few things I could do to speed that up a little, but at
best we'd get back to performance-neutral.

-eric
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/DDWOJLFOTXTZ35LMBCPH2DHFMCSVLHH5/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Moving away from _Py_IDENTIFIER().

2022-02-07 Thread Guido van Rossum
Yes, please build it; I can't quite follow what you're getting at (probably
since I don't know the Windows APIs).

On Mon, Feb 7, 2022 at 4:39 AM Steve Dower  wrote:

> On 2/5/2022 4:09 PM, Guido van Rossum wrote:
> > On Sat, Feb 5, 2022 at 5:18 AM Steve Dower  > > wrote:
> >
> > The gap this has with what I'd like to see is that it will only work
> > for
> > compile-time strings. If instead it could work for an arbitrary
> uint8_t
> > pointer, rather than an embedded array, that allows even runtime
> > strings
> > to be very cheaply passed into the interpreter (and yes, the caller
> has
> > to manage the lifetime, but that's pretty easy).
> >
> >
> > What's the use case for that, that isn't covered by
> PyUnicode_FromString()?
>
> No dynamic memory allocation (and hence, no deallocation) for any case
> where the original string is already managed, which is really quite
> common. Access to the memory manager is what requires thread
> affinity/the GIL for primitive object construction, and string copies -
> especially with transcoding - are the expensive part.
>
> For cases where strings are just passed around but never manipulated
> (e.g. a lot of filesystem operations, or runtime/interpreter
> configuration), the string may never have to be decoded at all. It's
> almost as good as a tagged pointer, but without breaking our existing
> object model (assuming all the PyUnicode_* functions learn how to handle
> them, which is necessary).
>
> But it's purely a transparent optimisation for most users, with an added
> opportunity for those using native APIs and probably decent complexity
> for us as maintainers. There are a lot of edge cases to handle that I'm
> sure people will use to shoot down the idea, which is why rather than
> debate details here I'd rather build it and define its boundaries of
> usefulness, though for now there's plenty of stuff I'd rather do than
> both of these, so it remains an idea for now :)
>
> Cheers,
> Steve
>


-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/I5KB6CADRNVBF6ZUAC7BPKEEY4ORQ4Z4/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Moving away from _Py_IDENTIFIER().

2022-02-07 Thread Steve Dower

On 2/5/2022 4:09 PM, Guido van Rossum wrote:
On Sat, Feb 5, 2022 at 5:18 AM Steve Dower > wrote:


The gap this has with what I'd like to see is that it will only work
for
compile-time strings. If instead it could work for an arbitrary uint8_t
pointer, rather than an embedded array, that allows even runtime
strings
to be very cheaply passed into the interpreter (and yes, the caller has
to manage the lifetime, but that's pretty easy).


What's the use case for that, that isn't covered by PyUnicode_FromString()?


No dynamic memory allocation (and hence, no deallocation) for any case 
where the original string is already managed, which is really quite 
common. Access to the memory manager is what requires thread 
affinity/the GIL for primitive object construction, and string copies - 
especially with transcoding - are the expensive part.


For cases where strings are just passed around but never manipulated 
(e.g. a lot of filesystem operations, or runtime/interpreter 
configuration), the string may never have to be decoded at all. It's 
almost as good as a tagged pointer, but without breaking our existing 
object model (assuming all the PyUnicode_* functions learn how to handle 
them, which is necessary).


But it's purely a transparent optimisation for most users, with an added 
opportunity for those using native APIs and probably decent complexity 
for us as maintainers. There are a lot of edge cases to handle that I'm 
sure people will use to shoot down the idea, which is why rather than 
debate details here I'd rather build it and define its boundaries of 
usefulness, though for now there's plenty of stuff I'd rather do than 
both of these, so it remains an idea for now :)


Cheers,
Steve
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/6SBJZ7X6XNSU76K53IC4HPNWNJDT3IX2/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Moving away from _Py_IDENTIFIER().

2022-02-05 Thread Guido van Rossum
On Sat, Feb 5, 2022 at 5:18 AM Steve Dower  wrote:

> The gap this has with what I'd like to see is that it will only work for
> compile-time strings. If instead it could work for an arbitrary uint8_t
> pointer, rather than an embedded array, that allows even runtime strings
> to be very cheaply passed into the interpreter (and yes, the caller has
> to manage the lifetime, but that's pretty easy).
>

What's the use case for that, that isn't covered by PyUnicode_FromString()?

-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/QGSWDGQEL7TFIC6YM7H7BM5PQJKTWM56/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Moving away from _Py_IDENTIFIER().

2022-02-05 Thread Steve Dower

On 04Feb2022 2303, Guido van Rossum wrote:
You *can* allocate unicode objects statically. We do it in deepfreeze, 
and Eric's PR under discussion here 
(https://github.com/python/cpython/pull/30928 
) does it. I wonder if a 
better solution than that PR wouldn't be to somehow change the 
implementation of _Py_IDENTIFIER() to do that, and make the special 'Id' 
APIs just aliases for the corresponding unicode-object-based APIs? It 
wouldn't be ABI compatible, but none of those APIs are in the stable ABI.


Indeed. I'm sure you can appreciate how I skimmed over that bit :) (plus 
I've mostly kept up with this by chatting with Eric, rather than 
reviewing all the code)


The guts of the code in question for those who don't want to find it:
#define STRUCT_FOR_ASCII_STR(LITERAL) \
struct { \
PyASCIIObject _ascii; \
uint8_t _data[sizeof(LITERAL)]; \
}

Which is then used inside another struct to statically allocate all the 
objects.


The gap this has with what I'd like to see is that it will only work for 
compile-time strings. If instead it could work for an arbitrary uint8_t 
pointer, rather than an embedded array, that allows even runtime strings 
to be very cheaply passed into the interpreter (and yes, the caller has 
to manage the lifetime, but that's pretty easy).


So this is perfect for the needs we have for our internal API, but I 
wouldn't want it to become _the_ public API. (I also wouldn't want to 
have two public APIs that are subtly different, because in a few years 
time everyone will forget why and be very confused about it all.) Not 
sure I'm motivated enough to build it myself, but I'm getting there ;) 
so perhaps I'll put something together that does what I'd like and give 
us concrete things to discuss.


(Is there a requirement that an Id only contain ASCII characters (i.e., 
7-bit)?)


I don't think it's unreasonable to require that for these internal 
identifier strings, but it would be a shame to do so for arbitrary strings.


Cheers,
Steve
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/LAKPW7A4NP3O6FCWTX3PBOWWZWNM75F3/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Moving away from _Py_IDENTIFIER().

2022-02-04 Thread Eric Snow
On Fri, Feb 4, 2022, 16:03 Guido van Rossum  wrote:

> I wonder if a better solution than that PR wouldn't be to somehow change
> the implementation of _Py_IDENTIFIER() to do that,
>

Yeah, I had the same realization today.  I'm going to try it out.

-eric

>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/A7Q4TBBOCEAXZYOY6GSY3NA2FSVNUMHL/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Moving away from _Py_IDENTIFIER().

2022-02-04 Thread Guido van Rossum
You *can* allocate unicode objects statically. We do it in deepfreeze, and
Eric's PR under discussion here (
https://github.com/python/cpython/pull/30928) does it. I wonder if a better
solution than that PR wouldn't be to somehow change the implementation of
_Py_IDENTIFIER() to do that, and make the special 'Id' APIs just aliases
for the corresponding unicode-object-based APIs? It wouldn't be ABI
compatible, but none of those APIs are in the stable ABI.

(Is there a requirement that an Id only contain ASCII characters (i.e.,
7-bit)?)

On Fri, Feb 4, 2022 at 12:52 PM Steve Dower  wrote:

> On 2/4/2022 5:37 PM, Eric Snow wrote:
> > On Thu, Feb 3, 2022 at 3:49 PM Eric Snow 
> wrote:
> >> I suppose I'd like to know what the value of _Py_IDENTIFIER() is for
> >> 3rd party modules.
> >
> > Between Guido, Victor, Stefan, and Sebastian, I'm getting the sense
> > that a public replacement for _Py_IDENTIFER() would be worth pursuing.
> > Considering that it would probably help numpy move toward
> > subinterpreter support, I may work on this after all. :)
> >
> > (For core CPython we'll still benefit from the statically initialized
> > strings, AKA gh-30928.)
>
> For me, I'd love to see a statically allocated string type (for a real
> example that's used in Windows, check out [1], specifically when he gets
> to the fast-pass strings).
>
> Essentially, a bare minimum struct around a char* (and/or wchar_t*) that
> acts as a PyUnicodeObject but doesn't ever allocate anything on the
> heap. This would also be helpful for config strings, which are often
> static but need to be copied around a lot, and good for passthrough
> strings that a native extension or host app might insert and receive
> back unmodified.
>
> Because there's nothing to deallocate, it can be "created" and stored
> however the caller likes. As soon as Python code does anything with it
> other than passing it around, a regular PyUnicodeObject is allocated
> (just like the HSTRING example).
>
> I'd expect usage to look very much like the intern examples earlier in
> the thread, but if we actually return a whole struct then we aren't on
> the hook for the allocations.
>
> Cheers,
> Steve
>
> [1]: https://devblogs.microsoft.com/oldnewthing/20160615-00/?p=93675
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/DGX4GBMDJYYFAE7OSVMBGKYAO2HPP3PT/
> Code of Conduct: http://python.org/psf/codeofconduct/
>


-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/7LGVQRWLQQJNQD5M5BEUAPAFBCQTUWK7/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Moving away from _Py_IDENTIFIER().

2022-02-04 Thread Steve Dower

On 2/4/2022 5:37 PM, Eric Snow wrote:

On Thu, Feb 3, 2022 at 3:49 PM Eric Snow  wrote:

I suppose I'd like to know what the value of _Py_IDENTIFIER() is for
3rd party modules.


Between Guido, Victor, Stefan, and Sebastian, I'm getting the sense
that a public replacement for _Py_IDENTIFER() would be worth pursuing.
Considering that it would probably help numpy move toward
subinterpreter support, I may work on this after all. :)

(For core CPython we'll still benefit from the statically initialized
strings, AKA gh-30928.)


For me, I'd love to see a statically allocated string type (for a real 
example that's used in Windows, check out [1], specifically when he gets 
to the fast-pass strings).


Essentially, a bare minimum struct around a char* (and/or wchar_t*) that 
acts as a PyUnicodeObject but doesn't ever allocate anything on the 
heap. This would also be helpful for config strings, which are often 
static but need to be copied around a lot, and good for passthrough 
strings that a native extension or host app might insert and receive 
back unmodified.


Because there's nothing to deallocate, it can be "created" and stored 
however the caller likes. As soon as Python code does anything with it 
other than passing it around, a regular PyUnicodeObject is allocated 
(just like the HSTRING example).


I'd expect usage to look very much like the intern examples earlier in 
the thread, but if we actually return a whole struct then we aren't on 
the hook for the allocations.


Cheers,
Steve

[1]: https://devblogs.microsoft.com/oldnewthing/20160615-00/?p=93675
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/DGX4GBMDJYYFAE7OSVMBGKYAO2HPP3PT/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Moving away from _Py_IDENTIFIER().

2022-02-04 Thread Stefan Behnel

Eric Snow schrieb am 04.02.22 um 17:35:

On Fri, Feb 4, 2022 at 8:21 AM Stefan Behnel wrote:

Correct. We (intentionally) have our own way to intern strings and do not
depend on CPython's identifier framework.


You're talking about __Pyx_StringTabEntry (and __Pyx_InitString())?


Yes, that's what we generate. The C code parsing is done here:

https://github.com/cython/cython/blob/79637b23da77732e753b1e1ab5669b3e29978be3/Cython/Compiler/Code.py#L531-L550

The deduplication is a bit complex on our side because it needs to handle 
Python source encodings, and also distinguishes between identifiers (that 
become 'str' in Py2), plain Unicode strings and byte strings. You don't 
need most of that for plain C code. But it's done here:


https://github.com/cython/cython/blob/79637b23da77732e753b1e1ab5669b3e29978be3/Cython/Compiler/Code.py#L1009-L1088

And then there's a whole bunch of code that helps in getting Unicode 
character code points and arbitrary byte values in very long strings pushed 
through C compilers, while keeping it mostly readable for interested users. :)


https://github.com/cython/cython/blob/master/Cython/Compiler/StringEncoding.py

You probably don't need that either, as long as you only deal with ASCII 
strings.


Any way, have fun. Feel free to ask if I can help.

Stefan

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/QHJBAKIQUKFPIM6GZ7DYNJF3HDMDQQUH/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Moving away from _Py_IDENTIFIER().

2022-02-04 Thread Eric Snow
On Thu, Feb 3, 2022 at 3:49 PM Eric Snow  wrote:
> I suppose I'd like to know what the value of _Py_IDENTIFIER() is for
> 3rd party modules.

Between Guido, Victor, Stefan, and Sebastian, I'm getting the sense
that a public replacement for _Py_IDENTIFER() would be worth pursuing.
Considering that it would probably help numpy move toward
subinterpreter support, I may work on this after all. :)

(For core CPython we'll still benefit from the statically initialized
strings, AKA gh-30928.)

-eric
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/D2LHEZZUQH66Q5ZIOEJTGSCEMQEMKCUQ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Moving away from _Py_IDENTIFIER().

2022-02-04 Thread Eric Snow
On Fri, Feb 4, 2022 at 8:21 AM Stefan Behnel  wrote:
> Correct. We (intentionally) have our own way to intern strings and do not
> depend on CPython's identifier framework.

You're talking about __Pyx_StringTabEntry (and __Pyx_InitString())?

-eric
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/Q5AL3SLW5BCUA6FLDBUNZTH5Z7ZYAHER/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Moving away from _Py_IDENTIFIER().

2022-02-04 Thread Sebastian Berg
On Thu, 2022-02-03 at 15:32 +0100, Victor Stinner wrote:
> By the way, Argument Clinic now produces way faster calling
> conventions than hand-written METH_VARARGS with PyArg_ParseTuple().
> It
> would be make this tool available to 3rd party projects.
> 
> Either extract it and put it on PyPI, but it means that Python will
> need to Python and pip install something to build itself... not good?
> 
> Or we can add it to stdlib. IMO we need to write more tests and more
> documentation. Right now, test_clinic is "light". Another issue is
> that it requires on the currently privte _PyArg_Parser structure. By
> the way, this structure is causing crashes if sub-interpreters are
> run
> in parallel (one GIL per interpreter) because of the
> _PyArg_Parser.kwtuple tuple of keyword parameter names (tuple of
> str).
> 
> If Eric's tool becomes successful inside CPython, we may consider to
> make it available to 3rd party projects as well. Such tool and
> Argument Clinic and not so different than Cython which generates C
> code to ease the development of C extensions modules.


It would be great to have a clear API available to store constants like
this, for NumPy these currently are:
* mainly interned strings
* Some custom singletons which are currently defined in Python
  (including exceptions, but not so important)
* Python functions that are called from C.

I am pretty sure that some of the places that do not have module state
available, but likely many (most?) of them could be refactored (e.g. a
converter function that needs a singleton: it isn't terrible to
manually converter later on).  Many will be in methods.


It would be great to have a "blessed" solution for kwarg parsing.  I
had shunned argument clinic because it felt too internal, probably
CPython specific, and it seemed like you basically need to check in the
generated code for each supported Python version...

The current solution in NumPy [1] is a bit limited and uses those
statics to store the interned strings. And it duplicates Python logic
that would be nice to not duplicate.

Cheers,

Sebastian



[1] For the curious, the API for NumPy looks like this:

/* 
 * Declare a function static struct for string creation/interning
 * and caching other prep (generated on the first call.
 * `npy_parse_arugments` is a macro, but only to insert the cache name)
 */
NPY_PREPARE_ARGPARSER;

if (npy_parse_arguments(funcname, args, len_args, kwnames
"", NULL, _only_obj,
"kw1", , _val,
"|kw2", NULL, _obj,
"$kw_only1", NULL, _only2_obj,
NULL, NULL, NULL) < 0) {
return NULL;
}

So it is very limited to converters (e.g. no "i" or "!O" convenience),
since we don't use that much anyway.
One annoyance is that "i" and "s" don't have a clear "converter" and
that converters can't raise an error message that includes the function
and parameter name.


> 
> Victor
> 
> On Thu, Feb 3, 2022 at 7:50 AM Inada Naoki 
> wrote:
> > 
> > +1 for overall.
> > 
> > On Thu, Feb 3, 2022 at 7:45 AM Eric Snow <
> > ericsnowcurren...@gmail.com> wrote:
> > > 
> > > 
> > > I'd also like to actually get rid of _Py_IDENTIFIER(), along with
> > > other related API including ~14 (private) C-API functions. 
> > > Dropping
> > > all that helps reduce maintenance costs.  However, at least one
> > > PyPI
> > > project (blender) is using _Py_IDENTIFIER().  So, before we could
> > > get
> > > rid of it, we'd first have to deal with that project (and any
> > > others).
> > > 
> > 
> > It would be nice to provide something similar to _PY_IDENTIFIER,
> > but
> > designed (and documented) for 3rd party modules like this.
> > 
> > ```
> > typedef struct {
> >     Py_IDENTIFIER(foo);
> > ...
> > } Modstate;
> > ...
> >     // in some func
> >     Modstate *state = (Modstate*)PyModule_GetState(module);
> >     PyObject_GetAttr(o, PY_IDENTIFIER_STR(state->foo));
> > ...
> > // m_free()
> > static void mod_free(PyObject *module) {
> >     Modstate *state = (Modstate*)PyModule_GetState(module);
> >     Py_IDENTIFIER_CLEAR(state->foo);
> > }
> > ```
> > 
> > 
> > --
> > Inada Naoki  
> > ___
> > Python-Dev mailing list -- python-dev@python.org
> > To unsubscribe send an email to python-dev-le...@python.org
> > https://mail.python.org/mailman3/lists/python-dev.python.org/
> > Message archived at 
> > https://mail.python.org/archives/list/python-dev@python.org/message/ZZ5QOZDOAO734SDRJGMXW6AJGAVEPUHE/
> > Code of Conduct: http://python.org/psf/codeofconduct/
> 
> 
> 


___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/PBBH2RXKZO2HYWYLNZU5VJ6B7BJSABKZ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Moving away from _Py_IDENTIFIER().

2022-02-04 Thread Stefan Behnel

Ronald Oussoren via Python-Dev schrieb am 03.02.22 um 14:46:

On 2 Feb 2022, at 23:41, Eric Snow wrote:
* a little less convenient: adding a global string requires modifying
a separate file from the one where you actually want to use the string
* strings can get "orphaned" (I'm planning on checking in CI)
* some strings may never get used for any given ./python invocation
(not that big a difference though)


The first two cons can probably be fixed by adding some indirection, with some
markers at the place of use and a script that uses those to generate the
C definitions.

Although my gut feeling is that adding a the CI check you mention is good
enough and adding the tooling for generating code isn’t worth the additional
complexity.


It's what we do in Cython, and it works really well there. It's very 
straight forward, you just write something like


PYUNICODE("some text here")
PYIDENT("somename")

in your C code and Cython creates a deduplicated global string table from 
them and replaces the string constants with the corresponding global 
variables. (We have two different names because an identifier in Py2 is 
'str', not 'unicode'.)


Now, the thing with CPython is that the C sources where the replacement 
would take place are VCS controlled. And a script that replaces the 
identifiers would have to somehow make sure that the new references do not 
get renamed, which would lead to non-local changes when strings are added.


What you could try is to number the identifiers, i.e. use a macro like

_Py_STR(123, "some text here")

where you manually add a new identifier as

_Py_STR("some text here")

and the number is filled in automatically by a script that finds all of 
them, deduplicates, and adds new identifiers at the end, adding 1 to the 
maximum number that it finds. That makes sure that identifiers that already 
have an ID number will not be touched, deleted strings disappear 
automatically, and non-local changes are prevented.


Defining the _Py_STR() macro as

   #define _Py_STR(id, text)  (_Py_global_string_table[id])

or

   #define _Py_STR(id, text)  (_Py_global_string_table##id)

would also give you a compile error if someone forgets to run the script.

Stefan

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/LD3JM2NQ5ZUZDK63RH4IVZPCZ7HC4X3G/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Moving away from _Py_IDENTIFIER().

2022-02-04 Thread Stefan Behnel

Victor Stinner schrieb am 03.02.22 um 22:46:

Oh right, Cython seems to be a false positive.

A code search found 3 references to __Pyx_PyObject_LookupSpecial():

PYPI-2022-01-26-TOP-5000/Cython-0.29.26.tar.gz:
Cython-0.29.26/Cython/Compiler/ExprNodes.py: lookup_func_name =
'__Pyx_PyObject_LookupSpecial'
PYPI-2022-01-26-TOP-5000/Cython-0.29.26.tar.gz:
Cython-0.29.26/Cython/Compiler/Nodes.py: code.putln("%s =
__Pyx_PyObject_LookupSpecial(%s, %s); %s" % (
PYPI-2022-01-26-TOP-5000/Cython-0.29.26.tar.gz:
Cython-0.29.26/Cython/Utility/ObjectHandling.c: static CYTHON_INLINE
PyObject* __Pyx_PyObject_LookupSpecial(PyObject* obj, PyObject*
attr_name) {

Oh, that's not "_PyObject_LookupSpecial()", it doesn't use the
_Py_Identifier type:

static CYTHON_INLINE PyObject*
__Pyx_PyObject_LookupSpecial(PyObject* obj, PyObject* attr_name)
{ ... }


Correct. We (intentionally) have our own way to intern strings and do not 
depend on CPython's identifier framework.


Stefan

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/4ATP4FSVRNI5CLAJDN43QRDH5IHW7BW2/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Moving away from _Py_IDENTIFIER().

2022-02-04 Thread Victor Stinner
On Thu, Feb 3, 2022 at 11:49 PM Eric Snow  wrote:
> I suppose I'd like to know what the value of _Py_IDENTIFIER() is for
> 3rd party modules.

_Py_IDENTIFIER() API is simple and short, Python creates the Python
string object automatically, *and* clears this object at exit. It's
convenient.

On the other side, functions like PyDict_GetItemString() call
PyUnicode_FromString(): it allocates memory on the heap, it decodes
the bytes string from UTF-8, and then deallocates the memory. It's
inefficient just for a dict lookup.

In the _Py_IDENTIFIER() API, _PyUnicode_FromId() calls
PyUnicode_InternInPlace() which makes dict lookup even more efficient.

Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ZYONQJPBC7IEIJVUTUBHNZXVOV7YOBUF/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Moving away from _Py_IDENTIFIER().

2022-02-03 Thread Eric Snow
On Thu, Feb 3, 2022 at 4:01 PM Guido van Rossum  wrote:
> Why not read through some of that code and see what they are doing with it?

Yep, I'm planning on it.

> I imagine one advantage is that _Py_IDENTIFIER() can be used entirely local 
> to a function.

Yeah, they'd have to put something like this in their module init:

state->partial_str = PyUnicode_InternFromString("partial");
if (state->partial_str == NULL) {
return NULL;
}

> E.g. (from _operator.c):
>
> _Py_IDENTIFIER(partial);
> functools = PyImport_ImportModule("functools");
> if (!functools)
> return NULL;
> partial = _PyObject_GetAttrId(functools, _partial);
>
> That's convenient since it means they don't have to pass module state around.

I might call that cheating. :)  For an extension module this means
they are storing a little bit of their state in the
runtime/interpreter state instead of in their module state.  Is there
precedent for that with any of our other API?

Regardless, the status quo certainly is simpler (if they aren't
already using module state in the function).  Without _Py_IDENTIFER()
it would look like:

functools = PyImport_ImportModule("functools");
if (!functools)
return NULL;
my_struct *state = (my_struct*)PyModule_GetState(module);
if (state == NULL) {
Py_DECREF(functools);
return NULL;
}
partial = PyObject_GetAttr(functools, state->partial_str);

If they are already using the module state in their function then the
code would be simpler:

functools = PyImport_ImportModule("functools");
if (!functools)
return NULL;
partial = PyObject_GetAttr(functools, state->partial_str);

-eric
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/4625QOLXLZAAU2XNXEQM5W2JWX3FH4VM/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Moving away from _Py_IDENTIFIER().

2022-02-03 Thread Eric Snow
On Thu, Feb 3, 2022 at 7:26 AM Victor Stinner  wrote:
> In bpo-39465, I made the _PyUnicode_FromId() compatible with running
> sub-interpreters in parallel (one GIL per interpreter).
>
> A "static" PyUnicodeObject would have to share the reference count
> between sub-interpreters, whereas Py_INCREF/Py_DECREF are not
> thread-safe: there is lock to prevent data races.

Yeah, if we end up not being able to safely share the global string
objects between interpreters then we would move them under
PyInterpreterState.  Currently I'm putting them under _PyRuntimeState.
Doing that might reduce the performance benefits a little, since
Py_GET_GLOBAL_STRING() would have to look up the interpreter to use
(or we'd have to pass it in).  That doesn't seem like much of a
penalty though and doesn't impact the other benefits of the change.

> Is there a way to push the "immortal objects" strategy discussed in
> bpo-40255?

I'm planning on circling back to that next week.

> The deepfreeze already pushed some functions related to
> that, like _PyObject_IMMORTAL_INIT() in the internal C API.
> Moreover... deepfreeze already produces "immortal" PyUnicodeObject
> strings using the "ob_refcnt = 9" hack.

Note we only set the value really high as a safety precaution since
these objects are all statically allocated.  Eddie Elizondo's proposal
involves a number of other key points, including keeping the refcount
from changing.

> IMO we should decide on a strategy. Either we move towards immortal
> objects (modify Py_INCREF/Py_DECREF to not modify the ref count if an
> object is immortal), or we make sure that no Python is shared between
> two Python interpreters.

+1

The catch is that things get messier when we make some objects
per-interpreter while others stay runtime-global.  I'm going to write
a bit more about this next week, but the best strategy will probably
be to first consolidate all the global objects under _PyRuntimeState
first and then move them to PyInterpreterState all at once when we can
do it safely.

> > I'd also like to actually get rid of _Py_IDENTIFIER(), along with
> > other related API including ~14 (private) C-API functions.  Dropping
> > all that helps reduce maintenance costs.
>
> Is it required by your work on static strings, or is it more about
> removing the API which would no longer be consumed by Python itself?

It is definitely not required for that.  Rather, we won't need it any
more so we should benefit from getting rid of it.  The only blocker is
that some 3rd party modules are using it.

> If it's not required, would it make sense to follow the PEP 387
> deprecation (mark functions as deprecated, document the deprecation,
> and wait 2 releases to remove it)?

If you think it's worth it.  It's a private API.  I'd rather work to
get 3rd party modules off it and then move on sooner.

-eric
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/WR33NVCSIHOMN5X7YGCL2DHNCBQGKWAU/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Moving away from _Py_IDENTIFIER().

2022-02-03 Thread Guido van Rossum
On Thu, Feb 3, 2022 at 2:50 PM Eric Snow 
wrote:

> On Wed, Feb 2, 2022 at 11:50 PM Inada Naoki 
> wrote:
> > It would be nice to provide something similar to _PY_IDENTIFIER, but
> > designed (and documented) for 3rd party modules like this.
>
> I suppose I'd like to know what the value of _Py_IDENTIFIER() is for
> 3rd party modules.  They can already use PyUnicode_InternFromString()
> to get a "global" object and then store it in their module state.  I
> would not expect _Py_IDENTIFIER() to provide much of an advantage over
> that.  Perhaps I'm missing something?
>
> If there is a real benefit then we should definitely figure out a good
> public API for it (if the current private one isn't sufficient).  I
> won't be authoring that PEP though. :)
>

Why not read through some of that code and see what they are doing with it?

I imagine one advantage is that _Py_IDENTIFIER() can be used entirely local
to a function. E.g. (from _operator.c):

_Py_IDENTIFIER(partial);
functools = PyImport_ImportModule("functools");
if (!functools)
return NULL;
partial = _PyObject_GetAttrId(functools, _partial);

That's convenient since it means they don't have to pass module state
around.

-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/G5Y7DROKELUQYSVMHQBUIUMZOPODXCXF/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Moving away from _Py_IDENTIFIER().

2022-02-03 Thread Eric Snow
On Thu, Feb 3, 2022 at 7:17 AM Victor Stinner  wrote:
> In the top 5000 PyPI projects, I found 11 projects using them:
> [snip[
> They use the these 17 functions:

Thanks!  That is super helpful.

> If the _Py_IDENTIFIER() API is removed, it would be *nice* to provide
> a migrate path (tool?) to help these projects moving away the
> _Py_IDENTIFIER() API. Or at least do the work to update these 11
> projects.

If something like _Py_IDENTIFIER() provides genuine value then we
should consider a proper public API.  Otherwise I agree that we should
work with those projects to stop using it.  I guess either way they
should stop using the "private" API. :)

-eric
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/IGXAXUSDBKYOOVVFSAUYLE5R5TXVZT4A/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Moving away from _Py_IDENTIFIER().

2022-02-03 Thread Eric Snow
On Thu, Feb 3, 2022 at 6:46 AM Ronald Oussoren  wrote:
> Although my gut feeling is that adding a the CI check you mention is good
> enough and adding the tooling for generating code isn’t worth the additional
> complexity.

Yeah, I came to the same conclusion. :)

-eric
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/FAMDFSTN5KR2Z7LOVTK5GGF6YKR6G65Z/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Moving away from _Py_IDENTIFIER().

2022-02-03 Thread Eric Snow
On Wed, Feb 2, 2022 at 11:50 PM Inada Naoki  wrote:
> It would be nice to provide something similar to _PY_IDENTIFIER, but
> designed (and documented) for 3rd party modules like this.

I suppose I'd like to know what the value of _Py_IDENTIFIER() is for
3rd party modules.  They can already use PyUnicode_InternFromString()
to get a "global" object and then store it in their module state.  I
would not expect _Py_IDENTIFIER() to provide much of an advantage over
that.  Perhaps I'm missing something?

If there is a real benefit then we should definitely figure out a good
public API for it (if the current private one isn't sufficient).  I
won't be authoring that PEP though. :)

-eric
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/AVJKKWITJPHUQTE2IXDYBCTQTKVPZPD7/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Moving away from _Py_IDENTIFIER().

2022-02-03 Thread Victor Stinner
Oh right, Cython seems to be a false positive.

A code search found 3 references to __Pyx_PyObject_LookupSpecial():

PYPI-2022-01-26-TOP-5000/Cython-0.29.26.tar.gz:
Cython-0.29.26/Cython/Compiler/ExprNodes.py: lookup_func_name =
'__Pyx_PyObject_LookupSpecial'
PYPI-2022-01-26-TOP-5000/Cython-0.29.26.tar.gz:
Cython-0.29.26/Cython/Compiler/Nodes.py: code.putln("%s =
__Pyx_PyObject_LookupSpecial(%s, %s); %s" % (
PYPI-2022-01-26-TOP-5000/Cython-0.29.26.tar.gz:
Cython-0.29.26/Cython/Utility/ObjectHandling.c: static CYTHON_INLINE
PyObject* __Pyx_PyObject_LookupSpecial(PyObject* obj, PyObject*
attr_name) {

Oh, that's not "_PyObject_LookupSpecial()", it doesn't use the
_Py_Identifier type:

static CYTHON_INLINE PyObject*
__Pyx_PyObject_LookupSpecial(PyObject* obj, PyObject* attr_name)
{ ... }

Victor

On Thu, Feb 3, 2022 at 7:27 PM  wrote:
>
> Victor Stinner wrote:
> > On Wed, Feb 2, 2022 at 11:49 PM Eric Snow ericsnowcurren...@gmail.com wrote:
>
> > In the top 5000 PyPI projects, I found 11 projects using them:
> > * Cython-0.29.26 (and so indirect most projects using Cython)
>
> I believe Cython is (for once) a false alarm here. I don't think it uses any 
> of those functions.
>
> It has a comment that contains "_PyObject_LookupAttrId" - 
> https://github.com/cython/cython/blob/8d2df028bf9536942b60670bf0aa80d6acc7464a/Cython/Utility/ObjectHandling.c#L1061
>  - This is a bit of code that we'd adapted from CPython and the comment just 
> explains what the original line is.
>
> It's possible I've missed something of course, in which case let me know.
>
> David
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at 
> https://mail.python.org/archives/list/python-dev@python.org/message/5WICJJ5GYJE54JKPG56UY33YHHFQROHX/
> Code of Conduct: http://python.org/psf/codeofconduct/



-- 
Night gathers, and now my watch begins. It shall not end until my death.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/DXXEYI7YPZCQO6VAPLVNVVAPH4LZ4KVU/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Moving away from _Py_IDENTIFIER().

2022-02-03 Thread Gregory P. Smith
On Wed, Feb 2, 2022 at 2:48 PM Eric Snow 
wrote:

> I'm planning on moving us to a simpler, more efficient alternative to
> _Py_IDENTIFIER(), but want to see if there are any objections first
> before moving ahead.  Also see https://bugs.python.org/issue46541.
>
> _Py_IDENTIFIER() was added in 2011 to replace several internal string
> object caches and to support cleaning up the cached objects during
> finalization.  A number of "private" functions (each with a
> _Py_Identifier param) were added at that time, mostly corresponding to
> existing functions that take PyObject* or char*.  Note that at present
> there are several hundred uses of _Py_IDENTIFIER(), including a number
> of duplicates.
>
> My plan is to replace our use of _Py_IDENTIFIER() with statically
> initialized string objects (as fields under _PyRuntimeState).  That
> involves the following:
>
> * add a PyUnicodeObject field (not a pointer) to _PyRuntimeState for
> each string that currently uses _Py_IDENTIFIER() (or
> _Py_static_string())
> * statically initialize each object as part of the initializer for
> _PyRuntimeState
> * add a macro to look up a given global string
> * update each location that currently uses _Py_IDENTIFIER() to use the
> new macro instead
>
> Pros:
>
> * reduces indirection (and extra calls) for C-API functions that need
> the strings (making the code a little easier to understand and
> speeding it up)
> * the objects are referenced from a fixed address in the static data
> section instead of the heap (speeding things up and allowing the C
> compiler to optimize better)
> * there is no lazy allocation (or lookup, etc.) so there are fewer
> possible failures when the objects get used (thus less error return
> checking)
> * saves memory (at little, at least)
> * if needed, the approach for per-interpreter is simpler
> * helps us get rid of several hundred static variables throughout the code
> base
> * allows us to get rid of _Py_IDENTIFIER() and a bunch of related
> C-API functions
> * "deep frozen" modules can use the global strings
> * commonly-used strings could be pre-allocated by adding
> _PyRuntimeState fields for them
>
> Cons:
>
> * a little less convenient: adding a global string requires modifying
> a separate file from the one where you actually want to use the string
> * strings can get "orphaned" (I'm planning on checking in CI)
> * some strings may never get used for any given ./python invocation
> (not that big a difference though)
>
> I have a PR up (https://github.com/python/cpython/pull/30928) that
> adds the global strings and replaces use of _Py_IDENTIFIER() in our
> code base, except for in non-builtin stdlib extension modules.  (Those
> will be handled separately if we proceed.)  The PR also adds a CI
> check for "orphaned" strings.  It leaves _Py_IDENTIFIER() for now, but
> disallows any Py_BUILD_CORE code from using it.
>
> With that change I'm seeing a 1% improvement in performance (see
> https://github.com/faster-cpython/ideas/issues/230).
>
> I'd also like to actually get rid of _Py_IDENTIFIER(), along with
> other related API including ~14 (private) C-API functions.  Dropping
> all that helps reduce maintenance costs.  However, at least one PyPI
> project (blender) is using _Py_IDENTIFIER().  So, before we could get
> rid of it, we'd first have to deal with that project (and any others).
>

datapoint: an internal code search turns up blender, multidict, and
typed_ast as open source users of _Py_IDENTIFIER .  Easy to clean up as
PRs.  There are a couple of internal uses as well, all of which are
similarly easy to address and are only in code that is expected to need API
cleanup tweaks between CPython versions.

Overall I think addressing the broader strategy question among
the performance focused folks is worthwhile though.

-gps


>
> To sum up, I wanted to see if there are any objections before I start
> merging anything.  Thanks!
>
> -eric
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/DNMZAMB4M6RVR76RDZMUK2WRLI6KAAYS/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/BCQJ6AZMPTI2DGFQPC27RUIFJQGDIOQD/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Moving away from _Py_IDENTIFIER().

2022-02-03 Thread Brett Cannon
On Thu, Feb 3, 2022 at 6:31 AM Victor Stinner  wrote:

> On Wed, Feb 2, 2022 at 11:43 PM Eric Snow 
> wrote:
> > My plan is to replace our use of _Py_IDENTIFIER() with statically
> > initialized string objects (as fields under _PyRuntimeState).  That
> > involves the following:
> >
> > * add a PyUnicodeObject field (not a pointer) to _PyRuntimeState for
> > each string that currently uses _Py_IDENTIFIER() (or
> > _Py_static_string())
>
> In bpo-39465, I made the _PyUnicode_FromId() compatible with running
> sub-interpreters in parallel (one GIL per interpreter).
>
> A "static" PyUnicodeObject would have to share the reference count
> between sub-interpreters, whereas Py_INCREF/Py_DECREF are not
> thread-safe: there is lock to prevent data races.
>
> Is there a way to push the "immortal objects" strategy discussed in
> bpo-40255? The deepfreeze already pushed some functions related to
> that, like _PyObject_IMMORTAL_INIT() in the internal C API.
> Moreover... deepfreeze already produces "immortal" PyUnicodeObject
> strings using the "ob_refcnt = 9" hack.
>
> IMO we should decide on a strategy. Either we move towards immortal
> objects (modify Py_INCREF/Py_DECREF to not modify the ref count if an
> object is immortal), or we make sure that no Python is shared between
> two Python interpreters.
>

Either we come to an agreement as a group (for against the concept and then
how to do it), or if there's a stalemate it goes to the SC (with
potentially a PEP to properly explain the nuances of the proposed solution).

-Brett


>
>
> > I'd also like to actually get rid of _Py_IDENTIFIER(), along with
> > other related API including ~14 (private) C-API functions.  Dropping
> > all that helps reduce maintenance costs.
>
> Is it required by your work on static strings, or is it more about
> removing the API which would no longer be consumed by Python itself?
>
> If it's not required, would it make sense to follow the PEP 387
> deprecation (mark functions as deprecated, document the deprecation,
> and wait 2 releases to remove it)?
>
> Victor
> --
> Night gathers, and now my watch begins. It shall not end until my death.
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/FFQ6N7K2BHG37JVFOACIEZLTKH43NV3L/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/VFI6XGSCY2PVGHEHHDJY5QOT7MKM4Q5P/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Moving away from _Py_IDENTIFIER().

2022-02-03 Thread Brett Cannon
On Thu, Feb 3, 2022 at 6:37 AM Victor Stinner  wrote:

> By the way, Argument Clinic now produces way faster calling
> conventions than hand-written METH_VARARGS with PyArg_ParseTuple(). It
> would be make this tool available to 3rd party projects.
>
> Either extract it and put it on PyPI, but it means that Python will
> need to Python and pip install something to build itself... not good?
>

Since we check in the generated source you only need Python installed while
developing CPython, not to build it. And since that's already a requirement
to build the docs, freeze code, etc. I don't think that's an unreasonable
requirement.

We also don't have to break Argument Clinic out of our repo just to make it
available externally. There's nothing saying we can't have a subdirectory
of code that we directly rely on and publish separately for our development
needs. Could even add blurb into the repo that way if we wanted to.

-Brett


>
> Or we can add it to stdlib. IMO we need to write more tests and more
> documentation. Right now, test_clinic is "light". Another issue is
> that it requires on the currently privte _PyArg_Parser structure. By
> the way, this structure is causing crashes if sub-interpreters are run
> in parallel (one GIL per interpreter) because of the
> _PyArg_Parser.kwtuple tuple of keyword parameter names (tuple of str).
>
> If Eric's tool becomes successful inside CPython, we may consider to
> make it available to 3rd party projects as well. Such tool and
> Argument Clinic and not so different than Cython which generates C
> code to ease the development of C extensions modules.
>
> Victor
>
> On Thu, Feb 3, 2022 at 7:50 AM Inada Naoki  wrote:
> >
> > +1 for overall.
> >
> > On Thu, Feb 3, 2022 at 7:45 AM Eric Snow 
> wrote:
> > >
> > >
> > > I'd also like to actually get rid of _Py_IDENTIFIER(), along with
> > > other related API including ~14 (private) C-API functions.  Dropping
> > > all that helps reduce maintenance costs.  However, at least one PyPI
> > > project (blender) is using _Py_IDENTIFIER().  So, before we could get
> > > rid of it, we'd first have to deal with that project (and any others).
> > >
> >
> > It would be nice to provide something similar to _PY_IDENTIFIER, but
> > designed (and documented) for 3rd party modules like this.
> >
> > ```
> > typedef struct {
> > Py_IDENTIFIER(foo);
> > ...
> > } Modstate;
> > ...
> > // in some func
> > Modstate *state = (Modstate*)PyModule_GetState(module);
> > PyObject_GetAttr(o, PY_IDENTIFIER_STR(state->foo));
> > ...
> > // m_free()
> > static void mod_free(PyObject *module) {
> > Modstate *state = (Modstate*)PyModule_GetState(module);
> > Py_IDENTIFIER_CLEAR(state->foo);
> > }
> > ```
> >
> >
> > --
> > Inada Naoki  
> > ___
> > Python-Dev mailing list -- python-dev@python.org
> > To unsubscribe send an email to python-dev-le...@python.org
> > https://mail.python.org/mailman3/lists/python-dev.python.org/
> > Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/ZZ5QOZDOAO734SDRJGMXW6AJGAVEPUHE/
> > Code of Conduct: http://python.org/psf/codeofconduct/
>
>
>
> --
> Night gathers, and now my watch begins. It shall not end until my death.
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/QNP7RDER74XU3DNHJP5EAZY6G6N4VYE7/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/NUPJY6G3OYEMYEAUA7BJX665A6PHDHW6/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Moving away from _Py_IDENTIFIER().

2022-02-03 Thread dw-git
Victor Stinner wrote:
> On Wed, Feb 2, 2022 at 11:49 PM Eric Snow ericsnowcurren...@gmail.com wrote:

> In the top 5000 PyPI projects, I found 11 projects using them:
> * Cython-0.29.26 (and so indirect most projects using Cython)

I believe Cython is (for once) a false alarm here. I don't think it uses any of 
those functions.

It has a comment that contains "_PyObject_LookupAttrId" - 
https://github.com/cython/cython/blob/8d2df028bf9536942b60670bf0aa80d6acc7464a/Cython/Utility/ObjectHandling.c#L1061
 - This is a bit of code that we'd adapted from CPython and the comment just 
explains what the original line is.

It's possible I've missed something of course, in which case let me know.

David
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/5WICJJ5GYJE54JKPG56UY33YHHFQROHX/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Moving away from _Py_IDENTIFIER().

2022-02-03 Thread Victor Stinner
By the way, Argument Clinic now produces way faster calling
conventions than hand-written METH_VARARGS with PyArg_ParseTuple(). It
would be make this tool available to 3rd party projects.

Either extract it and put it on PyPI, but it means that Python will
need to Python and pip install something to build itself... not good?

Or we can add it to stdlib. IMO we need to write more tests and more
documentation. Right now, test_clinic is "light". Another issue is
that it requires on the currently privte _PyArg_Parser structure. By
the way, this structure is causing crashes if sub-interpreters are run
in parallel (one GIL per interpreter) because of the
_PyArg_Parser.kwtuple tuple of keyword parameter names (tuple of str).

If Eric's tool becomes successful inside CPython, we may consider to
make it available to 3rd party projects as well. Such tool and
Argument Clinic and not so different than Cython which generates C
code to ease the development of C extensions modules.

Victor

On Thu, Feb 3, 2022 at 7:50 AM Inada Naoki  wrote:
>
> +1 for overall.
>
> On Thu, Feb 3, 2022 at 7:45 AM Eric Snow  wrote:
> >
> >
> > I'd also like to actually get rid of _Py_IDENTIFIER(), along with
> > other related API including ~14 (private) C-API functions.  Dropping
> > all that helps reduce maintenance costs.  However, at least one PyPI
> > project (blender) is using _Py_IDENTIFIER().  So, before we could get
> > rid of it, we'd first have to deal with that project (and any others).
> >
>
> It would be nice to provide something similar to _PY_IDENTIFIER, but
> designed (and documented) for 3rd party modules like this.
>
> ```
> typedef struct {
> Py_IDENTIFIER(foo);
> ...
> } Modstate;
> ...
> // in some func
> Modstate *state = (Modstate*)PyModule_GetState(module);
> PyObject_GetAttr(o, PY_IDENTIFIER_STR(state->foo));
> ...
> // m_free()
> static void mod_free(PyObject *module) {
> Modstate *state = (Modstate*)PyModule_GetState(module);
> Py_IDENTIFIER_CLEAR(state->foo);
> }
> ```
>
>
> --
> Inada Naoki  
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at 
> https://mail.python.org/archives/list/python-dev@python.org/message/ZZ5QOZDOAO734SDRJGMXW6AJGAVEPUHE/
> Code of Conduct: http://python.org/psf/codeofconduct/



-- 
Night gathers, and now my watch begins. It shall not end until my death.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/QNP7RDER74XU3DNHJP5EAZY6G6N4VYE7/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Moving away from _Py_IDENTIFIER().

2022-02-03 Thread Victor Stinner
On Wed, Feb 2, 2022 at 11:43 PM Eric Snow  wrote:
> My plan is to replace our use of _Py_IDENTIFIER() with statically
> initialized string objects (as fields under _PyRuntimeState).  That
> involves the following:
>
> * add a PyUnicodeObject field (not a pointer) to _PyRuntimeState for
> each string that currently uses _Py_IDENTIFIER() (or
> _Py_static_string())

In bpo-39465, I made the _PyUnicode_FromId() compatible with running
sub-interpreters in parallel (one GIL per interpreter).

A "static" PyUnicodeObject would have to share the reference count
between sub-interpreters, whereas Py_INCREF/Py_DECREF are not
thread-safe: there is lock to prevent data races.

Is there a way to push the "immortal objects" strategy discussed in
bpo-40255? The deepfreeze already pushed some functions related to
that, like _PyObject_IMMORTAL_INIT() in the internal C API.
Moreover... deepfreeze already produces "immortal" PyUnicodeObject
strings using the "ob_refcnt = 9" hack.

IMO we should decide on a strategy. Either we move towards immortal
objects (modify Py_INCREF/Py_DECREF to not modify the ref count if an
object is immortal), or we make sure that no Python is shared between
two Python interpreters.


> I'd also like to actually get rid of _Py_IDENTIFIER(), along with
> other related API including ~14 (private) C-API functions.  Dropping
> all that helps reduce maintenance costs.

Is it required by your work on static strings, or is it more about
removing the API which would no longer be consumed by Python itself?

If it's not required, would it make sense to follow the PEP 387
deprecation (mark functions as deprecated, document the deprecation,
and wait 2 releases to remove it)?

Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/FFQ6N7K2BHG37JVFOACIEZLTKH43NV3L/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Moving away from _Py_IDENTIFIER().

2022-02-03 Thread Victor Stinner
On Wed, Feb 2, 2022 at 11:49 PM Eric Snow  wrote:
> FTR, here is the (private/internal) C-API affected by getting rid of
> _Py_IDENTIFIER():
>
> * 21 C-API functions with `_Py_Identifer` parameters - would be dropped
>+ _PyUnicode_FromId()
>+ _PyUnicode_EqualToASCIIId()
>+ _PyObject_CallMethodId()
>+ _PyObject_CallMethodId_SizeT()
>+ _PyObject_CallMethodIdObjArgs()
>+ _PyObject_VectorcallMethodId()
>+ _PyObject_CallMethodIdNoArgs()
>+ _PyObject_CallMethodIdOneArg()
>+ _PyEval_GetBuiltinId()
>+ _PyDict_GetItemId()
>+ _PyDict_SetItemId()
>+ _PyDict_DelItemId()
>+ _PyDict_ContainsId()
>+ _PyImport_GetModuleId()
>+ _PyType_LookupId()
>+ _PyObject_LookupSpecial()
>+ _PyObject_GetAttrId()
>+ _PyObject_SetAttrId()
>+ _PyObject_LookupAttrId()
>+ _PySys_GetObjectId()
>+ _PySys_SetObjectId()

I am well aware that we don't provide any backward compatibility
warranty on the private and internal C API. *But*. People love to use
they anyway.

In the top 5000 PyPI projects, I found 11 projects using them:

* Cython-0.29.26 (and so indirect most projects using Cython)
* datatable-1.0.0
* frozendict-2.2.0
* multidict-6.0.2
* mypy-0.931
* pickle5-0.0.12
* pysqlite3-0.4.6
* ruamel.ordereddict-0.4.15
* scandir-1.10.0
* typed_ast-1.5.2
* zodbpickle-2.2.0

They use the these 17 functions:

* _PyDict_ContainsId()
* _PyDict_DelItemId()
* _PyDict_GetItemId()
* _PyDict_GetItemIdWithError()
* _PyDict_SetItemId()
* _PyEval_GetBuiltinId()
* _PyObject_CallMethodId()
* _PyObject_CallMethodIdNoArgs()
* _PyObject_CallMethodIdObjArgs()
* _PyObject_CallMethodIdOneArg()
* _PyObject_GetAttrId()
* _PyObject_LookupAttrId()
* _PyObject_LookupSpecial()
* _PyObject_SetAttrId()
* _PySys_GetObjectId()
* _PyUnicode_EqualToASCIIId()
* _PyUnicode_FromId()

If the _Py_IDENTIFIER() API is removed, it would be *nice* to provide
a migrate path (tool?) to help these projects moving away the
_Py_IDENTIFIER() API. Or at least do the work to update these 11
projects.

Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/7RMLIJHUWVBZFV747TFEHOE6LNBVQSMM/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Moving away from _Py_IDENTIFIER().

2022-02-03 Thread Ronald Oussoren via Python-Dev


> On 2 Feb 2022, at 23:41, Eric Snow  wrote:
> 
> I
[…]
> 
> Cons:
> 
> * a little less convenient: adding a global string requires modifying
> a separate file from the one where you actually want to use the string
> * strings can get "orphaned" (I'm planning on checking in CI)
> * some strings may never get used for any given ./python invocation
> (not that big a difference though)

The first two cons can probably be fixed by adding some indirection, with some 
markers at the place of use and a script that uses those to generate the
C definitions. 

Although my gut feeling is that adding a the CI check you mention is good
enough and adding the tooling for generating code isn’t worth the additional
complexity.

Ronald
—

Twitter / micro.blog: @ronaldoussoren
Blog: https://blog.ronaldoussoren.net/

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/6EJSL7LBAZM4HL5THZDZGTYFS5HRAIPY/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Moving away from _Py_IDENTIFIER().

2022-02-02 Thread Inada Naoki
+1 for overall.

On Thu, Feb 3, 2022 at 7:45 AM Eric Snow  wrote:
>
>
> I'd also like to actually get rid of _Py_IDENTIFIER(), along with
> other related API including ~14 (private) C-API functions.  Dropping
> all that helps reduce maintenance costs.  However, at least one PyPI
> project (blender) is using _Py_IDENTIFIER().  So, before we could get
> rid of it, we'd first have to deal with that project (and any others).
>

It would be nice to provide something similar to _PY_IDENTIFIER, but
designed (and documented) for 3rd party modules like this.

```
typedef struct {
Py_IDENTIFIER(foo);
...
} Modstate;
...
// in some func
Modstate *state = (Modstate*)PyModule_GetState(module);
PyObject_GetAttr(o, PY_IDENTIFIER_STR(state->foo));
...
// m_free()
static void mod_free(PyObject *module) {
Modstate *state = (Modstate*)PyModule_GetState(module);
Py_IDENTIFIER_CLEAR(state->foo);
}
```


-- 
Inada Naoki  
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ZZ5QOZDOAO734SDRJGMXW6AJGAVEPUHE/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Moving away from _Py_IDENTIFIER().

2022-02-02 Thread Eric Snow
On Wed, Feb 2, 2022 at 3:41 PM Eric Snow  wrote:
> I'd also like to actually get rid of _Py_IDENTIFIER(), along with
> other related API including ~14 (private) C-API functions.

FTR, here is the (private/internal) C-API affected by getting rid of
_Py_IDENTIFIER():

* 21 C-API functions with `_Py_Identifer` parameters - would be dropped
   + _PyUnicode_FromId()
   + _PyUnicode_EqualToASCIIId()
   + _PyObject_CallMethodId()
   + _PyObject_CallMethodId_SizeT()
   + _PyObject_CallMethodIdObjArgs()
   + _PyObject_VectorcallMethodId()
   + _PyObject_CallMethodIdNoArgs()
   + _PyObject_CallMethodIdOneArg()
   + _PyEval_GetBuiltinId()
   + _PyDict_GetItemId()
   + _PyDict_SetItemId()
   + _PyDict_DelItemId()
   + _PyDict_ContainsId()
   + _PyImport_GetModuleId()
   + _PyType_LookupId()
   + _PyObject_LookupSpecial()
   + _PyObject_GetAttrId()
   + _PyObject_SetAttrId()
   + _PyObject_LookupAttrId()
   + _PySys_GetObjectId()
   + _PySys_SetObjectId()
* 7 new internal functions to replace the _Py*Id() functions that
didn't already have a normal counterpart
   + _PyObject_CallMethodObj()
   + _PyObject_IsSingleton()
   + _PyEval_GetBuiltin()
   + _PySys_SetAttr()
   + _PyObject_LookupSpecial() (with PyObject* param)
   + _PyDict_GetItemWithError()
   + _PyObject_CallMethod()
* the runtime state related to identifiers - would be dropped
* _Py_Identifier, _Py_IDENTIFIER(), _Py_static_string() - would be dropped

-eric
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/OXKABHIUDUQETWXXBKUWD63XN65IVC22/
Code of Conduct: http://python.org/psf/codeofconduct/