On Mon, 2024-01-22 at 17:08 -0700, Nathan wrote:
> Hi all,
> 
> I propose we accept NEP 55 and merge PR #25347 implementing the NEP in
> time for the NumPy 2.0 RC:


I really like this work and I think it is a big improvement!  At this
point we should probably expect some things to still be buggy, but
that is also a reason to get it in (unfortunately, testing is hard if
it isn't shipped first-class).

Nathan summarized the things I might have brought up very well.  The
support of missing values is the one thing that to me may end up a bit
more in flux.
But I am hopeful that any changes there can be made in a way that does
not affect pandas and, honestly, without deep integration testing we
won't make progress in figuring out whether any change is needed.

Thanks for the great work!

- Sebastian


> 
> https://numpy.org/neps/nep-0055-string_dtype.html
> https://github.com/numpy/numpy/pull/25347
> 
> The most controversial aspect of the NEP was support for missing strings
> via a user-supplied sentinel object. In the previous discussion on the
> mailing list, Warren Weckesser argued for shipping a missing data sentinel
> with NumPy for use with the DType, while in code review and the PR for the
> NEP, Sebastian expressed concern about the additional complexity of
> including missing data support at all.
> 
> I found that supporting missing data is key to efficiently supporting the
> new DType in Pandas. I think that argues that we need some level of missing
> data support to fully replace object string arrays. I believe the
> compromise proposal in the NEP is sufficient for downstream libraries while
> limiting additional complexity elsewhere in NumPy.
> 
> Concerns raised in previous discussions about concretely specifying the C
> API to be made public, preventing use-after-free errors in a multithreaded
> context, and uncertainty around the arena allocator implementation have
> been resolved in the latest version of the NEP and the open PR.
> Additionally, due to some excellent and timely work by Lysandros Nikolaou,
> we now have a number of string ufuncs in NumPy and a straightforward plan
> to add more. Loops have been implemented for all the ufuncs added in the
> NumPy 2.0 dev cycle so far.
> 
> I would like to see us ship the DType in NumPy 2.0. This will allow us to
> advertise a major new feature, will spur efforts to support new DTypes in
> downstream libraries, and will allow us to get feedback from the community
> that would be difficult to obtain without releasing the code into the wild.
> Additionally, I am funded via a NASA ROSES grant for work related to this
> effort until the end of 2024, so including the DType in NumPy 2.0 will more
> efficiently use my funded time to fix issues.
> 
> If there are no substantive objections to this email, then the NEP will be
> considered accepted; see NEP 0 for more details:
> https://numpy.org/neps/nep-0000.html
> _______________________________________________
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: sebast...@sipsolutions.net

