Hi all, the week has passed, and it has been discussed quite a bit longer, so I assume that NEP 41 can effectively be accepted.
Even so, I will bring up one point again. If there is still a need for discussion, I hope it happens in a timely manner, so that I can go ahead with some of the changes proposed in NEP 41 and, in the event of more concrete doubts or issues, only a few changes will need to be undone. I would hate to revert a large amount of work simply because an important point is raised in two months instead of two weeks.

This whole thing is fairly complex, so please do not hesitate to ask for clarifications! I am also very happy to do a video conference with anyone interested at any time, or to chat in private on Slack. So just in case: I will be available around 11:00 PDT (18:00 UTC) this Thursday on the NumPy Community Call zoom link [0].

As far as I am aware, there was only one discussion point (maybe two; see point 2 below, which may be independent).

In my proposal, the DType class (i.e. `type(np.dtype("float64"))`) is the core concept and is different for every scalar type. It holds all the information on how to deal with array elements. This duplicates the scalar types to some degree, and it means that there would (usually) be exactly one DType for each (NumPy) scalar, possibly exposed using:

    np.dtype[scalar_type]

e.g.

    np.dtype[np.float64]

That does create a certain duality: for each scalar type/class there is a corresponding DType class. And in theory the scalar does not even need to know that NumPy has a DType for it.

From a type-theoretical point of view this is also a bit strange. The type of each array element is identical to the scalar type! But although there is only one type, there are two distinct classes: one for the scalar value, and one to explain the values to NumPy and store them in an array.

I lean in that direction because:

1. I wanted to modify scalars as little as possible. I am not sure we will enable this initially, but the goal is that:
   * In principle you can create a DType for every Python type without touching the original Python scalar.
   * The scalar need not know about NumPy or DTypes, thus creating no new dependency (you can use the scalar without installing NumPy).

2. I somewhat like that DType classes have methods that get a "self" instance argument and are provided with the data by the array.
   * This means that a function such as `dtype.__get_array_item__(item_memory)` is implemented like a method:

         class DType:
             def __get_array_item__(self, item_memory):
                 return item

   * There is an alternative approach to this that I did not think about much, though. `item_memory` really is much like a scalar instance (it holds the actual value), so you can argue that `item_memory` is `self` here, and the dtype instance is the type of `item_memory` (the self). E.g. making `__get_array_item__` live on the dtype (not on the class). The dtype thus is the type/class of the array element. This is beautiful, but in general you still need to pass the dtype instance itself. For example, strings cannot be interpreted without knowing their length. In other words, the scalar `self` is actually the tuple `(item_memory, dtype)`, which I think is why at least I do not have a clear grasp here. [1]

3. There may be `dtypes` without specific scalar types. I am not sure this is actually a tidy theoretical concept, but an example is the current Pandas Categorical. The type of the scalars within a categorical array is arbitrary. I am not actually sure that is theoretically tidy. E.g. Python uses `enum.Enum`, a class factory, for a similar purpose, and you have to use the `.value` attribute. But, desirable or not, it would seem less straightforward to allow this if we design everything around the scalar type.

The main downside to using DTypes as proposed in NEP 41 is, in my opinion, what I mentioned first: we must have a DType class for every scalar class, even though most scalars (i.e. all NumPy scalars, except for the `object` dtype) could easily be extended to include all the necessary information; maybe they already include almost all of it.

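
As a rough illustration of points 1 and 2 and of the downside just described, here is a pure-Python sketch. It is not NumPy API and not the NEP 41 implementation: only `__get_array_item__` and `item_memory` come from the text above, while `Float64DType`, `StringDType`, `scalar_type`, `dtype_class_for` and the byte layouts are hypothetical names and assumptions made up for the example.

```python
import struct


class DType:
    """Hypothetical base class: one subclass per scalar type."""

    scalar_type = None  # the scalar class this DType describes

    def __get_array_item__(self, item_memory):
        raise NotImplementedError


class Float64DType(DType):
    # The built-in float is left untouched and knows nothing about this class.
    scalar_type = float

    def __get_array_item__(self, item_memory):
        # The dtype instance (self) explains the raw bytes to Python.
        return struct.unpack("<d", item_memory)[0]


class StringDType(DType):
    scalar_type = str

    def __init__(self, length):
        # Instance-level information: the raw bytes alone are not enough.
        self.length = length

    def __get_array_item__(self, item_memory):
        raw = bytes(item_memory[: self.length])
        return raw.rstrip(b"\x00").decode("ascii")


# Somewhere the scalar type has to be mapped to its DType class.
_scalar_to_dtype = {cls.scalar_type: cls for cls in (Float64DType, StringDType)}


def dtype_class_for(scalar_type):
    """Hypothetical stand-in for something like np.dtype[scalar_type]."""
    return _scalar_to_dtype[scalar_type]


value = Float64DType().__get_array_item__(struct.pack("<d", 1.5))
text = StringDType(length=5).__get_array_item__(b"abc\x00\x00")
```

Here `value` and `text` are ordinary Python scalars, and the string case shows why the dtype instance has to be passed along with `item_memory`: without `length` the bytes cannot be interpreted.
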
In the NEP 41 framework the scalar could in practice be built from the DType, which may seem a bit strange. In general, Scalar<->DType will form a unit of sorts, and this means that somewhere we have to map scalars to DTypes.

So, in many ways, I actually do find the scalar version tidier myself. But I also find "there is a DType class for every scalar type/class" a straightforward user story, even if there will be subtle differences between the DType and the scalar class/type.

Point 2 may be independent of the whole scalar story; I am conflating it here because to me it applies more naturally in that context.

Cheers,

Sebastian

[0] See the community meeting agenda document for the link: https://hackmd.io/76o-IxCjQX2mOXO_wwkcpg

[1] These thoughts come mainly from https://gist.github.com/eric-wieser/49c55bcab744b0e782f6c2740603180b#what-this-could-mean-for-dtypes and from a discussion on the pull request. I do not claim to represent them entirely correctly, and especially not fully, here.
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion