On Fri, 2020-12-11 at 12:09 +0100, Ralf Gommers wrote:
> On Wed, Dec 9, 2020 at 5:22 PM Sebastian Berg
> <sebast...@sipsolutions.net> wrote:
> > 
> > Hi all,
> > 
> > Sorry that this will again be a bit complicated :(.  In brief:
> > 
> > * I would like to pass around scalars in some (partially new) C-API
> >   to implement value-based promotion.
> > * There are some subtle commutativity issues with promotion.
> >   Commutativity may change in that case (with respect to value-based
> >   promotion, probably for the better normally). [0]
> > 
> > In the past days, I have been looking into implementing value-based
> > promotion in a way that I had prototyped before.
> > The idea was that NEP 42 allows for the creation of DTypes
> > dynamically, which does allow very powerful value-based
> > promotion/casting.
> > 
> > But I decided there are too many quirks with creating type instances
> > dynamically (potentially very often) just to pass around one
> > additional piece of information.
> > That approach was far more powerful, but it is power and complexity
> > that we do not require, given that:
> > 
> > * Value-based promotion is only used for a mix of scalars and arrays
> >   (where "scalar" is annoyingly defined as 0-D at the moment).
> > * I assume it is only relevant for `np.result_type` and promotion in
> >   ufuncs (which often use `np.result_type`).  `np.can_cast` has such
> >   behaviour as well, but I think it is easier to deal with [1].
> >   We could implement more powerful "value based" logic, but I doubt
> >   it is worthwhile.
> > * This is already stretching the Python C-API beyond its limits.
> > 
> > So I will suggest this instead, which *must* modify some (poorly
> > defined) current behaviour:
> > 
> > 1. We always evaluate concrete DTypes first in promotion.  This
> >    means that in rare cases the non-commutativity of promotion may
> >    change the result dtype:
> > 
> >        np.result_type(-1, 2**16, np.float32)
> > 
> >    The same can also happen when you reorder the normal dtypes:
> > 
> >        np.result_type(np.int8, np.uint16, np.float32)
> >        np.result_type(np.float32, np.int8, np.uint16)
> > 
> >    In both cases the `np.float32` is effectively moved to the front
> >    (a short demonstration follows after the quoted mail).
> > 
> > 2. If we reorder the above operation, we can define that we never
> >    promote two "scalar values".  Instead we convert both to a
> >    concrete dtype first.  This makes it effectively like:
> > 
> >        np.result_type(np.array(-1).dtype, np.array(2**16).dtype)
> > 
> >    This means that we never have to deal with promoting two values.
> > 
> > 3. We need additional private API (we were always going to need some
> >    additional API); that API could become public:
> > 
> >    * Convert a single value into a concrete dtype.  You could say
> >      this is the same as `self.common_dtype(None)`, but a dedicated
> >      function seems simpler.  A dtype like this will never use
> >      `common_dtype()`.
> >    * `common_dtype_with_scalar(self, other, scalar)` (note that only
> >      one of the DTypes can have a scalar).
> >      As a fallback, this function can be implemented by converting
> >      to the concrete DType and retrying with the normal
> >      `common_dtype`.
> > 
> >    (At least the second slot must be made public if we are to allow
> >    value-based promotion for user DTypes.  I expect we will, but it
> >    is not particularly important to me right now.)
> > 
> > 4. Our public API (including new C-API) has to expose and take the
> >    scalar values.  That means promotion in ufuncs will get DTypes
> >    and `scalar_values`, although those should normally be `NULL`
> >    (or None).
> > 
> >    In future Python API, this is probably acceptable:
> > 
> >        np.result_type(*[t if v is None else v
> >                         for t, v in zip(dtypes, scalar_values)])
> > 
> >    In C, we need to expose a function below `result_type` which
> >    accepts both the scalar values and the DTypes explicitly.
> > 
> > 5. For the future: as said many times, I would like to deprecate
> >    using value-based promotion for anything except Python core
> >    types.  That just seems wrong and confusing.
> 
> I agree with this.
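To make the reordering in point 1 above concrete, here is a short
demonstration (a sketch of the behaviour at the time of writing; the
result dtypes shown are what the current pairwise promotion gives and
may change in later versions):

    import numpy as np

    # Pairwise, left-to-right promotion is not associative, so moving
    # `float32` to the front changes the result:
    print(np.result_type(np.int8, np.uint16, np.float32))
    # -> float64  (int8 + uint16 -> int32, then int32 + float32 -> float64)
    print(np.result_type(np.float32, np.int8, np.uint16))
    # -> float32  (float32 absorbs both integer dtypes directly)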
It is tempting to wonder what would happen if we dropped it entirely,
but I fear my current assumption is that it should keep working largely
unchanged, with careful deprecations hopefully added soon...

> Value-based promotion was never a great idea, so let's try to keep it
> as minimal as possible. I'm not even sure what kind of value-based
> promotion for non Python builtin types is happening now (?).

It is (roughly?) identical for all zero-dimensional objects:

    arr1 = np.array(1, dtype=np.int64)
    arr2 = np.array([1, 2], dtype=np.int32)

    (arr1 + arr2).dtype == np.int32
    (1 + arr2).dtype == np.int32

In the first addition `arr1` behaves like the Python `1` even though it
has a dtype attached.  The reason for this is probably that our entry
points greedily convert arrays.  And it shows one caveat: if we/SciPy
call `np.asarray` on a Python integer input, we lose the value-based
behaviour.  This may actually be a bigger pain point (see below for an
example).

> > My only problem is that while I can warn (possibly sometimes too
> > often) when behaviour will change, I do not have a good idea about
> > silencing that warning.
> 
> Do you see a real issue with this somewhere, or is it all just corner
> cases? In that case no warning seems okay.

Probably it is mostly corner cases.  If you do:

    arr_uint16 + int32(1) + 1.

we would warn for the first addition, but not for:

    arr_uint16 + (int32(1) + 1.)

even though it gives identical results.  The same might happen in
`np.concatenate`, where all arguments are passed at once.

I can think of one bigger pain point for this type of function:

    def function(arr1, arr2):
        arr1 = np.asarray(arr1)
        arr2 = np.asarray(arr2)
        return arr1 + arr2  # some complex code

We could add a cast to the function, I guess.  But for the end user it
might be tricky to realize that they need to cast the input to that
function.  And those types of functions are abundant...

> > Note that this affects NEP 42 (a little bit).  NEP 42 currently
> > makes a nod towards the dynamic type creation, but falls short of
> > actually defining it.
> > So these rules have to be incorporated, but IMO they do not affect
> > the general design choices in the NEP.
> > 
> > There is probably even more complexity to be found here, but for
> > now the above seems to be at least good enough to make headway...
> > 
> > Any thoughts or clarity remaining that I can try to confuse? :)
> 
> My main question is why you're considering both deprecating and
> expanding public API (in points 3 and 4). If you have a choice, keep
> everything private I'd say.

I had to realize that the non-associativity is trickier to solve.
Still digging into that...  But I guess we can probably live with it if
user DTypes can show some non-associativity even within a single call
to `np.result_type` or `np.concatenate`.  Generally I don't have many
qualms, as long as things don't get worse (they are already broken).

The premise for requiring some new public API is that, for us:

    int16(1) + 1 == int16(2)  # value based for the Python 1

A user implements int24; if it is to fit in perfectly, we would like:

    int24(1) + 1 == int24(2)

which requires some way to pass `int24` the information that the `1` is
a Python `1` in some form (there are probably many options for how to
pass it; a toy sketch follows below).

Exposure in promotion might be interesting for weirdly complex ufuncs,
like `scipy.special.eval_jacobi`, which have mixed type inputs.  Again
a corner case of a corner case, but I would prefer if there was a
(possible) future solution.
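To illustrate what such a hook could look like, here is a purely
hypothetical Python-level toy; `Int24DType` and its two methods are
made-up stand-ins for the slots sketched in point 3 of the quoted mail,
not real NumPy API:

    import numpy as np

    class Int24DType:
        # Made-up stand-in for a user DType; only the promotion logic
        # discussed above is modelled.

        def common_dtype(self, other):
            # Ordinary (value-free) promotion: behave as int32 would.
            return np.promote_types(np.int32, other)

        def common_dtype_with_scalar(self, other, scalar):
            # `scalar` is the Python value attached to the abstract
            # DType `other` (e.g. the literal 1 in `int24(1) + 1`).
            if isinstance(scalar, int) and -(2**23) <= scalar < 2**23:
                return self  # the value fits, int24(1) + 1 stays int24
            # Fallback from point 3: pick a concrete dtype for the
            # value and retry the normal promotion.
            return self.common_dtype(np.min_scalar_type(scalar))

    dt = Int24DType()
    dt.common_dtype_with_scalar(int, 1)      # -> the int24 DType itself
    dt.common_dtype_with_scalar(int, 2**30)  # -> dtype('int64') via fallback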
Cheers,

Sebastian


> My other question is: this is a complex story, it all sounds
> reasonable, but do you need more feedback than "sounds reasonable"?
> 
> Cheers,
> Ralf
> 
> > Cheers,
> > 
> > Sebastian
> > 
> > 
> > [0] We could use the reordering trick also for concrete DTypes,
> > although that would require introducing some kind of priority...  I
> > do not like that much as public API, but it might be something to
> > look at internally or for types deriving from the builtin abstract
> > DTypes:
> > 
> >   * inexact
> >   * other
> > 
> > Just evaluating all `inexact` first would probably solve our
> > commutativity issues.
> > 
> > [1] NumPy uses `np.can_cast(value, dtype)` as well.  For example:
> > 
> >     np.can_cast(np.array(1., dtype=np.float64), np.float32,
> >                 casting="safe")
> > 
> > returns True.  My working hypothesis is that `np.can_cast` as above
> > is just a side battle.  I.e. we can either:
> > 
> >   * Flip the switch on it (can-cast does no value-based logic; even
> >     though we use it internally, we do not need it).
> >   * Or, we can implement those cases of `np.can_cast` by using
> >     promotion.
> > 
> > The first one is tempting, but I assume we should go with the
> > second since it preserves behaviour and is slightly more powerful.
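For reference, footnote [1] in action (behaviour at the time of
writing; the second, bare-dtype call is added here for contrast and is
not part of the original footnote):

    import numpy as np

    # A zero-dimensional array is inspected by value, so 1.0 "safely"
    # fits into float32...
    np.can_cast(np.array(1., dtype=np.float64), np.float32,
                casting="safe")                               # True
    # ...while a bare dtype is not value-inspected:
    np.can_cast(np.float64, np.float32, casting="safe")       # False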