HI Sebastian, Thanks for the overview! In the value-based casting, what perhaps surprises me most is that it is done within a kind; it would seem an improvement to check whether a given integer scalar is exactly representable in a given float (your example of 1024 in `float16`). If we switch to the python-only scalar values idea, I would suggest to abandon this. That might make dealing with things like `Decimal` or `Fraction` easier as well.
All the best, Marten On Tue, Jun 11, 2019 at 8:46 PM Sebastian Berg <sebast...@sipsolutions.net> wrote: > Hi all, > > strange, something went wrong sending that email, but in any case... > > I tried to "summarize" the current behaviour of promotion and value > based promotion in numpy (correcting a small error in what I wrote > earlier). Since it got a bit long, you can find it here (also copy > pasted at the end): > > https://hackmd.io/NF7Jz3ngRVCIQLU6IZrufA > > Allan's document which I link in there is also very interesting. One > thing I had not really thought about before was the problem of > commutativity. > > I do not have any specific points I want to discuss based on it (but > those are likely to come up again later). > > All the Best, > > Sebastian > > > ----------------------------- > > PS: Below a copy of what I wrote: > > --- > title: Numpy Value Based Promotion Rules > author: Sebastian Berg > --- > > > > NumPy Value Based Scalar Casting and Promotion > ============================================== > > This document reviews some of the behaviours of the promotion rules > within numpy. This is especially with respect to the promotion of > scalars and 0D arrays which inspect the value to decide casting and > promotion. > > Other documents discussing these things: > > * `from numpy.testing import print_coercion_tables` prints the > current promotion tables including value based promotion for small > positive/negative scalars. > * Allan Haldane's thoughts on changing casting/promotion to be more > C-like and discussing things such as here: > https://gist.github.com/ahaldane/0f5ade49730e1a5d16ff6df4303f2e76 > * Discussion around the problem of uint64 and int64 being promoted to > float64: https://github.com/numpy/numpy/issues/12525 (lists many > related issues). > > > Nomenclature and Defintions > --------------------------- > > * **dtype/type**: The data type of an array or scalar: `float32`, > `float64`, `int8`, … > > * **Category**: A category to which the data type belongs, in this > context these are: > 1. boolean > 2. integer (unsigned and signed are not split up here, but are > different "kinds") > 3. floating point and complex (not split up here but are different > "kinds") > 5. All others > > * **Casting**: converting from one dtype to another. There are four > different rules of casting: > 1. *"safe"* casting: All values are representable in the new data > type. I.e. no information is lost during the conversion. > 2. *"same kind"* casting: data loss may occur, but only within the > same "kind". For example a float64 can be converted to float32 using > "same kind" rules, an int64 can be converted to int16. This is although > both lose precision or even produce incorrect values. Note that "kind" > is different from "category" in that it distinguishes between signed > and unsigned integers. > 4. *"unsafe"* casting: Any conversion which can be defined, e.g. > floating point to integer. For promotion this is fairly unimportant. > (Some conversions such as string to integer, which not even work fall > in this category, but could also be called coercions or conversions.) > > * **Promotion**: The general process of finding a new dtype for > multiple input dtypes. Will be used here to also denote any kind of > casting/promotion done before a specific function is called. This can > be more complex, because in rare cases a functions can for example take > floating point numbers and integers as input at the same time (i.e. > `np.ldexp`). > > * **Common dtype**: A dtype which can represent all input data. In > general this means that all inputs can be safely cast to this dtype. > Within numpy this is the normal and simplest form of promotion. > > * **`type1, type2 -> type3`**: Defines a promotion or signature. For > example adding two integers: `np.int32(5) + np.int32(3)` gives > `np.int32(8)`. The dtype signature for that example would be: `int32, > int32 -> int32`. A short form for this is also `ii->i` using C-like > type codes, this can be found for example in `np.ldexp.types` (and any > numpy ufunc). > > * **Scalar**: A numpy or python scalar or a **0-D array**. It is > important to remember that zero dimensional arrays are treated just > like scalars with respect to casting and promotion. > > > Current Situation in Numpy > -------------------------- > > The current situation can be understand mostly in terms of safe casting > which is defined based on the type hierarchy and is sensitive to values > for scalars. > > This safe casting based approach is in contrast for example to > promotion within C or Julia, which work based on category first. For > example `int32` cannot be safely cast to `float32`, but C or Julia will > use `int32, float32 -> float32` as the common type/promotion rule for > example to decide on the output dtype for addition. > > > ### Python Integers and Floats > > Note that python integers are handled exactly like numpy ones. They > are, however, special in that they do not have a dtype associated with > them explicitly. Value based logic, as described here, seems useful for > python integers and floats to allow: > ``` > arr = np.arange(10, dtype=np.int8) > arr += 1 > # or: > res = arr + 1 > res.dtype == np.int8 > ``` > which ensures that no upcast (for example with higher memory usage) > occurs. > > > ### Safe Casting > > Most safe casting is clearly defined based on whether or not any > possible value is representable in the ouput dtype. Within numpy there > is currently a single exception to this rule: `np.can_cast(np.int64, > np.float64, casting="safe")` is considered to be true although float64 > cannot represent some large integer values exactly. In contrast, > `np.can_cast(np.int32, np.float32, casting="safe")` is `False` and > `np.float64` would have to be used if a "safe" cast is desired. > > This exception may be one thing that should be changed, however, > concurrently the promotion rules have to be adapted to keep doing the > same thing, or a larger behaviour change decided. > > > #### Scalar based rules > > Unlike arrays, where inspection of all values is not feasable, for > scalars (and 0-D arrays) the value is inspected. The casting becomes a > two step process: > 1. The minimal dtype capable of holding the value is found. > 2. The normal casting rules are applied to the new dtype. > > The first step uses the following rules by finding the minimal dtype > within its category: > > * Boolean: Dtype is already minimal > > * Integers: > Casting is possible if output can hold the value. This includes > uint8(127) casting to an int8. > > * Floats and Complex > Scalars can be demoted based on value, roughly this avoids > overflows: > ``` > float16: -65000 < value < 65000 > float32: -3.4e38 < value < 3.4e38 > float64: -1.7e308 < value < 1.7e308 > float128 (largest type, does not apply). > ``` > For complex, the logic is simply applied to both real and imaginary > part. Complex numbers cannot be downcast to floating point. > > * Others: Dtype is not modified. > > > This two step process means that `np.can_cast(np.int16(1024), > np.float16)` is `False` even though float16 is capable of exactly > representing the value 1024, since value based "demotion" to a lower > dtype is used only within each category. > > > > ### Common Type Promotion > > For most operations in numpy the output type is just the common type of > the inputs, this holds for example for concatenation, as well as almost > all math funcions (e.g. addition and multiplication have two identical > inputs and need one ouput dtype). This operation is exposed as > `np.result_type` which includes value based logic, and > `np.promote_types` which only accepts dtypes as input. > > Normal type promotion without value based/scalar logic finds the > smallest type which both inputs can cast to safely. This will be the > largest "kind" (bool < unsigned < integer < float < complex < other). > > Note that type promotion is handled in a "reduce" manner from left to > right. In rare cases this means it is not associatetive: `float32, > uint16, int16 -> float32`, but `float32, (uint16, int16) -> float64`. > > #### Scalar based rule > > When there is a mix of scalars and arrays, numpy will usually allow the > scalars to be handled in the same fashion as for "safe" casting rules. > > The rules are as follows: > > 1. Value based logic is only applied if the "category" of any array is > larger or equal to the category of all scalars. If this is not the > case, the typical rules are used. > * Specifically, this means: `np.array([1, 2, 3], dtype=np.uint8) + > np.float64(12.)` gives a `float64` result, because the > `np.float64(12.)` is not considered for being demoted. > > 2. Promotion is applied as normally, however, instead of the original > dtype, the minimal dtype is used. In the case where the minimal data > type is unsigned (say uint8) but the value is small enough, the minimal > type may in fact be either `uint8` or `int8` (127 can be both). This > promotion is also applied in pairs (reduction-like) from left to right. > > > ### General Promotion during Function Execution > > General functions (read "ufuncs" such as `np.add`) may have a specific > dtype signature which is (for most dtypes) stored e.g. as > `np.add.types`. For many of these functions the common type promotion > is used unchanged. > > However, some functions will employ a slightly different method (which > should be equivalent in most cases). They will loop through all loops > listed in `np.add.types` in order and find the first one to which all > inputs can be safely cast: > ``` > np.divide.types = ['ee->e', 'ff->f', 'dd->d', ...] > ``` > Thus, `np.divide(np.int16(4), np.float16(3)` will refuse the first > `float16, float16 -> float16` (`'ee->e'`) loop because `int16` cannot > be cast safely, and then pick the float32 (`'ff->f'`) one. > > For simple functions, which commonly have two identical inputs, this > should be identical, since normally a clear order exists for the dtypes > (it does require checking int8 before uint8, etc.). > > #### Scalar based rule > > When scalars are involved, the "safe" cast logic based on values is > applied *if and only if* rule 1. applies as before: That is there must > be an array with a higher or equal category as all of the scalars. > > In the above `np.divide` example, this means that > `np.divide(np.int16(4), np.array([3], dtype=np.float16))` *will* use > the `'ee->e'` loop, because the scalar `4` is of a lower or equal > category than the array (integer <= float or complex). While checking, > 4 is found to be safely castable to float16, since `(u)int8` is > sufficient to hold 4 and that can be safely cast to `float16`. > However, `np.divide(np.int16(4), np.int16(3))` would use `float32` > because both are scalars and thus value based logic is not used (Note > that in reality numpy forces double output for an all integer input in > divide). > > In it is possible for ufuncs to have mixed type signatures (this is > very rare within numy) and arbitrary inputs. In this case, in > principle, the question is whether or not a clear ordering exists and > if the rule of using value based logic is always clear. This is rather > academical (I could not find any such function in numpy or > `scipy.special` [^scipy-ufuncs]). But consider: > ``` > imaginary_ufunc.types: > int32, float32 -> int32, float32 > int64, float32 -> int64, float32 > ... > ``` > it is not clear that `np.int64(5) + np.float32(3.)` should be able to > demote the `5`. This is very theoretical of course > > > > > Footnotes > --------- > > [^scipy-ufuncs]: See for example these functions: > ```python > import scipy.special > for n, func in scipy.special.__dict__.items(): > if not isinstance(func, np.ufunc): > continue > > if func.nin == 1: > # a single input is not interesting > continue > > # check if the signature is not uniform > for types in func.types: > if len(set(types[:func.nin])) != 1: > break > else: > continue > print(func, func.types) > ``` > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@python.org > https://mail.python.org/mailman/listinfo/numpy-discussion >
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion