On Sun, Dec 13, 2020 at 7:29 PM Sebastian Berg <sebast...@sipsolutions.net> wrote:
> On Sun, 2020-12-13 at 19:00 +1100, Juan Nunez-Iglesias wrote:
> >
> > > On 13 Dec 2020, at 6:25 am, Sebastian Berg <
> > > sebast...@sipsolutions.net> wrote:
> > >
> > > But "default" in NumPy really doesn't mean a whole lot? I can
> > > think of three places where "defaults" exist:
> >
> > Huh? There are platform-specific defaults for literally every array
> > creation function in NumPy?
> >
> > In [1]: np.array([4, 9]).dtype
> > Out[1]: dtype('int64')
> <snip>
> > The list goes on…
> >
>
> I should have been more clear about this and my opinion on it:
>
> 1. The whole list comes down to my point 1: when confronted with a
> Python integer, NumPy will typically use a C `long` [1].
> Additionally, `dtype=int` is always the same as `long`:
> `np.dtype(int) == np.dtype("long")`.
>
> The reason I see that as a single point is that it is defined in a
> single place in C [1]. (The `np.dtype(int)` mapping is a second place.)
>
> 2. I agree with Ralf that this is "random". On the same computer you
> can easily get a wrong result for identical code simply because you
> booted into Windows instead of Linux [2]. `long` is not a good default!
> It is 32-bit on Windows and 64-bit on (64-bit) Linux! That should
> confuse the majority of our users (and probably many who are aware of
> C integer types).
> Good defaults are awesome, but I just can't see how `long` is a good
> default. There were good reasons for it on Python 2, but those are not
> relevant anymore.
>
> 3. I think that `intp` would be a much saner default for most code. It
> gives a system-dependent result, but two points are in its favor:
>
> * NumPy generates `intp` in quite a lot of places
> * It is always safe (and fast) to index arrays with `intp`
>
> > And, indeed, mixing types can cause implicit casting, and thus both
> > slowness and unexpected type promotion, which brings with it its own
> > bugs… Again, I think it is valuable to have syntax to express
> > `np.zeros(…, dtype=<whatever-dtype-np.array(…)-would-give-for-my-
> > data>)`.
>
> Yes, it is valuable, but I am unsure we should advise people to use it...

Agreed, it should be possible for people who know that's what they want,
but an "always int64" default would be way better. Before we had 32-bit
CI, I developed on 32-bit Linux on purpose, and found multiple newly
introduced bugs in NumPy and SciPy each release cycle. Risking correctness
issues like overflows is far worse than possible sub-optimal performance
(see the short sketch at the end of this message for the kind of silent
overflow I mean). For that same reason, float96/float128 are very
annoying: users don't realize that those aren't portable.

Cheers,
Ralf

> Cheers,
>
> Sebastian
>
> [1] Currently defined here:
> https://github.com/numpy/numpy/blob/7a42940e610b77cee2f98eb88aed5e66ef6d8c2a/numpy/core/src/multiarray/abstractdtypes.c#L16-L45
> which will use `long` normally, `long long` (64-bit) if the value does
> not fit, and even `unsigned long long` if *that* does not fit either.
>
> [2] I would not be surprised if there are quite a few libraries with
> bugs for very large arrays that simply have not been found yet, because
> nobody has tried to run the code on very large arrays on a Windows
> workstation.
>
> > Juan.
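P.S. Here is the minimal sketch I mentioned above, illustrating why the
current C-`long` default makes identical code behave differently across
platforms. Nothing here is new API, and the per-platform comments are what
I would expect from standard 64-bit builds; the exact dtypes and results
you see depend on your platform and NumPy build:

    import numpy as np

    # The default integer dtype follows C long: 64-bit on 64-bit Linux and
    # macOS, but only 32-bit on 64-bit Windows.
    a = np.arange(1, 21)
    print(a.dtype)   # int64 on 64-bit Linux, int32 on 64-bit Windows

    # 20! = 2432902008176640000 fits comfortably in int64 but silently
    # overflows int32, so the same line gives a wrong answer on Windows.
    print(a.prod())

    # np.intp (the pointer-sized integer) is 64-bit on every 64-bit
    # platform and is always safe (and fast) for indexing, which is why it
    # would be a saner default.
    b = np.arange(1, 21, dtype=np.intp)
    print(b.prod())  # 2432902008176640000 on both platforms

    # Similarly, np.float96/np.float128 are just aliases for the platform's
    # C long double and do not exist everywhere (there is no np.float128 on
    # Windows); np.longdouble is the spelling that exists on every platform.
    print(np.dtype(np.longdouble))  # float128 on x86-64 Linux, float64 on Windows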
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion