On Mon, Jan 2, 2017 at 6:27 PM, Charles R Harris
<charlesr.har...@gmail.com> wrote:
> Hi All,
>
> Currently functions like trace use the C long type as the default
> accumulator for integer types of lesser precision:
>
>> dtype : dtype, optional
>>     Determines the data-type of the returned array and of the accumulator
>>     where the elements are summed. If dtype has the value None and `a` is
>>     of integer type of precision less than the default integer
>>     precision, then the default integer precision is used. Otherwise,
>>     the precision is the same as that of `a`.
>
> The problem with this is that the precision of long varies with the
> platform so that the result varies, see gh-8433 for a complaint about
> this. There are two possible alternatives that seem reasonable to me:
>
> 1. Use 32 bit accumulators on 32 bit platforms and 64 bit accumulators
>    on 64 bit platforms.
> 2. Always use 64 bit accumulators.
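To make the platform dependence concrete, here's a rough sketch of how it
bites (sum follows the same accumulator rule quoted above for trace; the
wraparound only shows up where the default integer / C long is 32 bits,
e.g. on Windows or 32-bit builds):

```python
import numpy as np

# One million values of 10,000: the true sum is 10**10, which fits in
# 64 bits but overflows a 32-bit accumulator.
a = np.full(10**6, 10**4, dtype=np.int32)

# With the current rule the accumulator is the default integer (C long),
# so this prints 10000000000 on 64-bit Linux/macOS but wraps around to a
# wrong value on Windows or 32-bit builds:
print(a.sum())

# Passing dtype explicitly gives the same (correct) answer everywhere:
print(a.sum(dtype=np.int64))  # 10000000000
```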
This is a special case of a more general question: right now we use the
default integer precision (i.e., what you get from np.array([1]), or
np.arange, or np.dtype(int)), and it turns out that the default integer
precision itself varies in confusing ways, and this is a common source of
bugs. Specifically: right now it's 32-bit on 32-bit builds, and 64-bit on
64-bit builds, except on Windows where it's always 32-bit. This matches
the default precision of Python 2 'int'.

So some options include:

- make the default integer precision 64 bits everywhere
- make the default integer precision 32 bits on 32-bit systems, and 64
  bits on 64-bit systems (including Windows)
- leave the default integer precision the same, but make accumulators 64
  bits everywhere
- leave the default integer precision the same, but make accumulators 64
  bits on 64-bit systems (including Windows)
- ...

Given the prevalence of 64-bit systems these days, and the fact that the
current setup makes it very easy to write code that seems to work when
tested on a 64-bit system but that silently returns incorrect results on
32-bit systems, it sure would be nice if we could switch to a 64-bit
default everywhere. (You could still get 32-bit integers, of course;
you'd just have to ask for them explicitly.)

Things we'd need to know more about before making a decision:

- compatibility: if we flip this switch, how much code breaks? In
  general, correct numpy-using code has to be prepared to handle
  np.dtype(int) being 64 bits, and in fact there might be more code that
  accidentally assumes np.dtype(int) is always 64 bits than there is
  code that assumes it is always 32 bits. But that's theory; to know how
  bad this is we would need to try actually running some projects' test
  suites and see whether they break or not.

- speed: there's probably some cost to using 64-bit integers on 32-bit
  systems; how big is the penalty in practice?

-n

--
Nathaniel J. Smith -- https://vorpus.org
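For reference, a quick way to check what a given build does today, and how
to opt out of the platform-dependent default explicitly (a sketch; the
commented results assume a 64-bit Linux/macOS build, while Windows and
32-bit builds report int32):

```python
import numpy as np

# All of these follow the platform's default integer (C long), so the
# printed dtype differs between builds:
print(np.dtype(int))        # int64 on 64-bit Linux/macOS; int32 on Windows
print(np.array([1]).dtype)  # same as above
print(np.arange(3).dtype)   # same as above

# Requesting a width explicitly gives the same result on every platform:
print(np.array([1], dtype=np.int32).dtype)  # int32 everywhere
print(np.arange(3, dtype=np.int64).dtype)   # int64 everywhere
```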