On 11/17/22 7:13 PM, Charles R Harris wrote:


On Thu, Nov 17, 2022 at 3:15 PM Ralf Gommers <ralf.gomm...@gmail.com> wrote:

    Hi all,

    We have to do something about long double support. This is something I wanted to propose a long time ago already, and moving build systems has resurfaced the pain yet again.

    This is not a full proposal yet, but the start of a discussion and a gradual plan of attack.

    The problem
    -----------
    The main problem is that long double support is *extremely* painful to maintain, probably far more than is justified. I could write a very long story about that, but instead I'll just illustrate with some of the key points:

    (1) `long double` is the main reason why we're having such a hard time building wheels on Windows, for SciPy in particular. This is because MSVC makes long double 64-bit, while Mingw-w64 defaults to 80-bit. So we have to deal with Mingw-w64 toolchains, proposed compiler patches, etc. This alone has been a massive time sink. A couple of threads:
    https://github.com/numpy/numpy/issues/20348
    https://discuss.scientific-python.org/t/releasing-or-not-32-bit-windows-wheels/282
    The first issue linked above is one of the key ones, with a lot of detail about the fundamental problems with `long double`. The Scientific Python thread focused more on Fortran; however, that Fortran + Windows problem is at least partly the fault of `long double`. And Fortran may be rejuvenated with LFortran and fortran-lang.org; `long double` is a dead end.

    (2) `long double` is not a well-defined format. We support 9 specific binary representations, have a ton of code floating around to check for those, and manually fiddle with individual bits in long double numbers. Part of that is the immediate pain point for me right now: in the configure stage of the build we consume object files produced by the compiler and parse them, matching some known bit patterns. This check is so weird that it's the only one I cannot implement in Meson (short of porting the hundreds of lines of Python code for it to C); see https://github.com/mesonbuild/meson/issues/11068 for details. To get an idea of the complexity here:
    https://github.com/numpy/numpy/blob/9e144f7c1598221510d49d8c6b79c66dc000edf6/numpy/core/setup_common.py#L264-L434
    https://github.com/numpy/numpy/blob/9e144f7c1598221510d49d8c6b79c66dc000edf6/numpy/core/src/npymath/npy_math_private.h#L179-L484
    https://github.com/numpy/numpy/blob/main/numpy/core/src/npymath/npy_math_complex.c.src#L598-L619
    https://github.com/numpy/numpy/blob/main/numpy/core/src/multiarray/dragon4.c#L2480-L3052
    Typically the `long double` code paths have multiple branches, and require more code than float/double.
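    To see the platform variability from Python, here is a quick sketch (public NumPy API only; the printed values differ per platform, which is exactly the variability the build-time checks have to pin down):

        import numpy as np

        # Storage size and actual precision of long double are platform-dependent:
        # e.g. 8 bytes / 53-bit significand on MSVC builds, 16 bytes of storage
        # holding an 80-bit x87 format (64-bit significand, padded) on x86-64
        # Linux, and true IEEE binary128 on some other architectures.
        info = np.finfo(np.longdouble)
        print("itemsize (bytes):", np.dtype(np.longdouble).itemsize)
        print("significand bits:", info.nmant + 1)  # nmant excludes the leading bit
        print("epsilon:         ", info.eps)
        print("same as double? ", info.eps == np.finfo(np.float64).eps)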

    (3) We spend a lot of time dealing with issues and PRs to keep `long double` working, as well as dealing with hard-to-diagnose build issues, which sometimes even stop people from building/contributing, especially on Windows. Some recent examples:
    https://github.com/numpy/numpy/pull/20360
    https://github.com/numpy/numpy/pull/18536
    https://github.com/numpy/numpy/pull/21813
    https://github.com/numpy/numpy/pull/22405
    https://github.com/numpy/numpy/pull/19950
    https://github.com/numpy/numpy/pull/18330/commits/aa9fd3c7cb
    https://github.com/scipy/scipy/issues/16769
    https://github.com/numpy/numpy/issues/14574

    (4) `long double` isn't all that useful. On both Windows and macOS `long double` is 64-bit, which means it is just a poor alias for `double`. So it does literally nothing for the majority of our users, except confuse them and take up extra memory. On Linux, `long double` is 80-bit precision, which means it doesn't do all that much there either, just a modest bump in precision.
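    To make the memory point concrete, a small sketch (public NumPy API only; the numbers in the comment are for x86-64 Linux, where the 80-bit format is padded to 16 bytes):

        import numpy as np

        a64 = np.ones(1_000_000, dtype=np.float64)
        a80 = np.ones(1_000_000, dtype=np.longdouble)

        # On x86-64 Linux: 8000000 vs 16000000 bytes -- twice the memory for
        # roughly 3 extra decimal digits of precision. On Windows/macOS both
        # arrays are the same size, because long double is just double there.
        print(a64.nbytes, a80.nbytes)
        print(np.finfo(np.float64).precision, np.finfo(np.longdouble).precision)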

    Let me also note that it's not just the user-visible dtypes that we have to consider; long double types are also baked into the libnpymath static library that we ship with NumPy. That's a thing we have to do something about anyway (shipping static libraries is not the best idea, see https://github.com/numpy/numpy/issues/20880). We just have to make sure not to forget about it when thinking about solutions here.


    Potential solutions
    -------------------

    (A) The ideal solution would be to have a proper, portable quad-precision (128 bits of precision) dtype. It's now possible to write that externally, after all the work that Sebastian and others have put into the dtype infrastructure. The dtype itself already exists (https://github.com/peytondmurray/quaddtype; maybe there are more implementations floating around). It just needs people who have an actual need for it to drive it. It's still a significant amount of work, so I won't go into it further right now.

    (B) Full deprecation and removal of all `long double` support from NumPy (and SciPy), irrespective of whether the quad dtype comes to life.

    Given the current state, I'm personally convinced that that is easily justified. However, I know some folks are going to be hesitant, given that we don't know how many remaining users we have or what use cases they have. So let's see if we can find more granular solutions (note: these are ideas, not fully researched solutions we can just pick from and be sure will work out well).

    (C) Only support `long double` where it has proper compiler support (C99/C++11), so using it "just works", and remove all the support for old formats and accessing bit representations directly. This also implies making some optional functions mandatory. For example, the issue I ran into today showed up at runtime in a fallback path for systems that don't have `strtold_l`. We don't test such fallback paths in CI, so they're going to be fragile anyway.
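    As an aside, string parsing is where the extra precision is easy to lose; a quick sketch of why a proper `strtold` matters (public NumPy API; the comparison is only meaningful on platforms where long double is wider than double):

        import numpy as np

        # Passing a Python float rounds the value to 53-bit double precision
        # before the conversion; parsing the string directly (which needs
        # strtold or an equivalent) keeps full long double precision.
        via_float = np.longdouble(0.1)    # 0.1 is already a rounded double
        via_str = np.longdouble("0.1")    # parsed at long double precision

        print(via_float == via_str)       # False where long double > double
        print(via_str - via_float)        # the extra bits a fallback loses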

    (D) Add a build mode with a command-line flag, where we typedef `long double` to `double`. I'll note that we already touched on that once (see https://github.com/numpy/numpy/issues/20348); I'm not sure though whether it's fundamentally a bad idea or whether we just didn't pursue it deliberately enough.


    Next steps
    ----------

    First, it would be great to hear from folks who have use cases for long double support in NumPy. So far we have very little knowledge of that; we only see the problems and the work it causes us.

    Second, I'm going to have to add support for (C) or (D) temporarily to the Meson build anyway, as we run into things. It can be worked around if we really have to, by implementing support for long double format detection in Meson, or by rewriting all the detection logic so it's all-in-one in C. But that takes a significant amount of time.

    Third, let's figure out which way we'd like to go. Do you see alternative solutions? Do you like any of the ones I listed more than others? Are you willing to jump in and work on a quad dtype?

    Cheers,
    Ralf


I would agree that extended precision is pretty useless. IIRC, it was mostly intended as an accurate way to produce double precision results; that idea was eventually dropped as not very useful. I'd happily do away with subnormal doubles as well; they were another not-very-useful idea. And strictly speaking, we should not support IBM double-double either; it is not in the IEEE standard.

That said, I would like to have a quad precision type. That precision is useful for some things, and I have a dream that someday it can be used for a time type. Unfortunately, last time I looked around, none of the available implementations had a NumPy-compatible license.

The tricky thing here is to not break downstream projects, but that may be unavoidable. I suspect the fallout will not be that bad.

Chuck

A quick response from one of the leaders of a team that requires 80-bit extended precision for astronomical work...

"extended precision is pretty useless" unless you need it. And the high-precision pulsar timing community needs it. Standard double precision (64-bit) values do not contain enough precision for us to pass relative astronomical times via a single float without extended precision (the precision ends up being at the ~1 microsec level over decades of time differences, and we need it at the ~1-10ns level) nor can we store the measured spin frequencies (or do calculations on them) of our millisecond pulsars with enough precision. Those spin frequencies can have 16-17 digits of base-10 precision (i.e. we measure them to that precision). This is why we use 80-bit floats (usually via Linux, but also on non X1 Mac hardware if you use the correct compilers) extensively.

NumPy is a key component of the PINT software to do high-precision pulsar timing, and we use it partly *because* it has long double support (with 80-bit extended precision):
https://github.com/nanograv/PINT
And see the published paper here, particularly Sec 3.3.1 and footnote #42:
https://ui.adsabs.harvard.edu/abs/2021ApJ...911...45L/abstract

Going to software quad precision would certainly work, but it would definitely make things much slower for our matrix and vector math.

We would definitely love to see a solution for this that allows us to get the extra precision we need on other platforms besides Intel/AMD64+Linux (primarily), but giving up extended precision on those platforms would *definitely* hurt. I can tell you that the pulsar community would definitely be against option "B". And I suspect that there are other users out there as well.

Scott
NANOGrav Chair
www.nanograv.org


--
Scott M. Ransom            Address:  NRAO
Phone:  (434) 296-0320               520 Edgemont Rd.
email:  sran...@nrao.edu             Charlottesville, VA 22903 USA
GPG Fingerprint: A40A 94F2 3F48 4136 3AC4  9598 92D5 25CB 22A6 7B65