On Thu, Aug 19, 2021 at 2:13 AM Jerry Morrison <
jerry.morrison+nu...@gmail.com> wrote:

>
> I'll put forth an expectation that after installing a specific set of
> libraries, the floating point results would be identical across platforms
> and into the future. Ideally developers could install library updates (for
> hardware compatibility, security fixes, or other reasons) and still get
> identical results.
>
> That expectation is for reproducibility, not high accuracy. So it'd be
> fine to install different libraries [or maybe use those pip package options
> in brackets, whatever they do?] to trade accuracy for speed. Could any
> particular choice of accuracy still provide reproducible results across
> platforms and time?
>

While this would be nice, in practice bit-identical results for floating
point NumPy functions across different operating systems and into the future
are going to be impractical to achieve.  IEEE-754 helps by specifying the
results of basic floating point operations, but once you move into special
math functions (like cos()) or other algorithms that can be implemented in
several "mathematically equivalent" ways, bit-level stability basically
becomes impossible without snapshotting your entire software stack.  Many
of these special math functions come from the operating system's math
library, which generally makes no such guarantees.

Quick example: Suppose you want to implement sum() on a floating point
array.  If you start at the beginning of the array and iterate to the end,
adding each element to an accumulator, you will get one answer.  If you do
mathematically equivalent pairwise summation (using temporary storage for
the partial sums), you will get a different, and probably more accurate,
answer.  Neither answer will (in general) be the same as summing those
numbers together with infinite precision, then rounding to the closest
floating point number at the end.  We could decide to make the
specification for sum() also pin down the algorithm for computing sum(), to
ensure we make the same round-off errors every time.  However, this kind of
detailed specification might be harder to write for other functions, or
might even lock the library into accuracy bugs that can't be fixed in the
future.
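To make the order dependence concrete, here is a small sketch of the two
summation orders (the function names are mine, and float32 is used just to
make the rounding visible on a modest array):

```python
import numpy as np

def naive_sum(a):
    # Left-to-right accumulation: each add rounds against a growing total.
    acc = np.float32(0.0)
    for x in a:
        acc = np.float32(acc + x)
    return acc

def pairwise_sum(a):
    # Recursively sum halves, then add the partial sums.  Mathematically
    # identical to naive_sum, but the rounding errors land differently.
    if len(a) <= 8:
        return naive_sum(a)
    mid = len(a) // 2
    return np.float32(pairwise_sum(a[:mid]) + pairwise_sum(a[mid:]))

rng = np.random.default_rng(0)
a = rng.random(10000).astype(np.float32)
print(naive_sum(a), pairwise_sum(a))  # same data, different last bits
```

Both results are "correct" to within round-off, yet they are not
bit-identical to each other, which is exactly the reproducibility problem.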

I think the most pragmatic things you can hope for are:

   - Bit-identical results with containers that snapshot everything,
   including the system math library.
   - Libraries that specify their accuracy levels when possible, and
   disclose when algorithm changes will affect the bit-identicalness of
   results.
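As a rough sketch of the container approach, you can pin the base image by
digest (which freezes the system math library along with everything else)
and pin the Python packages to exact versions.  The digest and version
numbers below are placeholders, not recommendations:

```shell
# Dockerfile sketch: snapshot the OS (including its libm) and exact deps.
# Replace <digest> with the digest you actually resolve at build time;
# a tag alone (python:3.9-slim) can silently move under you.
FROM python:3.9-slim@sha256:<digest>
RUN pip install --no-cache-dir numpy==1.21.2
```

Pinning by digest rather than by tag is what makes this a snapshot: the
same inputs produce the same image bytes, so the same binaries do the math.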

On a meta-level, if analysis conclusions depend on getting bit-identical
results from floating point operations, then you really want to use a
higher-precision float and/or an algorithm that is less sensitive to
round-off error.  Floating point numbers are not real numbers.  :)
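One example of a less round-off-sensitive algorithm is compensated (Kahan)
summation, which carries the low-order bits lost at each add; Python's
standard-library math.fsum goes further and is correctly rounded.  The
kahan_sum name below is mine:

```python
import math
import numpy as np

def kahan_sum(a):
    # Compensated summation: 'c' accumulates the error each add discards,
    # and feeds it back into the next addition.
    total = 0.0
    c = 0.0
    for x in a:
        y = float(x) - c
        t = total + y
        c = (t - total) - y
        total = t
    return total

rng = np.random.default_rng(1)
a = rng.random(100000)
print(kahan_sum(a), math.fsum(a))
```

Algorithms like these shrink the error enough that reordering no longer
changes the answer at the precision you actually care about, which is
usually a better goal than chasing bit-identical round-off.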
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
