[Numpy-discussion] Summation of large float32/float64 arrays

Matthew Turk Fri, 21 May 2010 13:14:37 -0700

Hi all,

I have a possibly naive question.  I don't really understand this
particular output:


In [1]: import numpy

In [2]: a1 = numpy.random.random((512,512,512)).astype("float32")

In [3]: a1.sum(axis=0).sum(axis=0).sum(axis=0)
Out[3]: 67110312.0

In [4]: a1.sum()
Out[4]: 16777216.0

I recognize that the intermediate sums may accumulate error
differently than a single call to .sum(), but my concern is that the
error is accumulating much faster than I anticipated.  (Interesting to
note that a1.sum() returns exactly 16777216.0, which is 2^24, or
0.125*512^3, down to the decimal, whereas the expected total for a mean
of ~0.5 is about 0.5*512^3 = 67108864.)  However, with a 256^3
array:

In [1]: import numpy

In [2]: a1 = numpy.random.random((256,256,256)).astype("float32")

In [3]: a1.sum(axis=0).sum(axis=0).sum(axis=0)
Out[3]: 8389703.0

In [4]: a1.sum()
Out[4]: 8389245.0

The errors are much more reasonable.  Is there an overflow or
something that occurs with the 512^3?  These problems all go
completely away with a float64 array, but the issue originally showed
up when trying to normalize an on-disk float32 array of size 512^3,
where the normalization factor was off by a substantial factor (>2x)
depending on the mechanism used to sum.  My suspicion is that perhaps
I have a naive misconception about intermediate steps in summations,
or there is a subtlety I'm missing here.
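
One way to probe the overflow suspicion is sketched below.  It assumes
the single .sum() call accumulates sequentially in float32, which the
output above suggests but does not confirm.  The small 64^3 array and
the names limit, stalled, total64 and staged64 are hypothetical
stand-ins, not taken from the pastebin script.

```python
import numpy

# float32 has a 24-bit significand, so from 2**24 upward the spacing
# between representable values is 2.0.
limit = numpy.float32(2**24)            # 16777216.0
stalled = limit + numpy.float32(0.5)    # rounds back down to 16777216.0

# Hypothetical smaller array standing in for the 512^3 case.
rng = numpy.random.RandomState(0)
a = rng.random_sample((64, 64, 64)).astype("float32")

# Same float32 data, but summed with a float64 accumulator.
total64 = a.sum(dtype=numpy.float64)
staged64 = a.sum(axis=0, dtype=numpy.float64).sum()

print(stalled == limit)            # does adding 0.5 change the total?
print(numpy.spacing(limit))        # gap to the next float32 above 2**24
print(abs(total64 - staged64))     # disagreement between the two paths
```

If the accumulator really is float32, a running total that reaches
2^24 can no longer grow by adding values below 1.0, which would match
Out[4] above.  Passing dtype=numpy.float64 to .sum() keeps the on-disk
data in float32 while accumulating in double precision.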

I placed a sample script I used to test this here:

http://pastebin.com/dGbHwFPK

Thanks for any insight anybody can provide,

Matt
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
