On 08/19/2010 04:20 PM, [email protected] wrote:
On Thu, Aug 19, 2010 at 4:03 PM, John Salvatier
<[email protected]> wrote:
Precise in what sense? Numerical accuracy? If so, why is that?
I don't remember where I ran into this example, maybe integer
underflow (?) with addition.
The NIST ANOVA test cases have some nasty, badly scaled variables,

but I have trouble creating such an example myself; the differences I get are in the 10th or higher significant digit:

>>> import numpy as np
>>> a = 1000000*np.random.randn(10000,1000)
>>> a.sum()
-820034796.05545747
>>> np.sort(a.ravel())[::-1].sum()
-820034795.87886333
>>> np.sort(a.ravel()).sum()
-820034795.88172638
>>> np.sort(a,0)[::-1].sum()
-820034795.82333243
>>> np.sort(a,1)[::-1].sum()
-820034796.05559027
>>> a.sum(-1).sum(-1)
-820034796.05551744
>>> np.sort(a,1)[::-1].sum(-1).sum(-1)
-820034796.05543578
>>> np.sort(a,0)[::-1].sum(-1).sum(-1)
-820034796.05590343
>>> np.sort(a,1).sum(-1).sum(-1)
-820034796.05544424
>>> am = a.mean()
>>> am*a.size + np.sort(a-am,1).sum(-1).sum(-1)
-820034796.05554879
>>> a.size * np.sort(a,1).mean(-1).mean(-1)
-820034796.05544722

Badly scaled or badly sorted arrays don't add up well,

but I'm not able to get worse than the 10th or 11th significant digit in
randomly generated examples of size 10000x1000.
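A smaller, reproducible version of this experiment can measure each strategy's error against a correctly rounded reference from Python's math.fsum. This is only a sketch: the seed, the 1000x1000 shape, and the helper name `summation_errors` are illustrative assumptions, and newer NumPy versions use pairwise summation, so the errors you see may be smaller than the 2010 numbers above.

```python
import math

import numpy as np


def summation_errors(a):
    """Return the exact sum and the absolute error of several summation orders."""
    # math.fsum returns the correctly rounded sum of the float64 values
    exact = math.fsum(a.ravel().tolist())
    candidates = {
        "a.sum()": a.sum(),
        "sorted ascending": np.sort(a.ravel()).sum(),
        "sorted descending": np.sort(a.ravel())[::-1].sum(),
        "row sums, then total": a.sum(axis=-1).sum(),
        "mean-centred": a.mean() * a.size + (a - a.mean()).sum(),
    }
    return exact, {name: abs(float(value) - exact) for name, value in candidates.items()}


rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
a = 1e6 * rng.standard_normal((1000, 1000))

exact, errors = summation_errors(a)
print(f"exact sum: {exact!r}")
for name, err in errors.items():
    # error relative to the sum of magnitudes, the natural scale for rounding error
    print(f"{name:>22}: {err / np.abs(a).sum():.2e}")
```

Scaling by the sum of magnitudes rather than by the (much smaller) signed total keeps the comparison meaningful when positive and negative terms cancel.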

Josef



On Thu, Aug 19, 2010 at 12:13 PM, <[email protected]> wrote:
On Thu, Aug 19, 2010 at 11:29 AM, Joe Harrington <[email protected]>
wrote:
On Thu, 19 Aug 2010 09:06:32 -0500, Gökhan Sever <[email protected]>
wrote:

On Thu, Aug 19, 2010 at 9:01 AM, greg whittier <[email protected]> wrote:

I frequently deal with 3D data and would like to sum (or find the
mean, etc.) over the last two axes, i.e. sum a[i,j,k] over j and k.
I find .sum() really convenient for 2D arrays, but I end up
reshaping my 3D arrays into 2D to do this.  I know there has to be a more
convenient way.  Here's what I'm doing:

import numpy as np

a = np.arange(27).reshape(3,3,3)

# sum over axes 1 and 2 by merging them into a single axis first
result = a.reshape((a.shape[0], a.shape[1]*a.shape[2])).sum(axis=1)

Is there a cleaner way to do this?  I'm sure I'm missing something
obvious.

Thanks,
Greg

Using two sums

np.sum(np.sum(a, axis=-2), axis=1)
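A quick check that the reshape and the two-sum versions agree. The tuple form `a.sum(axis=(1, 2))` (and likewise `a.mean(axis=(1, 2))`) comes from later NumPy releases (1.7 and newer); it was not available to the posters in 2010.

```python
import numpy as np

a = np.arange(27).reshape(3, 3, 3)

# Greg's approach: merge the last two axes, then reduce once
via_reshape = a.reshape((a.shape[0], a.shape[1] * a.shape[2])).sum(axis=1)

# Two sums: reduce axis -2 first; the old axis 2 then becomes axis 1
via_two_sums = np.sum(np.sum(a, axis=-2), axis=1)

# NumPy >= 1.7: reduce over several axes in one call
via_axis_tuple = a.sum(axis=(1, 2))
mean_axis_tuple = a.mean(axis=(1, 2))

print(via_reshape, via_two_sums, via_axis_tuple)  # [ 36 117 198] three times
print(mean_axis_tuple)                            # [ 4. 13. 22.]
```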
Be careful.  This works for sums, but not for operations like median;
the median of the row medians may not be the global median.  So, you
need to do the medians in one step.  I'm not aware of a method cleaner
than manually reshaping first.  There may also be speed reasons to do
things in one step.  But two steps may look cleaner in code.
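A small hand-picked array (hypothetical data, chosen so that the two approaches disagree) shows the median pitfall: the median of the row medians is 4, while the median of all nine values is 5. The `axis=(1, 2)` form for `np.median` is a later addition (NumPy 1.9 and newer).

```python
import numpy as np

# Hypothetical example, shape (1, 3, 3)
a = np.array([[[1, 2, 9],
               [3, 4, 5],
               [6, 7, 8]]])

# Two steps: median of the row medians (2, 4, 7) -> 4.0
two_step = np.median(np.median(a, axis=2), axis=1)

# One step: flatten the last two axes, then take the median of all 9 values -> 5.0
one_step = np.median(a.reshape(a.shape[0], -1), axis=1)

# NumPy >= 1.9 also accepts a tuple of axes for np.median
one_step_tuple = np.median(a, axis=(1, 2))

print(two_step, one_step, one_step_tuple)  # [4.] [5.] [5.]
```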
I think two .sum() calls are the most accurate option, if precision matters.
One big summation is often not very precise.

Josef


You can use the dtype argument of many functions, such as sum, to accumulate in a dtype with higher precision than the input dtype. It also helps with overflow, for example when summing integers, because you don't have to convert the input array to a wider dtype first. However, the benefit depends very much on your platform; notably, Windows platforms don't support the highest-precision dtypes (so float128 is not going to help over float64).
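A minimal sketch of the dtype argument, assuming a float32 input of one million copies of 0.1 (hypothetical data). Every partial sum of these values is exactly representable in float64, so the float64-accumulated result matches the math.fsum reference exactly, while the float32 result cannot. The last line checks whether long double actually adds precision on the current platform.

```python
import math

import numpy as np

# One million copies of the float32 value closest to 0.1
x = np.full(10**6, 0.1, dtype=np.float32)

exact = math.fsum(x.astype(np.float64).tolist())  # correctly rounded reference

s32 = x.sum()                  # result (and accumulator) use the input dtype, float32
s64 = x.sum(dtype=np.float64)  # float64 accumulation, no explicit x.astype(np.float64)

print(s32.dtype, abs(float(s32) - exact))
print(s64.dtype, abs(float(s64) - exact))

# Where long double is just an alias for a 64-bit double (e.g. MSVC builds
# on Windows), these two epsilons are equal and long double gains nothing.
print(np.finfo(np.longdouble).eps, np.finfo(np.float64).eps)
```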

Alternatively, use another approach that avoids loss of precision, such as
Python's math.fsum():
http://docs.python.org/library/math.html
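A minimal illustration with three hand-picked values (an assumption, chosen to force cancellation): in double precision the 1.0 is absorbed when added to 1e16, so an ordinary sum returns 0.0, while math.fsum returns the correctly rounded exact sum.

```python
import math

import numpy as np

# Badly scaled values: spacing between doubles near 1e16 is 2.0
vals = [1e16, 1.0, -1e16]

print(np.sum(np.array(vals)))  # 0.0, the 1.0 was lost to rounding
print(math.fsum(vals))         # 1.0, the correctly rounded exact sum
```

For a large NumPy array, `math.fsum(a.ravel().tolist())` is usually faster than iterating over the array directly, at the cost of a temporary list.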

Or Recipe 393090: Binary floating point summation accurate to full precision:
http://code.activestate.com/recipes/393090/

Or Recipe 298339: More accurate sum (Python)
http://code.activestate.com/recipes/298339/
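For illustration, here is a compensated summation from the Kahan family, specifically Neumaier's variant, which also copes with a term larger than the running total. It is a sketch and not necessarily the algorithm used in either recipe; the name `neumaier_sum` is hypothetical.

```python
import math


def neumaier_sum(values):
    """Compensated (Kahan-Babuska/Neumaier) summation of an iterable of floats."""
    total = 0.0
    compensation = 0.0  # running sum of the low-order bits lost by each addition
    for x in values:
        t = total + x
        if abs(total) >= abs(x):
            compensation += (total - t) + x  # low-order bits of x were lost
        else:
            compensation += (x - t) + total  # low-order bits of total were lost
        total = t
    return total + compensation


vals = [1e16, 1.0, -1e16]
print(neumaier_sum(vals), math.fsum(vals))  # 1.0 1.0
```

Being a plain Python loop, this is slow for large arrays; it is mainly useful for seeing how compensation works, whereas math.fsum is implemented in C and is exactly rounded.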

These are probably more accurate than first sorting the data from low to high and then summing in that order.
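A hand-picked example (hypothetical data, chosen so that cancellation dominates) shows why sorting gives no guarantee: summing in ascending order or in order of increasing magnitude still loses the two small terms, while math.fsum recovers them. The helper `left_to_right_sum` is a hypothetical name for plain sequential summation.

```python
import math


def left_to_right_sum(values):
    """Plain sequential summation, no compensation."""
    total = 0.0
    for x in values:
        total += x
    return total


vals = [1.0, 1e100, 1.0, -1e100]  # exact sum is 2.0

ascending = left_to_right_sum(sorted(vals))              # -1e100, 1, 1, 1e100
by_magnitude = left_to_right_sum(sorted(vals, key=abs))  # 1, 1, 1e100, -1e100
print(ascending, by_magnitude, math.fsum(vals))          # 0.0 0.0 2.0
```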

Bruce
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
