Re: [Numpy-discussion] summing over more than one axis

Bruce Southey Thu, 19 Aug 2010 14:53:15 -0700

 On 08/19/2010 04:20 PM, [email protected] wrote:

On Thu, Aug 19, 2010 at 4:03 PM, John Salvatier
<[email protected]>  wrote:

Precise in what sense? Numerical accuracy? If so, why is that?

I don't remember where I ran into this example, maybe integer
underflow (?) with addition.
NIST ANOVA test cases have some nasty badly scaled variables


but I have problems creating one, difference in 10th or higher digit

a = 1000000*np.random.randn(10000,1000)
a.sum()

-820034796.05545747

np.sort(a.ravel())[::-1].sum()

-820034795.87886333

np.sort(a.ravel()).sum()

-820034795.88172638

np.sort(a,0)[::-1].sum()

-820034795.82333243

np.sort(a,1)[::-1].sum()

-820034796.05559027

a.sum(-1).sum(-1)

-820034796.05551744

np.sort(a,1)[::-1].sum(-1).sum(-1)

-820034796.05543578

np.sort(a,0)[::-1].sum(-1).sum(-1)

-820034796.05590343

np.sort(a,1).sum(-1).sum(-1)

-820034796.05544424

am = a.mean()
am*a.size + np.sort(a-am,1).sum(-1).sum(-1)

-820034796.05554879

a.size * np.sort(a,1).mean(-1).mean(-1)

-820034796.05544722

badly scaled or badly sorted arrays don't add up well

but I'm not able to get worse than 10th or 11th decimal in some random
generated examples with size 10000x1000
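A reproducible variant of the experiment above could look like the following rough sketch. The fixed seed, the smaller 1000x1000 array, and the use of math.fsum as the reference value are assumptions made for illustration; they are not part of the original session, and the digits will differ from those quoted above.

```python
import math
import numpy as np

# Assumptions (not in the original post): fixed seed and a smaller array.
rng = np.random.default_rng(0)
a = 1_000_000 * rng.standard_normal((1000, 1000))

# math.fsum tracks exact partial sums, so it serves as the reference value.
reference = math.fsum(a.ravel())

candidates = {
    "a.sum()": a.sum(),
    "sorted ascending": np.sort(a.ravel()).sum(),
    "sorted descending": np.sort(a.ravel())[::-1].sum(),
    "a.sum(-1).sum(-1)": a.sum(-1).sum(-1),
}

# Absolute deviation of each summation order from the reference.
errors = {name: abs(float(value) - reference)
          for name, value in candidates.items()}

for name, err in errors.items():
    print(f"{name:20s} abs error {err:.3e}")
```

The deviations depend on the NumPy version and on the data, so no particular output is shown here.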

Josef

On Thu, Aug 19, 2010 at 12:13 PM, <[email protected]> wrote:

On Thu, Aug 19, 2010 at 11:29 AM, Joe Harrington <[email protected]>
wrote:

On Thu, 19 Aug 2010 09:06:32 -0500, Gökhan Sever <[email protected]>
wrote:

On Thu, Aug 19, 2010 at 9:01 AM, greg whittier <[email protected]> wrote:

I frequently deal with 3D data and would like to sum (or find the
mean, etc.) over the last two axes.  I.e. sum a[i,j,k] over j and k.
I find using .sum() really convenient for 2d arrays but end up
reshaping to 2d arrays to do this.  I know there has to be a more
convenient way.  Here's what I'm doing

a = np.arange(27).reshape(3,3,3)

# sum over axis 1 and 2
result = a.reshape((a.shape[0], a.shape[1]*a.shape[2])).sum(axis=1)

Is there a cleaner way to do this?  I'm sure I'm missing something
obvious.

Thanks,
Greg

Using two sums

np.sum(np.sum(a, axis=-2), axis=1)
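As a small check, the reshape approach and the two-sum approach can be compared on the example array. The last line assumes NumPy 1.7 or later, where the axis argument of sum accepts a tuple; that option postdates this thread.

```python
import numpy as np

a = np.arange(27).reshape(3, 3, 3)

# Reshape approach: flatten the last two axes, then sum once.
by_reshape = a.reshape(a.shape[0], -1).sum(axis=1)

# Two successive sums: each call removes the current last axis.
by_two_sums = a.sum(axis=-1).sum(axis=-1)

# Assumption: NumPy 1.7 or later, where axis may be a tuple of ints.
by_tuple_axis = a.sum(axis=(1, 2))

print(by_reshape)     # [ 36 117 198]
print(by_two_sums)    # [ 36 117 198]
print(by_tuple_axis)  # [ 36 117 198]
```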

Be careful.  This works for sums, but not for operations like median;
the median of the row medians may not be the global median.  So, you
need to do the medians in one step.  I'm not aware of a method cleaner
than manually reshaping first.  There may also be speed reasons to do
things in one step.  But, two steps may look cleaner in code.
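A small illustration of the median caveat, using a made-up array chosen so that the two results differ:

```python
import numpy as np

# Hypothetical data with shape (1, 3, 3), chosen for illustration.
a = np.array([[[1, 2, 100],
               [3, 4, 200],
               [5, 6, 300]]])

# Two steps: median of each row, then median of those row medians.
median_of_medians = np.median(np.median(a, axis=-1), axis=-1)

# One step: flatten the last two axes first, then take a single median.
global_median = np.median(a.reshape(a.shape[0], -1), axis=1)

print(median_of_medians)  # [4.]
print(global_median)      # [5.]
```

The row medians are 2, 4 and 6, whose median is 4, while the median of all nine values is 5.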

I think, two .sums() are the most accurate, if precision matters. One
big summation is often not very precise.

Josef

You can use the dtype option in many functions, such as sum, to
accumulate in a dtype with higher precision than the input dtype. It
also helps with overflow, for example when summing integers, because
you don't have to convert the input dtype first. However, the benefit
very much depends on your operating system, notably Windows platforms,
which don't support the highest-precision dtypes (so float128 is not
going to help over float64).
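A minimal sketch of the dtype option follows. The array contents are made up for illustration, and the last check is only one possible way to see whether the platform's long double is actually wider than float64.

```python
import numpy as np

# Hypothetical data: 1000 int8 values of 100 each; the true sum is 100000.
small = np.full(1000, 100, dtype=np.int8)

wide = small.sum(dtype=np.int64)   # accumulate in int64: no overflow
narrow = small.sum(dtype=np.int8)  # accumulate in int8: wraps around silently

# float32 input accumulated in float64, without converting the array first.
x = np.full(1_000_000, 0.1, dtype=np.float32)
total64 = x.sum(dtype=np.float64)

# True only where long double carries more mantissa bits than float64.
longdouble_is_wider = (np.finfo(np.longdouble).nmant
                       > np.finfo(np.float64).nmant)

print(wide, narrow, total64, longdouble_is_wider)
```

Where longdouble_is_wider is False, asking for np.longdouble as the accumulator gives no extra precision over float64.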


Alternatively, use another approach to avoid loss of precision, such as
Python's math.fsum()
http://docs.python.org/library/math.html

Or Recipe 393090: Binary floating point summation accurate to full precision:

http://code.activestate.com/recipes/393090/

Or Recipe 298339: More accurate sum (Python)
http://code.activestate.com/recipes/298339/

These are probably more accurate than first sorting the data from low to
high and then summing from low to high.
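A short standard-library sketch of the difference between plain left-to-right accumulation and math.fsum; the ten copies of 0.1 are just an illustrative input.

```python
import math

values = [0.1] * 10

# Plain left-to-right accumulation: rounding error builds up at each step.
naive = 0.0
for v in values:
    naive += v

# math.fsum keeps exact partial sums and rounds only once at the end.
exact = math.fsum(values)

print(naive)  # 0.9999999999999999
print(exact)  # 1.0
```

math.fsum accepts any iterable of floats, so a flattened NumPy array can be passed as well, at the cost of a Python-level loop over the elements.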


Bruce

_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
