Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-28 Thread Eelco Hoogendoorn
To rephrase my most pressing question: may np.ones((N,2)).mean(0) and np.ones((2,N)).mean(1) produce different results with the implementation in the current master? If so, I think that would be very much regrettable; and if this is a minority opinion, I do hope that at least this gets documented
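A quick way to check this question on a given installation is sketched below. The helper name axis_means and the array size are illustrative assumptions, not anything taken from the NumPy source. For sizes below 2**24 every partial sum of ones is exact in float32, so the two layouts must agree; any divergence could only show up for larger n, and whether it does depends on the NumPy version in use.

```python
import numpy as np

def axis_means(n, dtype=np.float32):
    """Return the means of an (n, 2) and a (2, n) array of ones.

    Both calls reduce over the axis of length n, but with a different
    memory layout, which is the case asked about in the thread.
    """
    tall = np.ones((n, 2), dtype=dtype)  # reduction runs over the strided axis
    wide = np.ones((2, n), dtype=dtype)  # reduction runs over the contiguous axis
    return tall.mean(0), wide.mean(1)

# Small n: every partial sum is exactly representable in float32.
# Try n above 2**24 to probe the behaviour discussed in this thread.
tall_mean, wide_mean = axis_means(1 << 16)
print(tall_mean, wide_mean, np.array_equal(tall_mean, wide_mean))
```

Passing dtype=np.float64 to axis_means gives a reference result to compare against.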

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-28 Thread Sebastian Berg
On Mo, 2014-07-28 at 14:37 +0200, Eelco Hoogendoorn wrote: To rephrase my most pressing question: may np.ones((N,2)).mean(0) and np.ones((2,N)).mean(1) produce different results with the implementation in the current master? If so, I think that would be very much regrettable; and if this is a

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-28 Thread alex
On Mon, Jul 28, 2014 at 8:46 AM, Sebastian Berg sebast...@sipsolutions.net wrote: On Mo, 2014-07-28 at 14:37 +0200, Eelco Hoogendoorn wrote: To rephrase my most pressing question: may np.ones((N,2)).mean(0) and np.ones((2,N)).mean(1) produce different results with the implementation in the

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-28 Thread Daπid
On 28 July 2014 14:46, Sebastian Berg sebast...@sipsolutions.net wrote: To rephrase my most pressing question: may np.ones((N,2)).mean(0) and np.ones((2,N)).mean(1) produce different results with the implementation in the current master? If so, I think that would be very much regrettable;

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-28 Thread Sturla Molden
On 28/07/14 15:21, alex wrote: Are you sure they always give different results? Notice that np.ones((N,2)).mean(0) and np.ones((2,N)).mean(1) compute means of different axes on transposed arrays so these differences 'cancel out'. They will be if different algorithms are used.

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-28 Thread Fabien
On 28.07.2014 15:30, Daπid wrote: An example using float16 on Numpy 1.8.1 (I haven't seen differences with float32): Why aren't there differences between float16 and float32? Could this be related to my earlier post in this thread where I mentioned summation problems occurring much earlier in

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-28 Thread Sebastian Berg
On Mo, 2014-07-28 at 15:35 +0200, Sturla Molden wrote: On 28/07/14 15:21, alex wrote: Are you sure they always give different results? Notice that np.ones((N,2)).mean(0) and np.ones((2,N)).mean(1) compute means of different axes on transposed arrays so these differences 'cancel out'.

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-28 Thread Sebastian Berg
On Mo, 2014-07-28 at 15:50 +0200, Fabien wrote: On 28.07.2014 15:30, Daπid wrote: An example using float16 on Numpy 1.8.1 (I haven't seen differences with float32): Why aren't there differences between float16 and float32? float16 calculations are actually float32 calculations. If done

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-28 Thread Eelco Hoogendoorn
Sebastian: Those are good points. Indeed iteration order may already produce different results, even though the semantics of numpy suggest identical operations. Still, I feel this different behavior without any semantical clues is something to be minimized. Indeed copying might have large speed

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-28 Thread Sebastian Berg
On Mo, 2014-07-28 at 16:31 +0200, Eelco Hoogendoorn wrote: Sebastian: Those are good points. Indeed iteration order may already produce different results, even though the semantics of numpy suggest identical operations. Still, I feel this different behavior without any semantical clues is

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-28 Thread Eelco Hoogendoorn
I see, thanks for the clarification. Just for the sake of argument, since unfortunately I don't have the time to go dig in the guts of numpy myself: a design which always produces results of the same (high) accuracy, but only optimizes the common access patterns in a hacky way, and may be

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-28 Thread Julian Taylor
On 28.07.2014 23:32, Eelco Hoogendoorn wrote: I see, thanks for the clarification. Just for the sake of argument, since unfortunately I don't have the time to go dig in the guts of numpy myself: a design which always produces results of the same (high) accuracy, but only optimizes the common

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-27 Thread josef.pktd
On Sat, Jul 26, 2014 at 5:19 PM, Sturla Molden sturla.mol...@gmail.com wrote: Robert Kern robert.k...@gmail.com wrote: It would presumably require a global threading.RLock for protecting the global state. We would use thread-local storage like we currently do with the np.errstate()

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-27 Thread Robert Kern
On Sun, Jul 27, 2014 at 7:04 AM, josef.p...@gmail.com wrote: On Sat, Jul 26, 2014 at 5:19 PM, Sturla Molden sturla.mol...@gmail.com wrote: Robert Kern robert.k...@gmail.com wrote: It would presumably require a global threading.RLock for protecting the global state. We would use

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-27 Thread josef.pktd
On Sun, Jul 27, 2014 at 4:24 AM, Robert Kern robert.k...@gmail.com wrote: On Sun, Jul 27, 2014 at 7:04 AM, josef.p...@gmail.com wrote: On Sat, Jul 26, 2014 at 5:19 PM, Sturla Molden sturla.mol...@gmail.com wrote: Robert Kern robert.k...@gmail.com wrote: It would presumably

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-27 Thread Robert Kern
On Sun, Jul 27, 2014 at 9:56 AM, josef.p...@gmail.com wrote: On Sun, Jul 27, 2014 at 4:24 AM, Robert Kern robert.k...@gmail.com wrote: On Sun, Jul 27, 2014 at 7:04 AM, josef.p...@gmail.com wrote: On Sat, Jul 26, 2014 at 5:19 PM, Sturla Molden sturla.mol...@gmail.com wrote: Robert

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-27 Thread RayS
At 02:04 AM 7/27/2014, you wrote: You won't be able to do it by accident or omission or a lack of discipline. It's not a tempting public target like, say, np.seterr(). BTW, why not throw an overflow error in the large float32 sum() case? Is it too expensive to check while accumulating? - Ray

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-27 Thread Nathaniel Smith
On Sun, Jul 27, 2014 at 3:16 PM, RayS r...@blue-cove.com wrote: At 02:04 AM 7/27/2014, you wrote: You won't be able to do it by accident or omission or a lack of discipline. It's not a tempting public target like, say, np.seterr(). BTW, why not throw an overflow error in the large float32

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-27 Thread RayS
Thanks for the clarification, but how is the numpy rounding directed? Round to nearest, ties to even? http://en.wikipedia.org/wiki/IEEE_floating_point#Rounding_rules Just curious, as I couldn't find a reference. - Ray At 07:44 AM 7/27/2014, you wrote: On Sun, Jul 27, 2014 at 3:16 PM, RayS
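The rounding question can be probed directly. The sketch below assumes typical IEEE 754 hardware in its default mode (round to nearest, ties to even), which NumPy float32 arithmetic normally inherits; it is a check, not a statement about NumPy internals. Above 2**24 only even integers are representable in float32, so adding an odd number lands exactly on a tie.

```python
import numpy as np

base = np.float32(2**24)          # 16777216, spacing between float32 values is 2 here

# 16777217 is halfway between 16777216 and 16777218.
# Ties-to-even keeps the neighbour with an even significand: 16777216.
tie_down = base + np.float32(1)

# 16777219 is halfway between 16777218 and 16777220.
# Ties-to-even now rounds up, to 16777220.
tie_up = base + np.float32(3)

print(int(tie_down), int(tie_up))
```

Seeing one tie round down and the other round up is what distinguishes ties-to-even from a rule that always rounds halves in one direction.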

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-27 Thread Sturla Molden
Nathaniel Smith n...@pobox.com wrote: The problem here is that when summing up the values, the sum gets large enough that after rounding, x + 1 = x and the sum stops increasing. Interesting. That explains why the divide-and-conquer reduction is much more robust. Thanks :) Sturla
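The saturation effect described here is easy to reproduce at small scale with float16, where x + 1 = x already happens at x = 2048. The following is a minimal sketch: naive_sum and pairwise_sum are illustrative helpers written for this note, not NumPy's implementation, and the block size of 8 is an arbitrary assumption.

```python
import numpy as np

def naive_sum(values):
    """Left-to-right sum that rounds to the array's dtype after every add."""
    acc = values.dtype.type(0)
    for x in values:
        acc = acc + x
    return acc

def pairwise_sum(values, block=8):
    """Divide-and-conquer sum: split in halves, add the two partial sums."""
    if len(values) <= block:
        return naive_sum(values)
    mid = len(values) // 2
    return pairwise_sum(values[:mid], block) + pairwise_sum(values[mid:], block)

ones = np.ones(4096, dtype=np.float16)

stuck = np.float16(2048) + np.float16(1)   # rounds back to 2048
naive = naive_sum(ones)                    # stops growing at 2048
pairwise = pairwise_sum(ones)              # reaches the exact total, 4096

print(float(stuck), float(naive), float(pairwise))
```

In the pairwise version the two operands of each addition have similar magnitude, so no single addition has to absorb a term that is tiny relative to the running total.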

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread Eelco Hoogendoorn
- From: Julian Taylor jtaylor.deb...@googlemail.com Sent: 26-7-2014 00:58 To: Discussion of Numerical Python numpy-discussion@scipy.org Subject: Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays On 25.07.2014 23:51, Eelco Hoogendoorn wrote: Ray: I'm not working with Hubble

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread Sebastian Berg
On Fr, 2014-07-25 at 21:23 +0200, Eelco Hoogendoorn wrote: It need not be exactly representable as such; take the mean of [1, 1+eps] for instance. Granted, there are at most two numbers in the range of the original dtype which are closest to the true mean; but I'm not sure that computing them

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread Sturla Molden
Sebastian Berg sebast...@sipsolutions.net wrote: chose more stable algorithms for such statistical functions. The pairwise summation that is in master now is very awesome, but it is not secure enough in the sense that a new user will have difficulty understanding when he can be sure it is

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread Eelco Hoogendoorn
I was wondering the same thing. Are there any known tradeoffs to this method of reduction? On Sat, Jul 26, 2014 at 12:39 PM, Sturla Molden sturla.mol...@gmail.com wrote: Sebastian Berg sebast...@sipsolutions.net wrote: chose more stable algorithms for such statistical functions. The

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread Julian Taylor
On 26.07.2014 15:38, Eelco Hoogendoorn wrote: Why is it not always used? For 1-D reductions the iterator blocks by 8192 elements even when no buffering is required. There is a TODO in the source to fix that by adding additional checks. Unfortunately nobody knows what these additional tests would

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread Benjamin Root
I could get behind the context manager approach. It would help keep backwards compatibility, while providing a very easy (and clean) way of consistently using the same reduction operation. Adding kwargs is just a road to hell. Cheers! Ben Root On Sat, Jul 26, 2014 at 9:53 AM, Julian Taylor

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread Sebastian Berg
On Sa, 2014-07-26 at 15:38 +0200, Eelco Hoogendoorn wrote: I was wondering the same thing. Are there any known tradeoffs to this method of reduction? Yes, it is much more complicated and incompatible with naive ufuncs if you want your memory access to be optimized. And optimizing that is very

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread Eelco Hoogendoorn
A context manager makes sense. I very much appreciate the time constraints and the effort put in thus far, but if we cannot make something work uniformly, I wonder if we should include it in the master at all. I don't have a problem with customizing algorithms where fp accuracy demands it; I

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread Sturla Molden
Sebastian Berg sebast...@sipsolutions.net wrote: Yes, it is much more complicated and incompatible with naive ufuncs if you want your memory access to be optimized. And optimizing that is very much worth it speed wise... Why? Couldn't we just copy the data chunk-wise to a temporary buffer of

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread Sturla Molden
Sturla Molden sturla.mol...@gmail.com wrote: Sebastian Berg sebast...@sipsolutions.net wrote: Yes, it is much more complicated and incompatible with naive ufuncs if you want your memory access to be optimized. And optimizing that is very much worth it speed wise... Why? Couldn't we just

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread josef.pktd
On Sat, Jul 26, 2014 at 9:57 AM, Benjamin Root ben.r...@ou.edu wrote: I could get behind the context manager approach. It would help keep backwards compatibility, while providing a very easy (and clean) way of consistently using the same reduction operation. Adding kwargs is just a road to

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread Benjamin Root
That is one way of doing it, and probably the cleanest way. Or else you have to pass in the context object everywhere anyway. But I am not so concerned about that (we do that for other things as well). Bigger concerns would be nested contexts. For example, what if one of the scikit functions use

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread josef.pktd
On Sat, Jul 26, 2014 at 2:44 PM, Benjamin Root ben.r...@ou.edu wrote: That is one way of doing it, and probably the cleanest way. Or else you have to pass in the context object everywhere anyway. But I am not so concerned about that (we do that for other things as well). Bigger concerns would

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread Sturla Molden
Benjamin Root ben.r...@ou.edu wrote: My other concern would be with multi-threaded code (which is where a global state would be bad). It would presumably require a global threading.RLock for protecting the global state. Sturla

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread Eelco Hoogendoorn
Perhaps I in turn am missing something; but I would suppose that any algorithm that requires multiple passes over the data is off the table? Perhaps I am being a little old fashioned and performance oriented here, but to make the ultra-majority of use cases suffer a factor two performance penalty

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread Sturla Molden
josef.p...@gmail.com wrote: statsmodels still has avoided anything that smells like a global state that changes calculation. If global states are stored in a stack, as in OpenGL, it is not so bad. A context manager could push a state in __enter__ and pop the state in __exit__. This is actually

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread Sylvain Corlay
I completely agree with Eelco. I expect numpy.mean to do something simple and straightforward. If the naive method is not well suited for my data, I can deal with it and have my own ad hoc method. On Sat, Jul 26, 2014 at 3:19 PM, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote: Perhaps I in

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread Robert Kern
On Sat, Jul 26, 2014 at 8:04 PM, Sturla Molden sturla.mol...@gmail.com wrote: Benjamin Root ben.r...@ou.edu wrote: My other concern would be with multi-threaded code (which is where a global state would be bad). It would presumably require a global threading.RLock for protecting the global

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-26 Thread Sturla Molden
Robert Kern robert.k...@gmail.com wrote: It would presumably require a global threading.RLock for protecting the global state. We would use thread-local storage like we currently do with the np.errstate() context manager. Each thread will have its own global state. That sounds like a
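A rough sketch of the thread-local variant follows, using only the standard library. The names summation_mode and current_mode and the mode strings are again hypothetical; the point is only that each thread sees its own stack, so no lock is needed around the state.

```python
import threading
from contextlib import contextmanager

_local = threading.local()  # each thread gets its own attribute namespace

def _stack():
    # Lazily create a per-thread stack with a default bottom entry.
    if not hasattr(_local, "stack"):
        _local.stack = ["naive"]
    return _local.stack

def current_mode():
    return _stack()[-1]

@contextmanager
def summation_mode(mode):
    stack = _stack()
    stack.append(mode)
    try:
        yield
    finally:
        stack.pop()

seen = {}
barrier = threading.Barrier(2)  # keeps both threads inside their contexts at once

def worker(name, mode):
    with summation_mode(mode):
        barrier.wait()
        seen[name] = current_mode()

threads = [
    threading.Thread(target=worker, args=("a", "pairwise")),
    threading.Thread(target=worker, args=("b", "kahan")),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(seen, current_mode())
```

The main thread still reports the default mode afterwards, since neither worker touched its stack.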

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-25 Thread Eelco Hoogendoorn
: 25-7-2014 00:10 To: Discussion of Numerical Python numpy-discussion@scipy.org Subject: Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays On 7/24/2014 4:42 PM, Eelco Hoogendoorn wrote: This isn't a bug report, but rather a feature request. I'm not sure statement

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-25 Thread RayS
At 01:22 AM 7/25/2014, you wrote: Actually the maximum precision I am not so sure of, as I personally prefer to make an informed decision about precision used, and get an error on a platform that does not support the specified precision, rather than obtain subtly or horribly broken

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-25 Thread Robert Kern
On Fri, Jul 25, 2014 at 3:11 PM, RayS r...@blue-cove.com wrote: At 01:22 AM 7/25/2014, you wrote: Actually the maximum precision I am not so sure of, as I personally prefer to make an informed decision about precision used, and get an error on a platform that does not support the specified

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-25 Thread RayS
At 07:22 AM 7/25/2014, you wrote: We were talking on this in the office, as we realized it does affect a couple of lines dealing with large arrays, including complex64. As I expect Python modules to work uniformly cross platform unless documented otherwise, to me that includes 32 vs 64 bit

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-25 Thread Eelco Hoogendoorn
: RayS r...@blue-cove.com Sent: 25-7-2014 19:56 To: Discussion of Numerical Python numpy-discussion@scipy.org Subject: Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays At 07:22 AM 7/25/2014, you wrote: We were talking on this in the office, as we realized it does affect

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-25 Thread Alan G Isaac
On 7/25/2014 1:40 PM, Eelco Hoogendoorn wrote: At the risk of repeating myself: explicit is better than implicit This sounds like an argument for renaming the `mean` function `naivemean` rather than `mean`. Whatever numpy names `mean`, shouldn't it implement an algorithm that produces the

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-25 Thread Nathaniel Smith
On Fri, Jul 25, 2014 at 5:56 PM, RayS r...@blue-cove.com wrote: The important point was that it would be best if all of the methods affected by summing 32 bit floats with 32 bit accumulators had the same Notes as numpy.mean(). We went through a lot of code yesterday, assuming that any numpy or

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-25 Thread Eelco Hoogendoorn
It need not be exactly representable as such; take the mean of [1, 1+eps] for instance. Granted, there are at most two numbers in the range of the original dtype which are closest to the true mean; but I'm not sure that computing them exactly is a tractable problem for arbitrary input. I'm not sure

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-25 Thread RayS
At 11:29 AM 7/25/2014, you wrote: On Fri, Jul 25, 2014 at 5:56 PM, RayS r...@blue-cove.com wrote: The important point was that it would be best if all of the methods affected by summing 32 bit floats with 32 bit accumulators had the same Notes as numpy.mean(). We went through a lot of code

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-25 Thread josef.pktd
On Fri, Jul 25, 2014 at 4:25 PM, RayS r...@blue-cove.com wrote: At 11:29 AM 7/25/2014, you wrote: On Fri, Jul 25, 2014 at 5:56 PM, RayS r...@blue-cove.com wrote: The important point was that it would be best if all of the methods affected by summing 32 bit floats with 32 bit

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-25 Thread Eelco Hoogendoorn
-cove.com Sent: 25-7-2014 23:26 To: Discussion of Numerical Python numpy-discussion@scipy.org Subject: Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays At 11:29 AM 7/25/2014, you wrote: On Fri, Jul 25, 2014 at 5:56 PM, RayS r...@blue-cove.com wrote: The important point

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-25 Thread Julian Taylor
). From: RayS mailto:r...@blue-cove.com Sent: 25-7-2014 23:26 To: Discussion of Numerical Python mailto:numpy-discussion@scipy.org Subject: Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays At 11:29 AM 7/25/2014, you wrote: On Fri

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-25 Thread RayS
At 02:36 PM 7/25/2014, you wrote: But it doesn't compensate for users to be aware of the problems. I think the docstring and the description of the dtype argument is pretty clear. Most of the docs for the affected functions do not have a Note with the same warning as mean() - Ray
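For reference, the dtype argument mentioned here is used as below. This is a minimal sketch with made-up data; the array size and fill value are arbitrary. The same keyword is accepted by np.sum and the corresponding ndarray methods.

```python
import numpy as np

data = np.full(100_000, 0.1, dtype=np.float32)

# Default: for float32 input the result is computed and returned as float32.
default_mean = data.mean()

# Explicit wider accumulator: sums are carried out in float64.
wide_mean = data.mean(dtype=np.float64)

print(default_mean.dtype, wide_mean.dtype)
print(default_mean, wide_mean)
```

The input array itself stays float32; only the intermediate sum and the returned scalar use the wider type.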

Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays

2014-07-24 Thread Eelco Hoogendoorn
alan.is...@gmail.com Sent: 25-7-2014 00:10 To: Discussion of Numerical Python numpy-discussion@scipy.org Subject: Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays On 7/24/2014 4:42 PM, Eelco Hoogendoorn wrote: This isn't a bug report, but rather a feature request. I'm