David Huard wrote:
> Hi all,
>
> I'd like to poll the list to see what people want from numpy.histogram(),
> since I'm currently writing a contender.
>
> My main complaints with the current version are:
> 1. upper outliers are stored in the last bin, while lower outliers are not
>    counted at all,
> 2. weights cannot be used.
>
> The new histogram function is well under way (it addresses these issues and
> adds an axis keyword), but I want to know what the preferred behavior is
> regarding the function output, and how willing you are to introduce a new
> behavior that will break some code.
>
> Given a number of bins N and range (min, max), histogram constructs
> linearly spaced bin edges
>
>   b0 (out-of-range) | b1 | b2 | b3 | ... | bN | bN+1 (out-of-range)
>
> and may return:
>
> A. H = array([N_b0, N_b1, ..., N_bN, N_bN+1])
>    The out-of-range values are the first and last values of the array. The
>    returned array hence has length N+2.
>
> B. H = array([N_b0 + N_b1, N_b2, ..., N_bN + N_bN+1])
>    The lower and upper out-of-range values are added to the first and last
>    bin respectively.
>
> C. H = array([N_b1, ..., N_bN + N_bN+1])
>    Current behavior: the upper out-of-range values are added to the last bin.
>
> D. H = array([N_b1, N_b2, ..., N_bN])
>    Lower and upper out-of-range values are given after the histogram array.
>
> Ideally, the new function would not break the common usage
> H = histogram(x)[0], so this excludes A. B and C are not acceptable in my
> opinion, so only D remains, with the downside that the outliers are not
> returned. A solution might be to add a keyword full_output=False, which,
> when set to True, returns the out-of-range values in a dictionary.
>
> Also, the current function returns (H, ledges), where ledges is the array
> of left bin edges (N). I propose returning the complete array of edges
> (N+1), including the rightmost edge. This is a little bit impractical for
> plotting, as the edges array does not have the same length as the histogram
> array, but it allows the use of user-defined non-uniform bins.
>
> Opinions, suggestions?
>
> David
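To make the four proposed output layouts concrete, here is a minimal Python sketch, assuming NumPy is available. The function name histogram_sketch and its layout and full_output parameters are hypothetical illustrations (full_output mirrors the keyword suggested above), not an existing NumPy API. It always works with the complete array of N+1 edges, so user-defined non-uniform edges also work, and it accepts weights.

```python
import numpy as np


def histogram_sketch(x, bins=10, range=None, weights=None, layout="D",
                     full_output=False):
    """Hypothetical histogram illustrating out-of-range layouts A-D.

    Always returns the full array of N+1 bin edges. `layout` and
    `full_output` are illustrative names, not a real NumPy API.
    """
    x = np.asarray(x, dtype=float).ravel()
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float).ravel()
    if np.ndim(bins) == 0:
        # N linearly spaced bins over (min, max).
        lo, hi = (x.min(), x.max()) if range is None else range
        edges = np.linspace(lo, hi, int(bins) + 1)
    else:
        # User-defined, possibly non-uniform, edges (length N+1).
        edges = np.asarray(bins, dtype=float)
    n = len(edges) - 1

    # Slot 0 = below edges[0] (b0), slots 1..n = regular bins, slot n+1 = above (bN+1).
    idx = np.searchsorted(edges, x, side="right")
    idx[x == edges[-1]] = n  # treat the last bin as closed on the right
    full = np.bincount(idx, weights=w, minlength=n + 2)
    below, inner, above = full[0], full[1:n + 1].copy(), full[n + 1]

    if layout == "A":    # N+2 values, outliers first and last
        H = full
    elif layout == "B":  # outliers folded into the first and last bins
        inner[0] += below
        inner[-1] += above
        H = inner
    elif layout == "C":  # upper outliers folded into the last bin, lower ones dropped
        inner[-1] += above
        H = inner
    elif layout == "D":  # N values, outliers reported separately
        H = inner
    else:
        raise ValueError("layout must be one of 'A', 'B', 'C', 'D'")

    if full_output:
        return H, edges, {"lower": below, "upper": above}
    return H, edges


x = [-1, 0, 0.5, 1, 2, 3]  # one value below and one above range=(0, 2)
for layout in "ABCD":
    H, edges = histogram_sketch(x, bins=2, range=(0, 2), layout=layout)
    print(layout, H.tolist())
# A [1.0, 2.0, 2.0, 1.0]; B [3.0, 3.0]; C [2.0, 3.0]; D [2.0, 2.0]
```

Because edges has N+1 entries, a bar plot can use edges[:-1] as the left edges and np.diff(edges) as the widths, which also handles non-uniform bins. For comparison, present-day numpy.histogram likewise returns N+1 edges and ignores values outside range, which corresponds to layout D without the outlier counts.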
I have my own histogram implementation that might interest you. The core is modern C++, with a boost::python wrapper. Out-of-bounds behavior is programmable. I'll send it to you if you are interested.

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion