Peter Butterworth wrote:
> in np.histogram the top-most bin edge is inclusive of the upper range
> limit. As documented in the docstring (see below) this is actually the
> expected behavior, but this can lead to some weird enough results:
>
> In [72]: x=[1, 2, 3, 4]; np.histogram(x, bins=3)
> Out[72]: (array([1, 1, 2]), array([ 1.,  2.,  3.,  4.]))
>
> Is there any way round this or an alternative implementation without
> this issue ?

The way around it is what you've identified -- making sure your bins are
right. But I think the current behavior is the way it "should" be. It
keeps folks from inadvertently losing stuff off the end -- the lower end
is inclusive, so the upper end should be too. In the middle bins, one has
to make an arbitrary cut-off, and put the values on the "line" somewhere
(np.histogram puts them in the bin to the right, so the interior bins are
half-open: [left, right)).

One thing to keep in mind is that, in general, histogram is designed for
floating point numbers, not just integers -- counting integers can be
accomplished other ways, if that's what you really want (see
np.bincount).

But back to your example:

> In [72]: x=[1, 2, 3, 4]; np.histogram(x, bins=3)

Why do you want only 3 bins here? Using 4 gives you what you want.

If you want more control, then it seems you really want to know how many
of each of the values 1, 2, 3, 4 there are. So you want 4 bins, each
*centered* on the integers, and you might do:

In [8]: np.histogram(x, bins=4, range=(0.5, 4.5))
Out[8]: (array([1, 1, 1, 1]), array([ 0.5,  1.5,  2.5,  3.5,  4.5]))

or, if you want to be more explicit:

In [14]: np.histogram(x, bins=np.linspace(0.5, 4.5, 5))
Out[14]: (array([1, 1, 1, 1]), array([ 0.5,  1.5,  2.5,  3.5,  4.5]))

HTH,

-Chris

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

[email protected]
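
For reference, here is a minimal sketch that puts the options discussed in this thread side by side as a script. It assumes the data are small non-negative integers, which is what np.bincount requires. The variable names (per_value, lo, hi, centred_counts, centred_edges) are only illustrative and are not part of NumPy.

```python
import numpy as np

x = [1, 2, 3, 4]

# Default behaviour: three equal-width bins over [1, 4]. The last bin is
# closed on the right, so both 3 and 4 are counted in it.
counts, edges = np.histogram(x, bins=3)

# Alternative for non-negative integers: np.bincount returns an array in
# which index i holds the number of occurrences of the value i.
per_value = np.bincount(x)

# Histogram with unit-width bins centred on each integer, so no data
# value ever falls exactly on a bin edge.
lo, hi = min(x), max(x)
centred_counts, centred_edges = np.histogram(
    x, bins=hi - lo + 1, range=(lo - 0.5, hi + 0.5)
)

print(counts)          # [1 1 2]
print(per_value)       # [0 1 1 1 1]
print(centred_counts)  # [1 1 1 1]
```

Note that np.bincount starts counting at 0, so its result here has a leading zero for the value 0; per_value[1:] matches the centred histogram counts.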
