On Apr 7, 2008, at 4:14 PM, LB wrote:

> +1 for axis and +1 for a keyword to define what to do with values
> outside the range.
>
> For the keyword, rather than 'outliers', I would propose 'discard' or
> 'exclude', because it could be used to describe the four
> possibilities:
>   - discard='low' => values lower than the range are discarded,
>     values higher are added to the last bin
>   - discard='up'  => values higher than the range are discarded,
>     values lower are added to the first bin
>   - discard='out' => values out of the range are discarded
>   - discard=None  => values outside of this range are allocated to
>     the closest bin
>
> For the default behavior, in most cases the sum of the bins'
> populations should be equal to the size of the original one for me, so
> I would prefer discard=None. But I'm also okay with discard='low' in
> order not to break older code, if this is clearly stated.
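To make the four quoted options concrete, here is a minimal sketch. It is not an existing NumPy API: the function name `histogram_discard` and its `discard` argument are hypothetical, taken from the proposal above. It also assumes a NumPy release in which `np.histogram` treats `bins` as the full list of edges (last bin closed on the right) and ignores values outside them, which is not the 1.0.x behavior under discussion.

```python
import numpy as np

def histogram_discard(a, edges, discard=None):
    """Hypothetical helper illustrating the proposed 'discard' keyword.

    Assumes np.histogram(a, bins=edges) drops values outside
    [edges[0], edges[-1]] and closes the last bin on the right.
    """
    if discard not in (None, "low", "up", "out"):
        raise ValueError("discard must be None, 'low', 'up' or 'out'")
    a = np.asarray(a, dtype=float)
    edges = np.asarray(edges, dtype=float)
    lo, hi = edges[0], edges[-1]
    if discard in (None, "up"):
        # Low values are kept: fold them into the first bin.
        a = np.maximum(a, lo)
    if discard in (None, "low"):
        # High values are kept: fold them into the last bin.
        a = np.minimum(a, hi)
    # Anything still outside [lo, hi] is dropped by np.histogram.
    counts, _ = np.histogram(a, bins=edges)
    return counts

data = [0, 1, 2, 3, 4, 5, 6, 9]
edges = [1, 3, 5]
for mode in (None, "low", "up", "out"):
    print(mode, histogram_discard(data, edges, discard=mode))
```

Only discard=None keeps the sum of the counts equal to the size of the input, which is the property asked for in the quoted message.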
It seems that people in this discussion are forgetting that the bins are actually defined by the lower boundaries supplied, so that bins = [1,3,5] currently means

    bin1 -> 1 to 2.99999...
    bin2 -> 3 to 4.99999...
    bin3 -> 5 to inf

(of course, in version 1.0.1 the documentation is inconsistent with this behavior, as described by the original poster).

This definition of bins makes it hard to exclude values, because it forces the user to give an extra value in the bin definition, i.e. the bins statement above would then give only two bins while supplying three values. That seems confusing to me.

I am not sure what the right approach is, but currently using range will clip the values outside the range the user wants.

Cheers
Tommy
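For reference, the lower-boundary semantics described above can be sketched as follows. This is an illustration only, not the actual 1.0.x implementation: `count_by_lower_edges` is a hypothetical name, and dropping values below the first boundary is an assumption of the sketch.

```python
import numpy as np

def count_by_lower_edges(a, lower_edges):
    """Count values per bin when each bin is given by its lower boundary.

    Bin i covers [lower_edges[i], lower_edges[i+1]); the last bin is
    open-ended, [lower_edges[-1], inf).
    """
    a = np.asarray(a)
    lower_edges = np.asarray(lower_edges)
    # Index of the last boundary that is <= each value; -1 if none is.
    idx = np.searchsorted(lower_edges, a, side="right") - 1
    # Assumption: values below the first boundary belong to no bin.
    idx = idx[idx >= 0]
    return np.bincount(idx, minlength=len(lower_edges))

values = [0, 1, 2, 3, 4, 5, 6, 9]
print(count_by_lower_edges(values, [1, 3, 5]))
```

With three boundaries this yields three counts, the last of which collects everything from 5 upwards; treating 5 as an upper limit instead would leave only two bins.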