Hi,

Here is something I noticed with digitize() that I guess would qualify as a small but annoying bug.

In [165]: x = rand(10); bin = linspace(x.min(), x.max(), 10); print x.min(); print bin[0]; digitize(x,bin)
0.0925030184144
0.0925030184144
Out[165]: array([2, 9, 5, 9, 6, 1, 1, 1, 4, 5])

In [166]: x = rand(10); bin = linspace(x.min(), x.max(), 10); print x.min(); print bin[0]; digitize(x,bin)
0.0209738428066
0.0209738428066
Out[166]: array([ 5,  2,  8,  3,  0,  8,  9,  6, 10,  9])

Sometimes the smallest number in x is counted in the first bin, and sometimes it is counted as an outlier (bin number = 0). Moreover, creating the bins with
bin = linspace(x.min()-eps, x.max(), 10) doesn't seem to solve the problem if eps is too small (e.g. 1./2**32). So basically, you can have

In [186]: x.min()>bin[0]
Out[186]: True
and yet digitize() considers x.min() as an outlier.
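
One way to check whether a given NumPy build shows this behaviour is to compare the indices returned by digitize() against the edges directly. This is only a sketch: it assumes increasing bins and the half-open convention bins[i-1] <= x < bins[i] for index i, and the helper name check_digitize is made up for illustration.

```python
import numpy as np

def check_digitize(x, bins):
    """Return (value, index) pairs where digitize() disagrees with the edges.

    Assumes increasing bins and the convention bins[i-1] <= x < bins[i].
    """
    idx = np.digitize(x, bins)
    bad = []
    for value, i in zip(x, idx):
        if i == 0:
            ok = value < bins[0]            # below the lowest edge
        elif i == len(bins):
            ok = value >= bins[-1]          # at or above the highest edge
        else:
            ok = bins[i - 1] <= value < bins[i]
        if not ok:
            bad.append((value, i))
    return bad

rng = np.random.default_rng(0)
x = rng.random(10)
bins = np.linspace(x.min(), x.max(), 10)
print(check_digitize(x, bins))  # an empty list means no misfiled values
```

If the list is not empty, each entry is a value that digitize() filed under an index inconsistent with a plain comparison against the edges, which is the situation described above.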

And to actually do something constructive, here is a docstring for digitize:
"""Given an array of values and bin edges, digitize(values, bin_edges) returns the index of the bin each value falls into.

The first bin has index 1, and the last bin has the index n, where n is the number of bins.
Values smaller than the lowest edge are assigned index 0, while values larger than the highest edge are assigned index n+1.
"""
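
As a small illustration of the indexing described in this docstring, here is a sketch with hand-picked edges and values (both chosen only for the example, well away from any edge so that rounding plays no role):

```python
import numpy as np

bin_edges = np.array([0.0, 1.0, 2.0, 3.0])      # 4 edges, so n = 3 bins
values = np.array([-0.5, 0.2, 1.7, 2.9, 3.5])   # illustrative data

n = len(bin_edges) - 1
indices = np.digitize(values, bin_edges)

for v, i in zip(values, indices):
    if i == 0:
        where = "below the lowest edge"
    elif i == n + 1:
        where = "above the highest edge"
    else:
        where = "bin %d: [%g, %g)" % (i, bin_edges[i - 1], bin_edges[i])
    print(v, i, where)
```

Here -0.5 gets index 0, the three values in between get indices 1 to 3, and 3.5 gets index n+1 = 4.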

Cheers,

David
P.S. Many mails I send don't make it to the list. Is it Gmail-related?