On Thu, Aug 27, 2009 at 1:27 PM, <josef.p...@gmail.com> wrote:
> On Thu, Aug 27, 2009 at 12:49 PM, Tim Michelsen
> <timmichel...@gmx-topmail.de> wrote:
>>> Tim, do you mean that you want to apply other functions, e.g. mean or
>>> variance, to the original values but calculated per bin?
>> Sorry that I forgot to add this. Shame.
>>
>> I would like to apply these mathematical functions to the original values
>> stacked in the respective bins.
>>
>> For instance:
>>
>> The sample data measures the weight of an animal.
>>
>> 1) histogram gives a count of how many values are in each bin.
>>
>> I would like to calculate the average weight of all animals
>> sorted into bin1, bin2, etc.
>>
>> This is also useful where you have a time component.
>>
>> In spreadsheets I would use a '=' to reference the original data and then
>> either sum it up or count it per class.
>>
>> I hope this is somehow understandable.
>
> Yes, it is quite a common use case for descriptive statistics, and I'm
> starting to collect different ways of doing it.
>
> In your case, Vincent's way is the easiest.
>
> If you need to be faster, or you want to apply the same classification
> to other variables as well, e.g. the size of the animal, then creating a
> label array would be a more flexible solution.
>
> There was a similar thread recently on the scipy-user list for sorted
> arrays: "How to average different pieces or an array?"
>
> Josef
>
>>
>> Thanks,
>> Timmie
>
Here is a version where bincount and histogram produce the same results
for the mean and variance per bin, provided no bin is empty. If a bin is
empty, then either some nans or some small arbitrary numbers are returned.

Josef

# incompletely tested: if a bin has zero elements, nans or missing in variance
import numpy as np

x = np.random.normal(size=100)  # + 1e5  # + 1e8 to compare precision
c, b = np.histogram(x)
sortind = np.argsort(x)
reverse_sortind = np.argsort(sortind)  # inverse permutation of sortind
xsorted = x[sortind]
# position of each bin edge within the sorted data
bind = np.searchsorted(xsorted, b, 'right')

# construct label index
ind2 = np.zeros(x.shape, int)
ind2[bind[1:-1]] = 1  # assumes boundary indices are included in y
ind = ind2.cumsum()  # bin label of each sorted value
labels = ind[reverse_sortind]  # reverse sorting

print('\nmean')
means = np.bincount(ind, xsorted) * 1.0 / np.bincount(ind)
print(means)
count = np.bincount(labels)
means = np.bincount(labels, x) * 1.0 / count
print(means)

# compare mean with histogram
countsPerBin = np.histogram(x)[0]
sumsPerBin = np.histogram(x, weights=x)[0]
averagePerBin = sumsPerBin / countsPerBin
print(averagePerBin)

print('\nvariance')
meanarr = means[labels]  # mean of the bin each observation belongs to
var = np.bincount(labels, (x - meanarr)**2) / count
print(var)

# with histogram
squaresums_perbin = np.histogram(x, weights=x**2)[0]
var_perbin = squaresums_perbin * 1.0 / countsPerBin - averagePerBin**2
print(var_perbin)
print(np.array(var) - np.array(var_perbin))
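The empty-bin case mentioned above could be handled along the following lines. This is only a sketch and not part of the script above: it builds the labels with np.digitize instead of the sort/searchsorted/cumsum construction, and it relies on the minlength argument of np.bincount, which needs a NumPy version that provides it. The names binned_mean_var and nbins are made up for illustration, and nbins is assumed to be an integer number of equal-width bins spanning the data, as in the default np.histogram call.

```python
import numpy as np

def binned_mean_var(x, nbins=10):
    """Per-bin count, mean and (population) variance via integer labels.

    Hypothetical helper: empty bins yield nan for mean and variance
    instead of shifting the labels of the bins that follow.
    """
    x = np.asarray(x, dtype=float)
    counts, edges = np.histogram(x, bins=nbins)
    # Digitize against the inner edges only, so labels run from 0 to nbins - 1.
    # A value equal to an inner edge goes to the upper bin and the maximum
    # stays in the last bin, which follows np.histogram's bin convention.
    labels = np.digitize(x, edges[1:-1])
    # minlength keeps one slot per bin even when trailing bins are empty
    count = np.bincount(labels, minlength=nbins)
    sums = np.bincount(labels, weights=x, minlength=nbins)
    with np.errstate(divide='ignore', invalid='ignore'):
        means = sums / count  # 0/0 gives nan for empty bins
        dev2 = (x - means[labels]) ** 2  # only non-empty bins are indexed
        var = np.bincount(labels, weights=dev2, minlength=nbins) / count
    return labels, count, means, var

x = np.random.normal(size=100)
labels, count, means, var = binned_mean_var(x)
print(count)
print(means)
print(var)
```

Because the labels come straight from the bin edges, the same labels array can be reused for other variables recorded for the same observations (e.g. the size of the animal) by passing them as weights to np.bincount.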