Rick White wrote:
Just so we don't get too smug about the speed, if I do this in IDL on the same machine it is 10 times faster (0.28 seconds instead of 4 seconds). I'm sure the IDL version uses the much faster approach of just sweeping through the array once, incrementing counts in the appropriate bins. It only handles equal-sized bins, so it is not as general as the numpy version -- but equal-sized bins is a very common case. I'd still like to see a C version of histogram (which I guess would need to be a ufunc) go into the core numpy.
Yes, this gets rid of the search, and indices can just be caluclated from offsets. I've attached a modified weaved histogram that takes this approach. Running the snippet below on my machine takes .118 sec for the evenly binned weave algorithm and 0.385 sec for Rick's algorithm on 5 million elements. That is close to 4x faster (but not 10x...), so there is indeed some speed to be gained for the common special case. I don't know if the code I wrote has a 2x gain left in it, but I've spent zero time optimizing it. I'd bet it can be improved substantially.

eric

### test_weave_even_histogram.py

from numpy import arange, product, sum, zeros, uint8
from numpy.random import randint

import weave_even_histogram

import time

shape = 1000,1000,5
size = product(shape)
data = randint(0,256,size).astype(uint8)
bins = arange(256+1)

print 'type:', data.dtype
print 'millions of elements:', size/1e6

bin_start = 0
bin_size = 1
bin_count = 256
t1 = time.clock()
res = weave_even_histogram.histogram(data, bin_start, bin_size, bin_count)
t2 = time.clock()
print 'sec (evenly spaced):', t2-t1, sum(res)
print res


                                        Rick
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


from numpy import array, zeros, asarray, sort, int32
from scipy import weave
from typed_array_converter import converters

def histogram(ary, bin_start, bin_size, bin_count):
    
    ary = asarray(ary)
    
    # Create an array to hold the histogram count results.
    results = zeros(bin_count,dtype=int32)
    
    # The C++ code that actually does the histogramming.    
    code = """
           PyArrayIterObject *iter = (PyArrayIterObject*)PyArray_IterNew(py_ary);
           
           while(iter->index < iter->size)
           {
           
               //////////////////////////////////////////////////////////
               // binary search
               //////////////////////////////////////////////////////////
               
               // This requires an update to weave 
               ary_data_type value = *((ary_data_type*)iter->dataptr);
               if (value>=bin_start)
               {
                   int bin_index = (int)((value-bin_start)/bin_size);
               
                   //////////////////////////////////////////////////////////
                   // Bin counter increment
                   //////////////////////////////////////////////////////////
    
                   // If the value was found, increment the counter for that bin.
                   if (bin_index < bin_count)
                   {
                       results[bin_index]++;
                   }    
                   PyArray_ITER_NEXT(iter);
               }    
           }
           """
    weave.inline(code, ['ary', 'bin_start', 'bin_size','bin_count', 'results'], 
                 type_converters=converters, 
                 compiler='gcc')
                 
    return results
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion

Reply via email to