Hi Milan,

Have you looked at the many table-like functions already in existence? We have xtabs, xtab and table already.
It would be nice to shrink everything down to one high-performance function.

 -- John

On Dec 26, 2013, at 6:05 AM, Milan Bouchet-Valat <nalimi...@club.fr> wrote:

> Hi!
>
> I've been trying to implement a table and cross-table function for generic
> AbstractVectors and a more efficient version for PooledDataVectors (from
> DataArrays). I have something that seems to work fine for the latter, but the
> performance is not completely satisfying. See the code here:
> https://gist.github.com/nalimilan/8132114
>
> Something like this:
> a = PooledDataArray(rep(1:10, 100000))
> table(a)
> @time table(a)
>
> This reports about 1s here, while the same thing in R takes about 0.4s. My
> implementation has the advantage that it does not copy the input vectors,
> which may have a great impact when working with large data under memory
> pressure.
>
> But I think I'm doing many things wrong, since the allocated bytes are much
> higher than I would expect/like. Ideally there wouldn't be any allocation in
> the inner loop. It seems that the main problem comes from the transformation
> from vector to varargs that happens in a[el...] += 1. In an ideal world the
> compiler would detect that the length of el is fixed for given input types,
> and it would be able to make it equivalent to a direct call. But maybe I'm
> not doing this correctly. Or would I be better off computing the linear index
> manually by combining the indexes on the different dimensions?
>
> A secondary issue is that += seems to involve a call to getindex() and
> another to setindex!(), while theoretically it would be possible to do both
> at the same time once the pointer to the array position has been computed. Is
> this a planned optimization? (For the general AbstractVector method, I need a
> similar feature but applied to Dicts, and I've seen that an update() method
> is apparently planned.)
>
> Thanks for your help (I plan to open a PR to discuss the interface soon).
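
On the question of computing the linear index manually instead of splatting with a[el...]: a minimal sketch of what that could look like. It is written in current Julia syntax rather than the 2013 one, and it is not the code from the gist. The names crosstab_linear and count_values are hypothetical, and plain vectors of integer level codes (each value in 1:nlevels[d]) stand in for the integer references of a pooled array. The second function shows the get-then-set pattern for the generic Dict-based case, which still performs two lookups per element.

```julia
# Hypothetical helper (not part of DataArrays): cross-tabulate N vectors of
# integer level codes by computing the column-major linear index by hand,
# so that no index vector has to be splatted inside the loop.
# Assumption: every code in codes[d] lies in 1:nlevels[d]; this is not checked.
function crosstab_linear(codes::NTuple{N,AbstractVector{Int}},
                         nlevels::NTuple{N,Int}) where {N}
    n = length(codes[1])
    all(c -> length(c) == n, codes) ||
        throw(ArgumentError("all code vectors must have the same length"))

    counts = zeros(Int, nlevels)          # N-dimensional array of counts

    for i in 1:n
        # Horner-style accumulation of the zero-based linear offset:
        # (i1-1) + n1*((i2-1) + n2*((i3-1) + ...))
        idx = codes[N][i] - 1
        for d in N-1:-1:1
            idx = idx * nlevels[d] + (codes[d][i] - 1)
        end
        counts[idx + 1] += 1              # single linear-index update
    end
    return counts
end

# Hypothetical helper for generic vectors: count occurrences with a Dict.
# get() followed by the assignment does two hash lookups per element.
function count_values(v::AbstractVector{T}) where {T}
    d = Dict{T,Int}()
    for x in v
        d[x] = get(d, x, 0) + 1
    end
    return d
end

# Usage: two variables with 3 and 2 levels respectively.
a = [1, 1, 2, 3, 3, 3]
b = [1, 2, 2, 1, 1, 2]
t = crosstab_linear((a, b), (3, 2))       # 3x2 matrix of counts
c = count_values(["x", "y", "x"])
```

The tuple arguments keep the number of dimensions N known to the compiler, which is the property the splatted a[el...] call was missing. Whether this closes the gap with R would have to be measured.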