Hi! I've been trying to implement table and cross-table functions for generic AbstractVectors, plus a more efficient version for PooledDataVectors (from DataArrays). I have something that seems to work fine for the latter, but the performance is not completely satisfactory. See the code here: https://gist.github.com/nalimilan/8132114
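To make the discussion below easier to follow without opening the gist, here is a hypothetical, simplified sketch of the kind of inner loop I mean, with plain Vector{Int} codes standing in for the PooledDataArray refs (names here are illustrative, not the exact code from the gist):

    # Simplified sketch: count co-occurrences of integer codes into an N-d array.
    function table_sketch(codes::AbstractVector{Int}...)
        dims = map(maximum, codes)        # one dimension per input vector
        counts = zeros(Int, dims)
        el = zeros(Int, length(codes))    # reused buffer of indexes for the current row
        for i in 1:length(codes[1])
            for j in 1:length(codes)
                el[j] = codes[j][i]
            end
            counts[el...] += 1            # splatting a Vector: this is where the allocations appear
        end
        counts
    end

For example, table_sketch([1,2,1,2], [1,1,2,2]) returns a 2x2 matrix of counts.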
Something like this:

    a = PooledDataArray(rep(1:10, 100000))
    table(a)
    @time table(a)

reports about 1s here, while the same thing in R takes about 0.4s. My implementation has the advantage that it does not copy the input vectors, which can make a big difference when working with large data under memory pressure. But I think I'm doing many things wrong, since the number of allocated bytes is much higher than I would expect or like; ideally there wouldn't be any allocation in the inner loop.

It seems that the main problem comes from the vector-to-varargs conversion that happens in a[el...] += 1. In an ideal world, the compiler would detect that the length of el is fixed for the given input types and turn the splatted indexing into a direct call, but maybe I'm not expressing this correctly. Or would I be better off computing the linear index manually by combining the indexes along the different dimensions?

A secondary issue is that += seems to involve one call to getindex() and another to setindex!(), while in theory both could be done at once, after the pointer to the array position has been computed. Is this a planned optimization? (For the generic AbstractVector method, I need a similar feature but applied to Dicts, and I've seen that an update() method is apparently planned; see the sketch at the end of this message.)

Thanks for your help! (I plan to open a PR to discuss the interface soon.)
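PS: to make the Dict part concrete, here is a hypothetical sketch of the generic AbstractVector method I have in mind, counting into a Dict keyed on the tuple of values from each row; the increment is the place where a single update()-style call would replace the separate get and set:

    # Hypothetical sketch: generic cross-table counting into a Dict.
    function table_dict_sketch(vs::AbstractVector...)
        counts = Dict{Any,Int}()
        for i in 1:length(vs[1])
            key = map(v -> v[i], vs)               # tuple of values for row i
            counts[key] = get(counts, key, 0) + 1  # two hash lookups where one would do
        end
        counts
    end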