Hi!

I've been trying to implement table and cross-table functions for
generic AbstractVectors, plus a more efficient version for
PooledDataVectors (from DataArrays). I have something that seems to work
fine for the latter, but the performance is not entirely satisfying.
See the code here: https://gist.github.com/nalimilan/8132114
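
To make the discussion below concrete, the core of the PooledDataArray
method is roughly of this shape (a simplified sketch with made-up
names, not the actual code from the gist; cols stands for a tuple of
PooledDataArrays whose .refs fields hold the integer codes, and a for a
pre-allocated counts array with one dimension per column):

function fill_counts!(a, cols)
    # one integer code per dimension, read from each column's .refs vector
    el = zeros(Int, length(cols))
    for i = 1:length(cols[1])
        for j = 1:length(cols)
            el[j] = cols[j].refs[i]
        end
        # splat the vector of codes into the counts array
        a[el...] += 1
    end
    return a
end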

Something like this:
a = PooledDataArray(rep(1:10, 100000))
table(a)        # first call includes compilation
@time table(a)  # time the second call only

This reports about 1 s here, while the same thing in R takes about
0.4 s. My implementation has the advantage that it does not copy the
input vectors, which can make a big difference when working with large
data under memory pressure.

But I think I'm doing several things wrong, since the number of
allocated bytes is much higher than I would expect/like. Ideally there
wouldn't be any allocation in the inner loop. It seems that the main
problem comes from the transformation from vector to varargs that
happens in a[el...] += 1. In an ideal world the compiler would detect
that the length of el is fixed for a given set of input types and turn
this into the equivalent of a direct call. But maybe I'm not doing this
correctly. Or would I be better off computing the linear index manually,
by combining the indices on the different dimensions?
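
In the two-dimensional case, the manual version I have in mind would
look roughly like this (crosstab_refs is just an illustrative name;
refs1/refs2 stand for the .refs fields of the two PooledDataArrays and
n1/n2 for their pool sizes):

function crosstab_refs(refs1, refs2, n1, n2)
    counts = zeros(Int, n1, n2)
    for i = 1:length(refs1)
        # combine the two codes into a single linear index by hand,
        # instead of splatting a vector as in counts[el...] += 1
        counts[refs1[i] + (refs2[i] - 1) * n1] += 1
    end
    return counts
end

Generalizing this to N dimensions would mean carrying a running stride
along, which I suppose is essentially what sub2ind() does.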

A secondary issue is that += seems to involve one call to getindex()
and another to setindex!(), while in principle both could be done in a
single step once the pointer to the array position has been computed.
Is this a planned optimization? (For the general AbstractVector method,
I need a similar feature for Dicts, and I've seen that an update()
method is apparently planned.)
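
For reference, the Dict-based fallback I have in mind is the usual
get-then-set idiom (table_dict is just an illustrative name), which
pays for two hash lookups per element:

function table_dict(v)
    counts = Dict{eltype(v), Int}()
    for x in v
        # one lookup for get(), another for setindex!()
        counts[x] = get(counts, x, 0) + 1
    end
    return counts
end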

Thanks for your help! (I plan to open a PR soon to discuss the interface.)
