Hi Milan,

Have you looked at the many table-like functions that already exist? We have
xtabs, xtab and table.

Would be nice to shrink everything down to one high-performance function.

 -- John

On Dec 26, 2013, at 6:05 AM, Milan Bouchet-Valat <nalimi...@club.fr> wrote:

> Hi!
> 
> I've been trying to implement a table and cross-table function for generic 
> AbstractVectors and a more efficient version for PooledDataVectors (from 
> DataArrays). I have something that seems to work fine for the latter, but the 
> performance is not completely satisfying. See the code here: 
> https://gist.github.com/nalimilan/8132114
> 
> Something like this:
> a = PooledDataArray(rep(1:10, 100000))
> table(a)
> @time table(a)
> 
> This reports about 1s here, while the same thing in R takes about 0.4s. My 
> implementation has the advantage that it does not copy the input vectors, 
> which may have a great impact when working with large data under memory 
> pressure.
> 
> But I think I'm doing many things wrong, since the allocated bytes are much 
> higher than I would expect/like. Ideally there wouldn't be any allocation in 
> the inner loop. It seems that the main problem comes from the transformation 
> from vector to varargs that happens in a[el...] += 1. In an ideal world the 
> compiler would detect that the length of el is fixed for given input types, 
> and it would be able to make it equivalent to a direct call. But maybe I'm 
> not doing this correctly. Or would I be better off computing the linear index 
> manually by combining the indexes on the different dimensions?
> 
> A secondary issue is that += seems to involve a call to getindex() and 
> another to setindex!(), while theoretically it would be possible to do both 
> at the same time once the pointer to the array position has been computed. Is 
> this a planned optimization? (For the general AbstractVector method, I need a 
> similar feature but applied to Dicts, and I've seen that an update() method 
> is apparently planned.)
> 
> Thanks for your help (I plan to open a PR to discuss the interface soon)
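
For reference, the two ideas raised in the quoted message (computing the linear index by hand instead of splatting a[el...], and counting into a Dict for general vectors) might look roughly like the sketch below. It is not the code from the gist. It is written in current Julia syntax, the names counttable and countdict are hypothetical, and it assumes each code vector holds integers in 1:dims[k].

```julia
# Hypothetical sketch: cross-tabulate integer codes without splatting indices.
# Assumes codes[k][i] lies in 1:dims[k]; out-of-range codes are NOT detected
# per dimension, because only the combined linear index is bounds-checked.
function counttable(codes::NTuple{N,AbstractVector{<:Integer}},
                    dims::NTuple{N,Int}) where {N}
    n = length(first(codes))
    all(c -> length(c) == n, codes) ||
        throw(DimensionMismatch("code vectors must have equal length"))
    counts = zeros(Int, dims)
    for i in 1:n
        # Column-major linear index: 1 + sum over k of (code_k - 1) * stride_k
        idx = 1
        stride = 1
        for k in 1:N
            idx += (codes[k][i] - 1) * stride
            stride *= dims[k]
        end
        # Note: += still lowers to a getindex followed by a setindex!
        counts[idx] += 1
    end
    return counts
end

# Hypothetical sketch for arbitrary element types: count into a Dict.
# get followed by setindex! performs two hash lookups per element.
function countdict(x::AbstractVector{T}) where {T}
    counts = Dict{T,Int}()
    for v in x
        counts[v] = get(counts, v, 0) + 1
    end
    return counts
end

a = repeat(1:3, 2)          # [1, 2, 3, 1, 2, 3]
b = [1, 1, 2, 2, 1, 1]
counttable((a, b), (3, 2))  # 3x2 matrix of joint counts
countdict(["x", "y", "x"])  # Dict("x" => 2, "y" => 1)
```

Whether the hand-computed index actually removes the allocations would need to be checked with @time on the real PooledDataArray inputs. The sketch only shows the shape of the loop.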
