I don't remember you asking this before!

How many rows does delays.dt have, and how many groups?

> because setting them in aggregation is expensive:

I'm not sure this example proves that. On the contrary, the verbose output shows that the names are dropped before grouping starts (j is reported as 'list(sum(count))', without count=) and reinstated after grouping, which is the correct behaviour. The only explanation I can think of is that the list() wrapper itself adds overhead to each call to j. With a very large number of groups (lots of calls to j), that could show up as this 38% difference. For a single aggregate, the list() wrapper could be optimized away; that would be a nice improvement I hadn't thought of before.

Does this theory fit with your experience? If my guess is correct, then comparing two queries that both have list() in j, e.g. list(sum(count), max(count)) vs list(s=sum(count), m=max(count)), should show no speed difference.
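A rough sketch of that test follows. It hasn't been run against your data. DT is a synthetic stand-in for delays.dt with made-up sizes (up to 100,000 groups), so treat the numbers as illustrative only. Timings will also depend on the data.table version, since later releases may optimise grouped aggregates like sum() differently from 1.8.10. The first pair repeats your single-aggregate case (bare j followed by setnames(), vs list(count=...)). The second pair has list() in both queries and differs only in the names.

```r
library(data.table)

# Synthetic stand-in for delays.dt (assumed sizes, not the real data):
# many groups, so j is evaluated many times.
set.seed(1)
n <- 1e6
DT <- data.table(delay = sample(1e5L, n, replace = TRUE),
                 count = sample(100L, n, replace = TRUE))

# Single aggregate: bare j vs. a named list() wrapper
t_bare  <- system.time(a1 <- DT[, sum(count), by = "delay"])
t_named <- system.time(a2 <- DT[, list(count = sum(count)), by = "delay"])

# The workaround: rename the default V1 column by reference afterwards
setnames(a1, "V1", "count")

# Two aggregates: list() in both queries, names in only one
t_unnamed2 <- system.time(b1 <- DT[, list(sum(count), max(count)), by = "delay"])
t_named2   <- system.time(b2 <- DT[, list(s = sum(count), m = max(count)), by = "delay"])

# Elapsed seconds for each query
print(rbind(t_bare, t_named, t_unnamed2, t_named2)[, "elapsed"])
```

If the theory holds, t_unnamed2 and t_named2 should come out roughly equal, while t_bare vs t_named may still show a gap.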


On 11/09/13 22:35, Sam Steingold wrote:
I find myself using setnames(...,"V1","...") very often because setting
them in aggregation is expensive:

--8<---------------cut here---------------start------------->8---
delays.short <- delays.dt[,sum(count),by="delay"]
Finding groups (bysameorder=TRUE) ... done in 1.262secs. bysameorder=TRUE and 
o__ is length 0
Detected that j uses these columns: count
Optimization is on but j left unchanged as 'sum(count)'
Starting dogroups ... done dogroups in 8.612 secs
delays.short <- delays.dt[,list(count=sum(count)),by="delay"]
Finding groups (bysameorder=TRUE) ... done in 1.051secs. bysameorder=TRUE and 
o__ is length 0
Detected that j uses these columns: count
Optimization is on but j left unchanged as 'list(sum(count))'
Starting dogroups ... done dogroups in 11.918 secs
--8<---------------cut here---------------end--------------->8---

38% difference is a lot (3 seconds is not a big deal, but this is just a
toy dataset).

ISTR that I have asked this question before - is this still (data.table
1.8.10) the state of the art, or am I doing something stupid?

Thanks!


_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help