JJ,

Yes, Chris is spot on. Keyed by should be faster when the size of each
group is large; e.g., a 1 billion row data.table of 1,000 groups. See
FAQ 3.3 for why. However, in your example, ad hoc by does seem more
appropriate.

Matthew
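A minimal sketch of the two approaches, for comparison. It assumes a
small stand-in for dt_data with only three of the factor fields (f2, f4,
f5) and a numeric column x to sum; x and the sample values are made up
for illustration and are not from the original post. Timings will differ
on the real 10 million row table, so treat this as a template for your
own measurements rather than a result.

```r
library(data.table)

set.seed(1)
n <- 1e5  # far smaller than the table in the question; scale up to test

# Hypothetical data: f2, f4, f5 stand in for the factor fields,
# x is an assumed numeric column to aggregate.
dt_data <- data.table(
  f2 = factor(sample(letters[1:5], n, replace = TRUE)),
  f4 = factor(sample(LETTERS[1:4], n, replace = TRUE)),
  f5 = factor(sample(c("u", "v", "w"), n, replace = TRUE)),
  x  = runif(n)
)

# Ad hoc by: no key is set; the grouping columns are named at call time.
# A character vector works, which suits fields chosen at run time.
adhoc <- dt_data[, list(total = sum(x)), by = c("f4", "f5")]

# Keyed by: setkey() sorts the table once, then grouping uses the key.
keyed_dt <- copy(dt_data)   # copy so dt_data itself stays unkeyed
setkey(keyed_dt, f4, f5)
keyed <- keyed_dt[, list(total = sum(x)), by = c("f4", "f5")]

# Time both, counting the cost of the sort for the keyed version.
t_adhoc <- system.time(
  dt_data[, list(total = sum(x)), by = c("f4", "f5")]
)
t_keyed <- system.time({
  tmp <- copy(dt_data)
  setkey(tmp, f4, f5)
  tmp[, list(total = sum(x)), by = c("f4", "f5")]
})
print(t_adhoc)
print(t_keyed)
```

Both calls return the same group totals; only the row order of the
result differs (ad hoc by keeps groups in order of first appearance,
the keyed table is sorted).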
On Thu, 2011-08-25 at 11:17 -0400, Chris Neff wrote:
> You don't necessarily have to use keys at all. When you aggregate and
> give the by columns, they don't necessarily have to be keys of the
> data table. This is called an "ad-hoc by". It is slightly slower, but
> my intuition says that it isn't really any slower than setting the
> key.
>
> When you add a key you sort by those fields. You incur a time cost
> for that. If you are consistently doing things with those keys then
> you may make up for that time cost further on. But for multiple
> different groupings the ad-hoc by is probably faster. Do some timings
> to see. Some simple ones I did show that the act of sorting is slower
> than ad-hoc by.
>
> On 25 August 2011 11:05, Jean Jacques Dureau <[email protected]> wrote:
> > Hi,
> > I have a data.table (10,000k rows) with 20 (factor) fields and I
> > need to filter data according to some of them.
> > I use this data.table inside a function and I don't know "in advance"
> > which fields I'll use to filter data and to sum.
> >
> > So, for example, consider a data.table (named dt_data) with 20 fields,
> > named f1, f2, ..., f20.
> >
> > I use this approach: I set the key on the field I have to use, for
> > example f2. Then I "filter" the data and I use them to do some
> > computations.
> >
> > Subsequently, with these computations, I discover which fields I have
> > to filter, for example f4 and f5. Now, I set the key on dt_data on
> > (f4,f5), and so on ...
> >
> > I use this approach because I don't know if it's possible to set the
> > key on all fields f1, f2, ..., f20 in advance and then use only some of
> > them!
> >
> > Is there a better way to use data.table?
> >
> > thanks
> >
> > jj
> > _______________________________________________
> > datatable-help mailing list
> > [email protected]
> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
