You don't necessarily have to use keys at all. When you aggregate, the columns you pass to by don't have to be keys of the data table. This is called an "ad-hoc by". It can be slightly slower than grouping on an existing key, but my intuition says it isn't really any slower once you count the time spent setting the key.
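As a rough sketch of what an ad-hoc by looks like, here is a small example. The table below is made-up data standing in for dt_data, and the value column and its size are assumptions for illustration only. The timing at the end is just one way you might compare the two approaches; results will depend on your data and data.table version.

```r
library(data.table)

# Hypothetical stand-in for dt_data: three factor fields and a numeric column.
# The column `value` and the row count are assumptions, not from the original post.
set.seed(1)
n <- 1e5
dt_data <- data.table(
  f2 = factor(sample(letters[1:5], n, replace = TRUE)),
  f4 = factor(sample(LETTERS[1:4], n, replace = TRUE)),
  f5 = factor(sample(c("x", "y", "z"), n, replace = TRUE)),
  value = runif(n)
)

# Ad-hoc by: no key is set; the grouping columns are simply named in `by`.
by_f2 <- dt_data[, list(total = sum(value)), by = f2]
by_f4_f5 <- dt_data[, list(total = sum(value)), by = list(f4, f5)]

# Filtering without a key: an ordinary logical condition in `i`.
subset_total <- dt_data[f4 == "A" & f5 == "x", sum(value)]

# Rough comparison: ad-hoc by versus sorting with setkey() first.
keyed <- copy(dt_data)  # copy so that dt_data itself stays unkeyed
t_adhoc <- system.time(
  dt_data[, list(total = sum(value)), by = list(f4, f5)]
)
t_keyed <- system.time({
  setkey(keyed, f4, f5)
  keyed[, list(total = sum(value)), by = list(f4, f5)]
})
print(c(adhoc = t_adhoc[["elapsed"]], keyed = t_keyed[["elapsed"]]))
```

Because nothing here depends on a key, the same table can be grouped by f2 in one step and by (f4, f5) in the next without re-sorting in between.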
When you set a key, the table is sorted by those fields, and you pay a time cost for that sort. If you then use the same key consistently, you may make up for that cost later on. But for multiple different groupings the ad-hoc by is probably faster. Do some timings to see. In some simple timings I ran, the sort alone was slower than the ad-hoc by.

On 25 August 2011 11:05, Jean Jacques Dureau <[email protected]> wrote:
> Hi,
> I have a data.table (10,000k rows) with 20 (factor) fields and I
> need to filter the data according to some of them.
> I use this data.table inside a function and I don't know "in advance"
> which fields I'll use to filter the data and to sum.
>
> So, for example, consider a data.table (named dt_data) with 20 fields,
> named f1, f2, ... ,f20.
>
> I use this approach: I set the key on the field I have to use, for
> example f2. Then I "filter" the data and use them to do some
> computations.
>
> Subsequently, with these computations, I discover which fields I have
> to filter on, for example f4 and f5. Now I set the key of dt_data to
> (f4,f5), and so on ...
>
> I use this approach because I don't know if it's possible to set the
> key on all fields f1, f2, .., f20 in advance and then use only some of
> them!
>
> Is there a better way to use data.table?
>
> thanks
>
> jj
> _______________________________________________
> datatable-help mailing list
> [email protected]
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
