Thanks. Here is the bigger picture. There are about 2 million records that need to be grouped by person ID. For each group, we want a single string in which the group's values are sorted and concatenated.
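A minimal sketch of this grouping step in data.table, using the toy values from the example below (the real data cannot be shared). The table name `dt` and the integer column types are assumptions made for illustration. Option B follows the key-based suggestion quoted later in this message and has not been benchmarked here.

```r
library(data.table)

# Toy data mirroring the small example in this message (hypothetical table name).
dt <- data.table(ID = c(1L, 1L, 2L, 2L, 2L, 2L),
                 V1 = c(2L, 1L, 8L, 3L, 5L, 2L))

# Option A: sort within each group, then concatenate with a comma.
res_a <- dt[, list(Gr_V1 = paste(sort(V1), collapse = ",")), by = ID]

# Option B: key on both columns so rows are ordered by ID and then V1;
# the per-group sort() call is then unnecessary.
setkey(dt, ID, V1)
res_b <- dt[, list(Gr_V1 = paste(V1, collapse = ",")), by = ID]

print(res_b)
```

Both options should give the same strings on this toy input. Which one is faster on roughly 2 million rows would have to be measured on the real data.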
For example,

ID, V1
--- ---
1, 2
1, 1
2, 8
2, 3
2, 5
2, 2

should become

ID, Gr_V1
--- -----
1, 1,2
2, 2,3,5,8

The number of people is about 1,007K.

I am giving examples because (1) I cannot copy-paste code and (2) the data and problem are classified. All of these computations are performed on secure machines disconnected from the Internet.

Using R is not a requirement. Many databases can handle the above using SQL. However, these questions came up because I saw data.table while browsing the Internet and thought I could give it a try in order to avoid using SQL.

On Fri, May 6, 2011 at 6:37 PM, Matthew Dowle <[email protected]> wrote:

> Steve H,
> How much is 'much better' and 'much longer' please? And on how many
> rows/GB? What is the bigger picture, and why are you concatenating
> strings together and using paste() at all?
> Guess 1: you can include the x column in your key; e.g. setkey(grp,x),
> then there would be no need to sort(x) again.
> Guess 2: sorting character can be slow. Hence we don't allow character
> columns in keys (yet); data.table converts character to factor.
> But, ideally, more information at a higher level would help us to help.
> Matthew
>
>
> On Fri, 2011-05-06 at 12:16 -0700, Steve Harman wrote:
> > Connected to this RMySQL performs much better
> > (using GROUP BY and functions such as GROUP_CONCAT which allows you
> > to order and use a separator too).
> >
> > So, I would recommend using them if you want grouping with sorting.
> >
> > On May 6, 2:36 pm, Steve Harman <[email protected]> wrote:
> > > Hello !
> > > When grouping using data.table, mean and sum functions applied within
> > > groups work well but if we use sort(x) function it takes much longer.
> > >
> > > I would like to do first sort(x) and put it inside paste such as
> > > paste(sort(x),collapse=",")
> > > I was wondering if there is any more efficient or effective way of
> > > doing this?
> > >
> > > thanks in advance,
> > >
> > > Steve
_______________________________________________ datatable-help mailing list [email protected] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
