Hi Steve H,

Please read "Describe the goal, not the step" here:
http://www.catb.org/~esr/faqs/smart-questions.html

Matthew

On Sat, 2011-05-07 at 01:50 -0400, Steve Harman wrote:
> Thanks.
>
> Here is the bigger picture.
> There are about 2 million records. They need to be grouped using
> person ID.
> When we group them, we want to obtain a string where the grouped
> values are sorted and concatenated.
>
> For example
>
> ID, V1
> --- ---
> 1, 2
> 1, 1
> 2, 8
> 2, 3
> 2, 5
> 2, 2
>
> should become
>
> ID, Gr_V1
> --- -----
> 1, 1,2
> 2, 2,3,5,8
>
> The number of people is about 1,007 K
>
> I am giving examples because (1) I cannot copy-paste code (2) data &
> problem are classified.
> All of these computations are performed on secure machines
> disconnected from the Internet.
> Using R is not a requirement. Many databases can handle the above
> using SQL.
> However, these questions came up because I saw data.table while
> browsing on the Internet and thought I could give it a try in order
> to avoid using SQL.
>
> On Fri, May 6, 2011 at 6:37 PM, Matthew Dowle <[email protected]>
> wrote:
> > Steve H,
> > How much is 'much better' and 'much longer' please? And on how many
> > rows/GB? What is the bigger picture, and why are you concatenating
> > strings together and using paste() at all?
> > Guess 1: you can include the x column in your key; e.g.
> > setkey(grp,x), then there would be no need to sort(x) again.
> > Guess 2: sorting character can be slow. Hence we don't allow
> > character columns in keys (yet); data.table converts character to
> > factor.
> > But, ideally, more information at a higher level would help us to
> > help.
> > Matthew
> >
> > On Fri, 2011-05-06 at 12:16 -0700, Steve Harman wrote:
> > > Connected to this, RMySQL performs much better
> > > (using GROUP BY and functions such as GROUP_CONCAT, which allows
> > > you to order and use a separator too).
> > >
> > > So, I would recommend using them if you want grouping with
> > > sorting.
> > >
> > > On May 6, 2:36 pm, Steve Harman <[email protected]> wrote:
> > > > Hello!
> > > > When grouping using data.table, mean and sum functions applied
> > > > within groups work well, but if we use the sort(x) function it
> > > > takes much longer.
> > > >
> > > > I would like to first do sort(x) and put it inside paste, such as
> > > > paste(sort(x),collapse=",")
> > > > I was wondering if there is any more efficient or effective way
> > > > of doing this?
> > > >
> > > > thanks in advance,
> > > >
> > > > Steve

_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
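
For reference, the grouping described in the thread could be sketched in data.table roughly as follows. This is a minimal sketch, not code from either poster: the table name `dt` and the toy data are assumptions taken from the ID/V1 example in the thread, and the thread does not confirm whether keying on the value column is actually faster at 2 million rows.

```r
library(data.table)

# Toy data copied from the ID/V1 example in the thread (hypothetical table name `dt`)
dt <- data.table(ID = c(1L, 1L, 2L, 2L, 2L, 2L),
                 V1 = c(2L, 1L, 8L, 3L, 5L, 2L))

# Approach from the original question: sort within each group, then paste
res_sort <- dt[, list(Gr_V1 = paste(sort(V1), collapse = ",")), by = ID]

# "Guess 1" from the thread: include the value column in the key, so rows
# are already ordered by V1 within each ID and sort() can be dropped
setkey(dt, ID, V1)
res_key <- dt[, list(Gr_V1 = paste(V1, collapse = ",")), by = ID]

print(res_sort)
print(res_key)
# Both give "1,2" for ID 1 and "2,3,5,8" for ID 2
```

On real data it would be worth timing both variants with `system.time()` before choosing one, since the one-off cost of `setkey()` has to be weighed against the per-group `sort()` calls it replaces.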
