Thanks Matthew .. I'll update my local copy.

-steve

On Mon, Jan 10, 2011 at 3:18 AM, Matthew Dowle <[email protected]> wrote:
> Hi Steve,
> Fixed now.
> Thanks, Matthew
>
> On Thu, 2011-01-06 at 09:48 -0500, Steve Lianoglou wrote:
>> Hi Matthew,
>>
>> On Thu, Jan 6, 2011 at 5:26 AM, Matthew Dowle <[email protected]> wrote:
>> >
>> > How about writing it this way. This way should invoke the incremental
>> > binary search for efficiency too, rather than a repeated binary search
>> > for each by.
>> >
>> >> dt2[dt1[, .SD[1], by=list(name, place)],mult="all"]
>> >      name place length
>> > [1,]    a  home     10
>> > [2,]    a  home    100
>> >
>> >> dt2[dt3[, .SD[1], by=list(name, place)],mult="all"]
>> >      name place length
>> > [1,]    a  home     10
>> > [2,]    a  home    100
>> > [3,]    b  work     20
>>
>> While doing it this way works for this trivial case, I'm actually
>> doing a fair bit of book keeping/computation in my j expression and
>> returning a list of elements that you can't really get from simple
>> joins and stuff.
>>
>> > But doing it your way should work too, so I'll add as a bug.
>>
>> Thanks. I'm currently working around this by adding a dummy row into
>> my dt1 data.table, which I then remove after the `dogroups` stuff
>> finishes.
>>
>> -steve
>>
>> > Another way to get the first row of each group is a fast self-join via i.
>> > There was a thread on that some time ago when i was changed to be evaluated
>> > within the frame of DT too. Something like :
>> >
>> > dt3[J(unique(name)), mult="first"] # first of each group
>> >
>> > HTH
>> > Matthew
>> >
>> >
>> > "Steve Lianoglou" <[email protected]> wrote in message
>> > news:[email protected]...
>> > Hi,
>> >
>> > I'm calculating some statistics over a large data.table via `dt[,
>> > {somestuff}, by=list(key1,key2)]`.
>> > Sometimes my dt data.table ends up only having one row, which results
>> > in the following error:
>> >
>> > "Didn't allocate enough rows for result of first group."
>> >
>> > Here is a toy/trivial example.
>> >
>> > R> dt1 <- data.table(name='a', place='home', count=1, key='name,place')
>> > R> dt2 <- data.table(name=c('a', 'a', 'a', 'b'),
>> >                      place=c('home', 'work', 'home', 'work'),
>> >                      length=c(10,20,100, 20), key='name,place')
>> >
>> > R> dt1[, list(length=dt2[J(.SD$name[1], .SD$place[1]),
>> >        mult='all']$length), by=list(name, place)]
>> > Error in `[.data.table`(dt1, , list(length = dt2[J(.SD$name[1],
>> > .SD$place[1]), :
>> > Didn't allocate enough rows for result of first group.
>> >
>> > When my data.table has > 1 row, it works:
>> >
>> > R> dt3 <- data.table(name=c('a', 'b'), place=c('home', 'work'),
>> >                      count=1:2, key='name,place')
>> > R> dt3[, list(length=dt2[J(.SD$name[1], .SD$place[1]),
>> >        mult='all']$length), by=list(name, place)]
>> >      name place length
>> > [1,]    a  home     10
>> > [2,]    a  home    100
>> > [3,]    b  work     20
>> >
>> > I believe if the result of my {somestuff} expression only ever
>> > returned one row, this bug wouldn't happen, but .... it doesn't just
>> > do that :-)
>> >
>> > It looks like the fix is where the `byretn` value is calculated in the
>> > `[.data.table` but that code is somehow inscrutable at first glance
>> > ... can anyone propose a quick fix?
>> >
>> > Thanks,
>> > -steve
>> >
>> > --
>> > Steve Lianoglou
>> > Graduate Student: Computational Systems Biology
>> > | Memorial Sloan-Kettering Cancer Center
>> > | Weill Medical College of Cornell University
>> > Contact Info: http://cbio.mskcc.org/~lianos/contact
>> >
>> > _______________________________________________
>> > datatable-help mailing list
>> > [email protected]
>> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
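
For anyone re-running the thread's toy example, here is a minimal sketch of the join-based form Matthew suggested (take one row per group, then do a keyed join into dt2). It assumes the data.table package is installed and a reasonably recent version. The names first1, first3, res1 and res3 are made up for illustration and are not from the thread, and setkey() is used in place of the key= argument purely as an editorial choice. The printed column set and row labels may differ from the 2011 output quoted above.

```r
library(data.table)

# Toy tables from the original report, keyed on (name, place).
dt1 <- data.table(name = "a", place = "home", count = 1)
setkey(dt1, name, place)

dt2 <- data.table(name   = c("a", "a", "a", "b"),
                  place  = c("home", "work", "home", "work"),
                  length = c(10, 20, 100, 20))
setkey(dt2, name, place)

dt3 <- data.table(name = c("a", "b"), place = c("home", "work"), count = 1:2)
setkey(dt3, name, place)

# One row per (name, place) group; first1/first3 are illustrative names.
first1 <- dt1[, .SD[1], by = list(name, place)]
first3 <- dt3[, .SD[1], by = list(name, place)]

# Keyed join into dt2: the leading columns of i (name, place) are matched
# against dt2's key, and mult = "all" keeps every matching row.
res1 <- dt2[first1, mult = "all"]
res3 <- dt2[first3, mult = "all"]

print(res1)
print(res3)
```

This only covers the trivial lookup case. As noted in the thread, it does not replace a j expression that does heavier per-group bookkeeping.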
