"Short, Tom" <[email protected]> writes:

> This seems to work ("data" is different than before, so the balance and
> count columns are different):
>
>> data[, lapply(.SD[, cols.to.sum, with = FALSE], sum),
> +      by = as.list(by.factors)]
>      iquarter fico.bucket   balance     count
> [1,]        0          25 0.1427648 1.0449715
> [2,]        0          50 0.8598616 0.7946641
> [3,]        0          75 0.7799311 0.6733977
> [4,]        0         100 1.1240393 1.3415721
> [5,]        1          25 1.6179294 1.9870932
> [6,]        1          50 1.4562150 2.0651700
> [7,]        1          75 1.8457541 1.6337161
> [8,]        1         100 2.0330688 0.8113971

Using as.list works for me as well, thanks.  I had to change my summary
function to return NA_real_ rather than just plain NA, but once I did
that, everything seems to work.

I'm impressed.  It looks to be about 10 times faster, all things
considered.  The actual aggregation step is something like 40 times
faster, but I have to do some extra work to get the data into a format
suitable for data.table.

I would still prefer a more "plain vanilla" interface to all this.  I
have no idea why using "as.list" works, and that makes me uncomfortable.

Regards,
Johann

>
>
>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]]
>> On Behalf Of Johann Hibschman
>> Sent: Monday, August 30, 2010 16:03
>> To: [email protected]
>> Subject: Re: [datatable-help] Programmatic by clauses
>>
>> "Short, Tom" <[email protected]> writes:
>>
>> > Johann, how about the following:
>> > [snip example]
>>
>> That's a good example; thanks.
>>
>> > Here's a data.table version:
>> >
>> >> data[, lapply(.SD[, cols.to.sum, with = FALSE], sum),
>> > +      by = lapply(aggregation.spec, function (f) f(data))]
>> >      iquarter fico.bucket   balance    count
>> > [1,]        0          25 0.5506797 1.133675
>> > [2,]        0          50 1.5175908 0.854553
>> > [3,]        0          75 0.4627294 1.171430
>> > [4,]        0         100 0.8354870 1.083211
>> > [5,]        1          25 1.7311503 1.210178
>> > [6,]        1          50 2.2930775 1.974759
>> > [7,]        1          75 1.0477066 1.973119
>> > [8,]        1         100 1.4351321 1.501291
>>
>> I hadn't understood .SD before; that's a very good thing to know.
>>
>> > I think the following should also work, but it doesn't. Note that I
>> > didn't update to the very latest version of data.table, and I know
>> > Matthew has changed some things that might already fix this.
>> >
>> >
>> >> data[, lapply(.SD[, cols.to.sum, with = FALSE], sum),
>> > +      by = by.factors]
>> > Error in `[.data.table`(data, , lapply(.SD[, cols.to.sum, with = FALSE],  :
>> >   column or expression 1 of 'by' list is not internally type integer.
>> > Do not quote column names. Example of correct use:
>> > by=list(colA,month(colB),...).
>>
>> It still doesn't work.  Unfortunately, if I want to have a
>> drop-in replacement, I have to operate on the equivalent by.factors.
>>
>> I tried the following:
>>
>>   dt.tmp <- cbind(data[, cols.to.sum, with=FALSE],
>>                   data.table(by.factors))
>>   dt.agg <- dt.tmp[, lapply(.SD, sum),
>>                    by=paste(names(by.factors), collapse=",")]
>>
>> but I got:
>>
>>   Error in `[.data.table`(dt.tmp, , lapply(.SD, sum.na), by =
>>   paste(names(by),  :
>>     by must evaluate to list
>>
>> I tried
>>
>>   by.names <- paste(names(by.factors), collapse=",")
>>   dt.agg <- dt.tmp[, lapply(.SD, sum), by=by.names]
>>
>> but I got the same error.  Randomly wrapping things in eval
>> or evalq didn't seem to work either.
>>
>> Is there any chance that we could get a "less magic" version
>> of the data.table extract that doesn't do anything fancy?  Or
>> maybe a by.with=FALSE option?
>>
>> I periodically try data.table, but I always run into this
>> wall where I waste a few hours trying to guess how to make
>> extract do what I want it to and finally give up.  It's
>> frustrating; it seems that if only data.table tried to be
>> less clever, it would be very useful to me.
>>
>>
>> - Johann
>>
>> _______________________________________________
>> datatable-help mailing list
>> [email protected]
>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
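
A note on the NA_real_ change mentioned in the reply: in R a bare NA is a logical value, while NA_real_ is a double, so a summary function that returns NA for some groups and a number for others produces results of different types. That mismatch is presumably what had to be fixed before the grouped call would run. The sketch below uses base R only. The name sum.na appears in the error message quoted in the thread, but its body is never shown, so the definition here is an assumption for illustration.

```r
# Hypothetical summary function: the thread never shows the real sum.na,
# so this body is an assumption for illustration only.
sum.na <- function(x) {
  # Return a double NA when there is nothing to sum, so that the result
  # has type "double" for every group (given double input).
  if (all(is.na(x))) return(NA_real_)
  sum(x, na.rm = TRUE)
}

typeof(NA)        # "logical"
typeof(NA_real_)  # "double"

typeof(sum.na(c(NA, NA)))      # "double"
typeof(sum.na(c(1.5, NA, 2)))  # "double"
```

Returning NA_real_ keeps the per-group results type-stable, which matters whenever the results are later bound into a single numeric column.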
