Well, rbindlist(list()) says "Null data.table" (though it doesn't pass the is.null() test). Maybe someone else has an idea how to deal with the no-results case.

By the way, it's best to use "reply to all" to make sure you reply to the mailing list, too; they should be able to see your message quoted below, though.
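In the meantime, here is a rough sketch of the zero-length check described in the quoted message below. It is only one way to do it, and the helper name bind_pairs is made up for illustration; it is not part of data.table.

```r
library(data.table)

# Hypothetical helper (not from data.table itself): bind a list of (i, j)
# pairs, falling back to a typed zero-row table when nothing was collected.
bind_pairs <- function(pairs) {
  if (length(pairs) == 0L) {
    return(data.table(i = integer(), j = integer()))
  }
  rbindlist(pairs)
}

# Collect pairs in a plain list instead of growing a table with rbind()
indi <- list()
k <- 1L
indi[[k]] <- list(i = 2L, j = 6L); k <- k + 1L
indi[[k]] <- list(i = 4L, j = 5L); k <- k + 1L

found <- bind_pairs(indi)    # two rows, columns i and j
empty <- bind_pairs(list())  # zero rows, but still integer columns i and j
```

The idea is that every group then returns a table with the same two integer columns, whether or not it found any pairs.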
--Frank

On Tue, Sep 17, 2013 at 5:03 PM, Nathaniel Graham <[email protected]> wrote:

> Frank,
>
> Thanks. This seems to have done the trick, so long as I'm careful to
> check for zero-length lists and return
> data.table(i = integer(), j = integer()) in those cases. Essentially,
> I have to test every combination of i and j to see if it's
> "interesting" or not, and some groups have a lot of rows. At the
> moment I'm attacking some other low-hanging fruit, like speeding up
> the comparisons I have to do.
>
> As a side note, it would be kind of nice if there was a simple way to
> clue data.table in to the fact that there are no rows to return, like
> returning NULL or NA or similar.
>
> -------
> Nathaniel Graham
> [email protected]
> [email protected]
>
>
> On Tue, Sep 17, 2013 at 4:22 PM, Frank Erickson <[email protected]> wrote:
>
>> Hi,
>>
>> I guess you could put them into a list and then rbind at the end:
>>
>> indi <- list()
>> k=1
>> indi[[k]] <- list(i=2L,j=6L); k <- k+1
>> indi[[k]] <- list(4L,5L); k <- k+1
>> rbindlist(indi)
>> #    i j
>> # 1: 2 6
>> # 2: 4 5
>>
>> For some reason, I couldn't get rbindlist to work unless the first item
>> in indi had explicit names ("i" and "j"), but names aren't needed for
>> later items.
>>
>> This should be better than dynamically growing with rbind each time, but
>> there may be a faster way. If your criteria for selecting (i,j) can be
>> written down, there's likely a much faster way than looping like this.
>>
>> Best,
>>
>> --Frank
>>
>>
>>
>> On Tue, Sep 17, 2013 at 2:13 PM, Nathaniel Graham <[email protected]> wrote:
>>
>>> I'm currently using a (moderately) complex function, call
>>> it f(), as a j expression to analyze my data. The data itself
>>> is about 1.2M rows, which I analyze by group.
>>> A group may have as few as one row or as many as 10K.
>>> The output from the function is a two-column data.table
>>> where the rows are interesting (for my work) pairs of
>>> observations--I have no idea how many pairs will be
>>> interesting until the function runs, but in the abstract it could
>>> be every unique combination (so as many as 50M rows
>>> of output for one call to f()). It is common, and not an
>>> error, for groups to have no meaningful pairs to return.
>>>
>>> I've been using the following line to create the output for
>>> f():
>>>
>>> indices <- data.table(i = integer(), j = integer())
>>>
>>> I then append to 'indices' any useful pairs using:
>>>
>>> indices <- rbind(indices, list(idx[i], idx[j]))
>>>
>>> This works, but is very, very slow, in part because I'm
>>> using rbind(). I want to switch to using the built-in matrix,
>>> because rbind() should be much faster for them. Using
>>> the following line to create the matrix:
>>>
>>> indices <- matrix(nrow = 0, ncol = 2, dimnames =
>>>     list(c(NULL),c("i","j")))
>>>
>>> results in the following error:
>>>
>>> Logical error. Type of column should have been checked by now
>>>
>>> Note that the values returned are always integers. Results are
>>> coerced via:
>>>
>>> data.table(indices)
>>>
>>> before returning from f(). If I don't explicitly coerce, I get the
>>> following error:
>>>
>>> j doesn't evaluate to the same number of columns for each group
>>>
>>> If someone could tell me what I'm doing wrong, or some other
>>> equivalent way to noticeably speed up the whole process, I'd
>>> be very grateful.
>>>
>>>
>>> -------
>>> Nathaniel Graham
>>> [email protected]
>>> [email protected]
>>>
>>> _______________________________________________
>>> datatable-help mailing list
>>> [email protected]
>>>
>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>>
>>
>>
>
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
