It hadn't occurred to me to use CJ(), so I'll tinker with that this evening and see if there are any gains to be made there. In theory the problem is highly parallelizable, and one of the posts Matthew points to in his comments (in the post you reference) shows a way it can be done (using the old multicore library, so I'm not exactly sure how it maps to the parallel library). In my case, though, the whole process appears to be memory bound rather than CPU bound. Since my machine is already reasonably fast (i7-4770 with 4x8GB DDR3-1600), I just don't think it's going to get dramatically faster. That doesn't mean I won't try...
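For reference, here is a minimal sketch of the CJ() idiom suggested in the quoted message below, together with the no-results case discussed in this thread. The values of iset and jset and the body of criterion() are hypothetical stand-ins, not the real data or comparison, and criterion() is assumed to be vectorized over i and j.

```r
library(data.table)

# Hypothetical inputs; the real sets and comparison are not shown in the thread.
iset <- 1:4
jset <- 1:4

# Assumed stand-in criterion: must be vectorized and return one logical per row.
criterion <- function(i, j) (i + j) %% 3L == 0L & i < j

# Cross join every (i, j) combination, then keep the rows where the criterion holds.
pairs <- CJ(i = iset, j = jset)[criterion(i, j)]

# When nothing matches, the same expression yields a 0-row data.table that
# still has both columns, so no special case is needed for an empty result.
never <- CJ(i = iset, j = jset)[i > 100L]
```

Note that CJ() materializes all length(iset) * length(jset) rows before filtering, so for large sets this sketch trades memory for syntax, which matters if the job is memory bound.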
-------
Nathaniel Graham
[email protected]
[email protected]

On Tue, Sep 17, 2013 at 5:52 PM, Frank Erickson <[email protected]> wrote:

> Maybe not ultrafast, but with nice syntax:
>
> CJ(i=iset,j=jset)[criterion(i,j)]
>
> I guess it should be parallelizable, but that wouldn't be with data.table,
> if I understand this correctly:
> http://stackoverflow.com/questions/14759905/data-table-and-parallel-computing
>
> On Tue, Sep 17, 2013 at 5:42 PM, Nathaniel Graham <[email protected]> wrote:
>
>> Oops; I meant to reply to all, and then forgot after I discarded and
>> rewrote my message a few times. I suspect (although I'm not absolutely
>> certain) that if NULL or similar did the same thing as returning a 0-row
>> data.table with the appropriate number of columns, some operations could
>> be sped up a bit. In those cases, the data.table code wouldn't need to
>> check the number and type of the columns returned.
>>
>> I suspect that unless someone knows a secret, ultrafast way to iterate
>> through a list of all combinations of a set of items and return the
>> subset of those that match some criteria, that I'm as close to optimal
>> as I'm likely to get right now.
>>
>> -------
>> Nathaniel Graham
>> [email protected]
>> [email protected]
>>
>> On Tue, Sep 17, 2013 at 5:22 PM, Frank Erickson <[email protected]> wrote:
>>
>>> Well, rbindlist(list()) says "Null data.table" (though it doesn't pass
>>> the is.null() test). Maybe someone else has an idea how to deal with the
>>> no-results case. By the way, it's best to use "reply to all" to make sure
>>> you reply to the mailing list, too; they should be able to see your
>>> message quoted below, though.
>>>
>>> --Frank
>>>
>>> On Tue, Sep 17, 2013 at 5:03 PM, Nathaniel Graham <[email protected]> wrote:
>>>
>>>> Frank,
>>>>
>>>> Thanks. This seems to have done the trick, so long as I'm careful to
>>>> check for zero-length lists and return
>>>> data.table(i = integer(), j = integer()) in those cases. Essentially,
>>>> I have to test every combination of i and j to see if it's
>>>> "interesting" or not, and some groups have a lot of rows. At the
>>>> moment I'm attacking some other low hanging fruit, like speeding up
>>>> the comparisons I have to do.
>>>>
>>>> As a side note, it would be kind of nice if there was a simple way to
>>>> clue data.table to the fact that there are no rows to return, like
>>>> returning NULL or NA or similar.
>>>>
>>>> -------
>>>> Nathaniel Graham
>>>> [email protected]
>>>> [email protected]
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
