Hi, I really like the DT1[DT2,z:=...] idiom. Unfortunately, merge() on non-key columns returns a new data.table, so modifying DT1 in place, as in merge(DT1,DT2,by=...)[,z:=...], is not possible. Or is there actually a way to do this that I am missing?
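For concreteness, here is a minimal sketch of the contrast (table names, columns, and values are all made up for illustration):

```r
library(data.table)

## Toy tables keyed on the join column.
DT1 <- data.table(id = 1:4, x = c(10, 20, 30, 40), key = "id")
DT2 <- data.table(id = c(2L, 4L), y = c(100, 200), key = "id")

## Join-assign: rows of DT2 are matched against key(DT1) and z is
## created in DT1 by reference, with no copy of DT1.
DT1[DT2, z := x + i.y]

## merge() instead allocates a brand-new data.table; the := below
## modifies only that temporary result, leaving DT1 untouched.
merge(DT1, DT2, by = "id")[, z2 := x + y]
```

The join-assign form only works when the match is on key(DT1), which is exactly the limitation described above.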
If this syntax -- DT1[DT2,z:=...,by=c(key(DT1),x)] -- behaved differently, allowing the "by" to determine which columns were merged on, that would solve my issue, I guess. By the way, when you use by=c(key(DT),x), you get a speedup from DT's being keyed, right?

*Some background* (rest of email): I've coded a value function iteration rather inefficiently and am looking into a few different directions for improving it. Efficiency matters because the result will enter a likelihood I need to maximize. (Value function iteration solves a dynamic programming problem with discrete periods and a finite horizon, working backwards from the final period.) I was alternating between two data tables, doing things like DT1[DT2[y==t],z:=...] and DT2[DT1[y==t-1],q:=...], and changing the keys on both tables before each merge-assign. If secondary keys were implemented <http://r-forge.r-project.org/tracker/index.php?func=detail&aid=1007&group_id=240&atid=978>, I'd have just gone with that. (Tom's secondary key method <http://lists.r-forge.r-project.org/pipermail/datatable-help/2010-May/000028.html>, mentioned in the last link, only works for subsetting, not merge-assigning, as far as I can tell.)

I think I'm getting a big slow-down because I'm rekeying four times per iteration (two tables x two merge-assigns) and because I'm rekeying the entire table when I'm only assigning to a subset. That second problem is easier to fix, I guess. Now I am considering making a single DT with key=intersect(key(DT1),key(DT2)) and using that instead, if I can figure out a way to do what I need to with it.

Thanks,
Frank
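On the by=c(key(DT),x) question above: "by" does accept a character vector built from the key plus further column names, though the extra name must be quoted inside c(). Whether the existing key actually speeds up the grouping is the question being asked, so no claim is made here; this is just a toy sketch with made-up data showing the syntax:

```r
library(data.table)

## Hypothetical table keyed on "a"; x and v are made-up columns.
DT <- data.table(a = rep(1:3, each = 4),
                 x = rep(1:2, times = 6),
                 v = 1:12,
                 key = "a")

## Grouped assign by reference, grouping on (a, x): "by" is a
## character vector of key(DT) plus one quoted extra column.
DT[, z := sum(v), by = c(key(DT), "x")]
```

Note this only controls the grouping; it does not change which columns a join itself matches on.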
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
