I think the following will achieve the same as your merge's: setkey(g_ctpty, ctpty_head) setkey(links, ctpty_head) setkey(g_iss, iss_head) g_iss[data.table(g_ctpty[links], key = "iss_head")]
And in general, merge(X, Y, all.x = TRUE) is (more or less) equivalent to Y[X] On Thu, Jun 20, 2013 at 11:43 AM, Ivan Alves <[email protected]> wrote: > Many thanks to both Eduard and Frank on the issue of the needed key. One > aspect of the merging that is not clear is how to do 'inner' vs. 'outer' > 'joins' (like in SQL). Whereas it works with merge (using the all.x=TRUE > option), how is it done with data.table? In the improved example below > > g_ctpty = data.table(ctpty_head=c("a","b","c","d"), ctpty_cty=c("US","DE", > "JP","CN")) > g_iss = data.table(iss_head=c("a","b","c","d"), iss_cty=c("US","DE","JP", > "CN")) > links = data.table(ctpty_head=c("a","b","c"), iss_head=c("b","b","a")) > setkey(g_ctpty,ctpty_head) > setkey(g_iss,iss_head) > merge( > merge(links, g_ctpty, all.x = TRUE, by = "ctpty_head"), > g_iss, > all.x = TRUE, by = "iss_head" > ) > g_iss[g_ctpty[links]] # error > links[g_ctpty][g_iss] # still error > setkey(links,ctpty_head,iss_head) # keys are needed > links[g_ctpty][g_iss] # how to get inner join? > > How do it not include the last line in the link? (Again, it works with > merge). Many thanks. > > Ivan > > > On 18 Jun 2013, at 17:34, Eduard Antonyan <[email protected]> > wrote: > > Frank has already answered why you're getting the results you are. > > Re "how do I tell DT to do *separate matchings at each join*?": > > Currently you have to use 'merge' or 'data.table', see this - > https://r-forge.r-project.org/tracker/?func=detail&atid=978&aid=4675&group_id=240 > and > the SO link inside. > > > On Tue, Jun 18, 2013 at 10:08 AM, Ivan Alves <[email protected]> wrote: > >> Hi Frank, >> >> Many thanks for the thoughts. >> >> It is something that has to do with the keys, that is for sure (see >> below). A double DT join does not understand that it has to do one join by >> one variable and the other by another variable: the output simply has lines >> where the two keys are the same (ctpty_head==iss_head),which is of course >> not optimal. >> >> how do I tell DT to do *separate matchings at each join*? Setting setkey >> (links,ctpty_head,iss_head) before the join does not work either. >> >> > key(g_ctpty[links]) >> NULL >> > key(g_ctpty) >> [1] "ctpty_head" >> > key(g_iss) >> [1] "iss_head" >> > key(links) >> NULL >> >> Hi Arunkumar, >> >> An example would look like follows: >> >> g_ctpty = data.table(ctpty_head=c("a","b","c"), >> ctpty_cty=c("US","DE","JP")) >> g_iss = data.table(iss_head=c("a","b","c"), iss_cty=c("US","DE","JP")) >> links = data.table(ctpty_head=c("a","b","c"), iss_head=c("b","b","a")) >> >> merge( >> + merge(links, g_ctpty, all.x = TRUE, by = "ctpty_head"), >> + g_iss, >> + all.x = TRUE, by = "iss_head" >> + ) >> iss_head ctpty_head ctpty_cty iss_cty >> 1: a c JP US >> 2: b a US DE >> 3: b b DE DE >> > g_iss[g_ctpty[links]] >> iss_head iss_cty ctpty_cty iss_head.1 >> 1: a US US b >> 2: b DE DE b >> 3: c JP JP a >> > setkey(links,ctpty_head,iss_head) >> > g_iss[g_ctpty[links]] >> iss_head iss_cty ctpty_cty iss_head.1 >> 1: a US US b >> 2: b DE DE b >> 3: c JP JP a >> >> On 17 Jun 2013, at 14:36, Frank Erickson <[email protected]> wrote: >> >> I think that key(g_ctpty[links]) == key(g_ctpty) == "ctpty_head", which >> is used when you do your second merge with [, instead of "iss_head". You >> can check this by running key(g_ctpty[links]) . --Frank >> >> >> On Mon, Jun 17, 2013 at 1:57 AM, Arunkumar Srinivasan < >> [email protected]> wrote: >> >>> Since you have the data as well, why not provide it (a small part at >>> least with which your issue is reproducible)? Isn't it much easier than to >>> ask everyone who's willing to help to create a data and test your code? >>> >>> Arun >>> >>> On Monday, June 17, 2013 at 8:54 AM, Ivan Alves wrote: >>> >>> Dear all, >>> >>> I am not sure I understand the syntax for merging data.tables. I have >>> keyed the two 'satelite' tables from which I want to match information to >>> the main table 'links' >>> >>> g_ctpty <- gultimate[,list(ctpty_head,ctpty_cty)] >>> setkey(g_ctpty,ctpty_head) >>> g_iss <- gultimate[,list(iss_head,iss_cty)] >>> setkey(g_iss,iss_head) >>> >>> Why are the two below not equivalent? >>> >>> This works: >>> >>> data = merge( >>> merge(links, g_ctpty, all.x = TRUE, by = "ctpty_head"), >>> g_iss, >>> all.x = TRUE, by = "iss_head" >>> ), >>> >>> And this does not: >>> >>> data = g_iss[g_ctpty[links]], >>> >>> Any guidance would be appreciated. >>> Kind regards, >>> Ivan >>> _______________________________________________ >>> datatable-help mailing list >>> [email protected] >>> >>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help >>> >>> >>> >>> _______________________________________________ >>> datatable-help mailing list >>> [email protected] >>> >>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help >>> >> >> >> >> _______________________________________________ >> datatable-help mailing list >> [email protected] >> >> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help >> > > >
_______________________________________________ datatable-help mailing list [email protected] https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
