Everything looks good to me. Note that there is also .BY[[1]], which one might want to use in those examples as well (it is basically the same as i.id1).
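
For anyone reading this in the archive, here is a minimal sketch of the lookups under discussion. It assumes a recent data.table release (1.9.8 or later), not the development version this thread is about. In recent releases, per-row grouping in a join has to be requested with by = .EACHI, and x's own join column is available through the x. prefix. The result column names (from_i, from_x, n) are made up for illustration.

```r
library(data.table)

# Same tables as in the thread: d3 is d2 renamed to d1's column names.
d1 <- data.table(id1 = c(1L, 2L, 2L, 3L), val = 1:4, key = "id1")
d2 <- data.table(id2 = c(1L, 2L, 4L), val2 = c(11, 12, 14), key = "id2")
d3 <- copy(d2)
setnames(d3, names(d1))

# Plain join: one output row per matching row of d1 (or one NA row when
# there is no match). i.id1 is d3's value; x.id1 is d1's own value.
joined <- d1[d3, list(id1 = id1, from_i = i.id1, from_x = x.id1)]
print(joined)

# Grouping by each row of d3 (assumption: by = .EACHI is the modern
# replacement for the implicit by-without-by discussed in this thread).
per_row <- d1[d3, list(n = .N), by = .EACHI]
print(per_row)
```

In this sketch from_x is NA for the unmatched row (id1 = 4) while from_i carries the value from d3, which is the distinction the thread below is drawing.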

On Mon, Nov 11, 2013 at 7:55 AM, Arunkumar Srinivasan <[email protected]> wrote:

> Eddi,
>
> Thank you. However, I've realised something and made a slight change to
> the concept (at least I think that's the way to go).
>
> Basically, if you have:
>
> require(data.table)
> d1 <- data.table(id1=c(1L, 2L, 2L, 3L), val=1:4, key="id1")
>
> and you do:
>
> d1[, print(id1), by=id1]
> [1] 1
> [1] 2
> [1] 3
>
> That is, while grouping, the grouping variable's length for every group
> remains 1 (when grouping using "by"). For id1=2, we don't get "2" two
> times. Going by the same logic, if we were to do:
>
> d1[J(2), id1]
>    id1 id1
> 1:   2   2
>
> Here, the first "id1" is the grouping "id1" (from J(2)). Following FR
> #2693 from mnel, I've changed the names of J(.) when it has no names to
> resemble those of the key columns of "d1". The second "id1" is the
> corresponding value of d1's "id1" for "id1=2". And even though it's
> present 2 times, we print it only once. That is, it'll be identical to
> d1[, id1, by=id1], even though d1's "id1" is *not really* the grouping
> variable.
>
> If we have to refer to i's columns, then we have to explicitly state
> "i.id1". That is, here, it would be:
>
> d1[J(2), i.id1] # identical results, but i.id1 corresponds to the
>                 # data.table from J(2) with column name = id1
>
> To illustrate the difference nicely, let's consider these data.tables:
>
> d1 <- data.table(id1 = c(1L, 2L, 2L, 3L), val = 1:4, key = "id1")
> d2 <- data.table(id2 = c(1L, 2L, 4L), val2 = c(11, 12, 14),key = "id2")
> d3 <- copy(d2)
> setnames(d3, names(d1))
>
> d1[d2, list(id1)] # what Gabor's post highlighted should work (but it
>                   # doesn't give 1,2,2,NA as pointed out in the earlier
>                   # post)
>    id1 id1
> 1:   1   1
> 2:   2   2
> 3:   4  NA
>
> d1[d3, list(id1, i.id1)] # id1 refers to values from d1 and i.id1 to d3.
>    id1 id1 i.id1
> 1:   1   1     1
> 2:   2   2     2
> 3:   4  NA     4
>
> Note that for every (implicit) grouping value from d3, the only possible
> values for d1's grouping would be 1) identical to that of d3 or 2) NA.
>
> Let me know what you guys think.
>
> Arun
>
> On Monday, November 11, 2013 at 2:45 PM, Eduard Antonyan wrote:
>
> I haven't checked yet what it does currently but what you wrote makes
> perfect sense.
>
> On Nov 10, 2013 5:44 AM, "Arunkumar Srinivasan" <[email protected]>
> wrote:
>
> Hi everyone,
>
> To revive the discussion Gabor started here:
> http://r.789695.n4.nabble.com/Problem-with-FAQ-2-8-tt4668878.html
> and the (related, but subtly different) FR mnel filed here:
> https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2693&group_id=240&atid=978
>
> Suppose you have:
>
> require(data.table)
> d1 <- data.table(id1 = c(1L, 2L, 2L, 3L), val = 1:4, key = "id1")
> d2 <- data.table(id2 = c(1L, 2L, 4L), val2 = c(11, 12, 14),key = "id2")
>
> Then as Gabor points out: `d1[d2, id1]` should *not* result in an error,
> because FAQ 2.8 states (copied from Gabor's post linked above):
>
> 1. The scope of X's subset; i.e., X's column names.
> 2. The scope of each row of Y; i.e., Y's column names (join inherited
>    scope)
> …
>
> In this case, the desired output for `d1[d2, id1]` should then be:
>
>    id1 id1
> 1:   1   1
> 2:   2   2
> 3:   2   2
> 4:   4  NA
>
> That's at least what I understand the documentation to intend.
>
> However, this calls for a subtle change to the current method of
> referring to columns, if we were to keep this idea. That is, consider the
> data.table "d3" as follows:
>
> d3 <- copy(d2)
> setnames(d3, names(d1))
>
> Now, what should `d1[d3, id1]` give? The answer, I believe, is the same
> as `d1[d2, id1]`. Why? Because X's (here d1's) column names should be
> looked up first (as per FAQ 2.8). Therefore, corresponding to d3's
> id1 = c(1, 2, 4), the values for "id1" are c(1, (2,2), NA). Now, if the
> old behaviour is what's intended (here comes the "subtle change"), then
> one should do:
>
> d1[d3, i.id1] # referring to i's variables with the "i." notation.
>
> I've managed to implement the first part where X's columns are looked up
> so that `d1[d2, id1]` doesn't result in an error. However, I'd like to
> ensure that my understanding of the FAQ is right (and that the FAQ makes
> sense - it does to me).
>
> Please let me know what you all think so that I can implement the second
> part and commit. This, I believe, will get us a step closer to
> consistency in DT syntax.
>
> Arun
>
> _______________________________________________
> datatable-help mailing list
> [email protected]
> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
