Great! I'll commit then and see how it goes! Yes, you're right about .BY[[1]]. But `i.id1` was already there, in the SDenv$.iSD part of the code.

Arun

On Monday, November 11, 2013 at 4:53 PM, Eduard Antonyan wrote:
> Everything looks good to me. Note that there is also .BY[[1]] that one can
> potentially also want to use in those examples (which is basically same as
> i.id1).
>
> On Mon, Nov 11, 2013 at 7:55 AM, Arunkumar Srinivasan <[email protected]> wrote:
> > Eddi,
> >
> > Thank you. However, I've realised something and made a slight change to the
> > concept (at least I think that's the way to go).
> >
> > Basically, if you've:
> >
> > require(data.table)
> > d1 <- data.table(id1=c(1L, 2L, 2L, 3L), val=1:4, key="id1")
> >
> > and you do:
> >
> > d1[, print(id1), by=id1]
> > [1] 1
> > [1] 2
> > [1] 3
> >
> > That is, while grouping, the grouping variable's length for every group
> > remains 1 (when grouping using "by"). For id1=2, we don't get "2" two times.
> > Going by the same logic, if we were to do:
> >
> > d1[J(2), id1]
> >    id1 id1
> > 1:   2   2
> >
> > Here, the first "id1" is the grouping "id1" (from J(2)). Following FR #2693
> > from mnel, I've changed the names of J(.) when it has no names to resemble
> > that of key columns of "d1". The second "id1" is the value of d1's "id1"
> > for "id1=2". And even though it's present 2 times, we print it only once.
> > That is, it'll be identical to d1[, id1, by=id1], even though d1's "id1"
> > is *not really* the grouping variable.
> >
> > If we've to refer to i's columns, then we've to explicitly state "i.id1".
> > That is, here, it would be:
> >
> > d1[J(2), i.id1] # identical results, but i.id1 corresponds to data.table
> >                 # from J(2) with column name = id1
> >
> > To illustrate the difference nicely, let's consider these data.tables:
> >
> > d1 <- data.table(id1 = c(1L, 2L, 2L, 3L), val = 1:4, key = "id1")
> > d2 <- data.table(id2 = c(1L, 2L, 4L), val2 = c(11, 12, 14), key = "id2")
> > d3 <- copy(d2)
> > setnames(d3, names(d1))
> >
> > d1[d2, list(id1)] # what Gabor's post highlighted should work (but it
> >                   # doesn't give 1,2,2,NA as pointed out in the earlier post)
> >    id1 id1
> > 1:   1   1
> > 2:   2   2
> > 3:   4  NA
> >
> > d1[d3, list(id1, i.id1)] # id1 refers to values from d1 and i.id1 to d3.
> >    id1 id1 i.id1
> > 1:   1   1     1
> > 2:   2   2     2
> > 3:   4  NA     4
> >
> > Note that for every (implicit) grouping value from d3, the only possible
> > values for d1's grouping would be 1) identical to that of d3 or 2) NA.
> >
> > Let me know what you guys think.
> >
> > Arun
> >
> > On Monday, November 11, 2013 at 2:45 PM, Eduard Antonyan wrote:
> > > I haven't checked yet what it does currently but what you wrote makes
> > > perfect sense.
> > >
> > > On Nov 10, 2013 5:44 AM, "Arunkumar Srinivasan" <[email protected]> wrote:
> > > > Hi everyone,
> > > >
> > > > To revive the discussion Gabor started here:
> > > > http://r.789695.n4.nabble.com/Problem-with-FAQ-2-8-tt4668878.html and
> > > > the (related, but subtly different) FR mnel filed here:
> > > > https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2693&group_id=240&atid=978
> > > >
> > > > Suppose you have:
> > > >
> > > > require(data.table)
> > > > d1 <- data.table(id1 = c(1L, 2L, 2L, 3L), val = 1:4, key = "id1")
> > > > d2 <- data.table(id2 = c(1L, 2L, 4L), val2 = c(11, 12, 14), key = "id2")
> > > >
> > > > Then as Gabor points out: `d1[d2, id1]` should *not* result in an
> > > > error, because FAQ 2.8 states (copied from Gabor's post linked above):
> > > >
> > > > 1. The scope of X's subset; i.e., X's column names.
> > > > 2. The scope of each row of Y; i.e., Y's column names (join inherited
> > > >    scope)
> > > > …
> > > >
> > > > In this case, the desired output for `d1[d2, id1]` should then be:
> > > >
> > > >    id1 id1
> > > > 1:   1   1
> > > > 2:   2   2
> > > > 3:   2   2
> > > > 4:   4  NA
> > > >
> > > > That's what I at least understand from what the documentation intends.
> > > >
> > > > However, this calls for a subtle change to the current method of
> > > > referring to columns, if we were to keep this idea. That is, consider
> > > > the data.table "d3" as follows:
> > > >
> > > > d3 <- copy(d2)
> > > > setnames(d3, names(d1))
> > > >
> > > > Now, what should `d1[d3, id1]` give? The answer, I believe, is the same
> > > > as `d1[d2, id1]`. Why? Because X's (here d1's) column names should be
> > > > looked up first (as per FAQ 2.8). Therefore, corresponding to
> > > > d2=c(1,2,4), the values for "id1" are c(1, (2,2), NA). Now, if the old
> > > > behaviour is intended (here comes the "subtle change"), then one
> > > > should do:
> > > >
> > > > d1[d3, i.id1] # referring to i's variables with the "i." notation.
> > > >
> > > > I've managed to implement the first part where X's columns are looked
> > > > up so that `d1[d2, id1]` doesn't result in error. However, I'd like to
> > > > ensure that my understanding of the FAQ is right (and that the FAQ
> > > > makes sense - it does to me).
> > > >
> > > > Please let me know what you all think so that I can implement the
> > > > second part and commit. This, I believe, will let us get a step closer
> > > > to the consistency in DT syntax.
> > > >
> > > > Arun
> > > >
> > > > _______________________________________________
> > > > datatable-help mailing list
> > > > [email protected]
> > > > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
