Re: [datatable-help] Revisiting scoping rules in "j" (reviving Gabor's post)

Arunkumar Srinivasan Mon, 11 Nov 2013 05:56:20 -0800

Eddi,  

Thank you. However, I've realised something and made a slight change to the 
concept (at least I think that's the way to go).


Basically, if you have:

require(data.table)
d1 <- data.table(id1=c(1L, 2L, 2L, 3L), val=1:4, key="id1")

and you do:

d1[, print(id1), by=id1]
[1] 1
[1] 2
[1] 3


That is, while grouping with "by", the length of each grouping variable is 1 
for every group. For id1 = 2, we don't get "2" twice. Going by the same logic, 
if we were to do:

d1[J(2), id1]
   id1 id1
1:   2   2


Here, the first "id1" is the grouping "id1" (from J(2)). Following FR #2693 
from mnel, I've changed the names of J(.), when it has no names, to match those 
of the key columns of "d1". The second "id1" is d1's value of "id1" for 
"id1 = 2". Even though it's present twice in d1, we print it only once. That 
is, the result is identical to d1[, id1, by=id1], even though d1's "id1" is 
*not really* the grouping variable.  
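
The length-1 behaviour of grouping variables that this argument rests on can be checked directly. This is only a minimal sketch using the same d1 as above; the column names n_rows and len_id1 are made up for illustration:

```r
library(data.table)

d1 <- data.table(id1 = c(1L, 2L, 2L, 3L), val = 1:4, key = "id1")

# Inside j, the "by" variable has length 1 per group, even when the group
# holds several rows (.N is the number of rows in the current group).
res <- d1[, list(n_rows = .N, len_id1 = length(id1)), by = id1]
print(res)
```

The group id1 = 2 has n_rows = 2 but len_id1 = 1, which is why print(id1) shows "2" only once.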

If we have to refer to i's columns, then we have to state "i.id1" explicitly. 
That is, here, it would be:

# identical results, but i.id1 refers to the data.table from J(2),
# whose column is named id1
d1[J(2), i.id1]

To illustrate the difference nicely, let's consider these data.tables:
d1 <- data.table(id1 = c(1L, 2L, 2L, 3L), val = 1:4, key = "id1")
d2 <- data.table(id2 = c(1L, 2L, 4L), val2 = c(11, 12, 14), key = "id2")
d3 <- copy(d2)
setnames(d3, names(d1))

# what Gabor's post highlighted should work (but it doesn't give 1,2,2,NA as
# pointed out in the earlier post)
d1[d2, list(id1)]
   id1 id1
1:   1   1
2:   2   2
3:   4  NA


d1[d3, list(id1, i.id1)] # id1 refers to values from d1 and i.id1 to d3.
   id1 id1 i.id1
1:   1   1     1
2:   2   2     2
3:   4  NA     4


Note that for every (implicit) grouping value from d3, d1's "id1" can only be 
1) identical to d3's value or 2) NA.
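
To make that rule concrete, here is a rough base R emulation. It is not how data.table implements the join, just a sketch of the proposed semantics; the vector names x_id, i_id and x_per_group are made up for illustration:

```r
# Plain vectors standing in for d1$id1 (x) and d3$id1 (i).
x_id <- c(1L, 2L, 2L, 3L)
i_id <- c(1L, 2L, 4L)

# One value per i row: i's own value if some row of x matches it, else NA.
x_per_group <- ifelse(i_id %in% x_id, i_id, NA_integer_)

# Lay the two columns side by side, mirroring list(id1, i.id1) above.
print(data.frame(id1 = x_per_group, i.id1 = i_id))
```

Here x_per_group comes out as 1, 2, NA: the value 4 from i has no match in x, so x's "id1" is NA there, while i.id1 still carries the 4.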

Let me know what you guys think.  

Arun


On Monday, November 11, 2013 at 2:45 PM, Eduard Antonyan wrote:

> I haven't checked yet what it does currently but what you wrote makes perfect 
> sense.  
> On Nov 10, 2013 5:44 AM, "Arunkumar Srinivasan" <[email protected]> wrote:
> > Hi everyone,  
> >  
> > To revive the discussion Gabor started here: 
> > http://r.789695.n4.nabble.com/Problem-with-FAQ-2-8-tt4668878.html and the 
> > (related, but subtly different) FR mnel filed here: 
> > https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2693&group_id=240&atid=978
> >   
> >  
> > Suppose you have:
> >  
> > require(data.table)  
> > d1 <- data.table(id1 = c(1L, 2L, 2L, 3L), val = 1:4, key = "id1")  
> > d2 <- data.table(id2 = c(1L, 2L, 4L), val2 = c(11, 12, 14), key = "id2")
> >  
> > Then as Gabor points out: `d1[d2, id1]`  should *not* result in an error, 
> > because FAQ 2.8 states (copied from Gabor's post linked above):
> >  
> > 1. The scope of X's subset; i.e., X's column names.  
> > 2. The scope of each row of Y; i.e., Y's column names (join inherited 
> > scope)  
> > …
> >  
> > In this case, the desired output for `d1[d2, id1]` should then be:
> >    id1 id1
> > 1:   1   1
> > 2:   2   2
> > 3:   2   2
> > 4:   4  NA
> >  
> >  
> > That's at least what I understand the documentation to intend.  
> >  
> > However, this calls for a subtle change to the current way of referring 
> > to columns, if we are to keep this idea. That is, consider the data.table 
> > "d3" as follows:  
> >  
> > d3 <- copy(d2)
> > setnames(d3, names(d1))
> >  
> > Now, what should `d1[d3, id1]` give? The answer, I believe, is the same as 
> > `d1[d2, id1]`. Why? Because X's (here d1's) column names should be looked 
> > up first (as per FAQ 2.8). Therefore, corresponding to the join values 
> > c(1,2,4), the values for "id1" are c(1, (2,2), NA). Now, if the old 
> > behaviour is what is intended (here comes the "subtle change"), then one 
> > should do:  
> >  
> > d1[d3, i.id1] # referring to i's variables with the "i." notation.
> >  
> > I've managed to implement the first part where X's columns are looked up so 
> > that `d1[d2, id1]` doesn't result in error. However, I'd like to ensure 
> > that my understanding of the FAQ is right (and that the FAQ makes sense - 
> > it does to me).  
> >  
> > Please let me know what you all think so that I can implement the second 
> > part and commit. This, I believe, will get us a step closer to consistency 
> > in DT syntax.
> >  
> > Arun  
> >  
> >  
> > _______________________________________________
> > datatable-help mailing list
> > [email protected] 
> > (mailto:[email protected])
> > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
