Re: [datatable-help] Revisiting scoping rules in "j" (reviving Gabor's post)

Arunkumar Srinivasan Mon, 11 Nov 2013 07:56:07 -0800

Great! I'll commit then and see how it goes!
Yes, you're right about .BY[[1]]. But `i.id1` was already there - in SDenv$.iSD 
part of the code.


Arun


On Monday, November 11, 2013 at 4:53 PM, Eduard Antonyan wrote:

> Everything looks good to me. Note that there is also .BY[[1]], which one 
> might also want to use in those examples (it is basically the same as 
> i.id1).
>  
>  
>  
> On Mon, Nov 11, 2013 at 7:55 AM, Arunkumar Srinivasan <[email protected]> wrote:
> > Eddi,  
> >  
> > Thank you. However, I've realised something and made a slight change to the 
> > concept (at least I think that's the way to go).
> >  
> > Basically, if you have:  
> >  
> > require(data.table)
> > d1 <- data.table(id1=c(1L, 2L, 2L, 3L), val=1:4, key="id1")
> >  
> > and you do:
> >  
> > d1[, print(id1), by=id1]
> > [1] 1
> > [1] 2
> > [1] 3
> >  
> >  
> > That is, while grouping, the grouping variable has length 1 within every 
> > group (when grouping using "by"). For id1=2, we don't get "2" two times. 
> > Going by the same logic, if we were to do:  
> >  
> > d1[J(2), id1]
> >    id1 id1
> > 1:   2   2
> >  
> >  
> > Here, the first "id1" is the grouping "id1" (from J(2)). Following FR #2693 
> > from mnel, I've changed the names of J(.), when it has no names, to match 
> > those of the key columns of "d1". The second "id1" holds d1's value of 
> > "id1" for the group "id1=2". Even though that value is present 2 times in 
> > d1, we print it only once. That is, it'll be identical to d1[, id1, 
> > by=id1], even though d1's "id1" is *not really* the grouping variable.   
> >  
> > If we have to refer to i's columns, then we have to explicitly state 
> > "i.id1". That is, here, it would be:
> >  
> > d1[J(2), i.id1] # identical results, but i.id1 corresponds to data.table 
> >                 # from J(2) with column name = id1  
> >  
> > To illustrate the difference nicely, let's consider these data.tables:
> > d1 <- data.table(id1 = c(1L, 2L, 2L, 3L), val = 1:4, key = "id1")  
> > d2 <- data.table(id2 = c(1L, 2L, 4L), val2 = c(11, 12, 14), key = "id2")  
> > d3 <- copy(d2)
> > setnames(d3, names(d1))
> >  
> > d1[d2, list(id1)] # what Gabor's post highlighted should work (but it 
> >                   # doesn't give 1,2,2,NA as pointed out in the earlier post)  
> >    id1 id1
> > 1:   1   1
> > 2:   2   2
> > 3:   4  NA
> >  
> >  
> > d1[d3, list(id1, i.id1)] # id1 refers to values from d1 and i.id1 to d3.
> >    id1 id1 i.id1
> > 1:   1   1     1
> > 2:   2   2     2
> > 3:   4  NA     4
> >  
> >  
> > Note that for every (implicit) grouping value from d3, the only possible 
> > values for d1's grouping would be 1) identical to that of d3 or 2) NA.  
> >  
> > Let me know what you guys think.  
> >  
> > Arun
> >  
> >  
> > On Monday, November 11, 2013 at 2:45 PM, Eduard Antonyan wrote:
> >  
> > > I haven't checked yet what it does currently but what you wrote makes 
> > > perfect sense.  
> > > On Nov 10, 2013 5:44 AM, "Arunkumar Srinivasan" <[email protected]> wrote:
> > > > Hi everyone,  
> > > >  
> > > > To revive the discussion Gabor started here: 
> > > > http://r.789695.n4.nabble.com/Problem-with-FAQ-2-8-tt4668878.html and 
> > > > the (related, but subtly different) FR mnel filed here: 
> > > > https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2693&group_id=240&atid=978
> > > >   
> > > >  
> > > > Suppose you have:
> > > >  
> > > > require(data.table)  
> > > > d1 <- data.table(id1 = c(1L, 2L, 2L, 3L), val = 1:4, key = "id1")  
> > > > d2 <- data.table(id2 = c(1L, 2L, 4L), val2 = c(11, 12, 14), key = "id2")
> > > >  
> > > > Then as Gabor points out: `d1[d2, id1]`  should *not* result in an 
> > > > error, because FAQ 2.8 states (copied from Gabor's post linked above):
> > > >  
> > > > 1. The scope of X's subset; i.e., X's column names.  
> > > > 2. The scope of each row of Y; i.e., Y's column names (join inherited 
> > > > scope)  
> > > > …
> > > >  
> > > > In this case, the desired output for `d1[d2, id1]` should then be:
> > > >    id1 id1
> > > > 1:   1   1
> > > > 2:   2   2
> > > > 3:   2   2
> > > > 4:   4  NA
> > > >  
> > > >  
> > > > That, at least, is what I understand the documentation to intend.  
> > > >  
> > > > However, keeping this idea calls for a subtle change to the current 
> > > > way of referring to columns. To see why, consider the data.table "d3" 
> > > > as follows:  
> > > >  
> > > > d3 <- copy(d2)
> > > > setnames(d3, names(d1))
> > > >  
> > > > Now, what should `d1[d3, id1]` give? The answer, I believe, is the same 
> > > > as `d1[d2, id1]`. Why? Because X's (here d1's) column names should be 
> > > > looked up first (as per FAQ 2.8). Therefore, corresponding to the key 
> > > > values c(1, 2, 4) in i, the values for "id1" are c(1, (2,2), NA). Now, 
> > > > if the old behaviour is what's intended (here comes the "subtle 
> > > > change"), then one should do:  
> > > >  
> > > > d1[d3, i.id1] # referring to i's variables with the "i." notation.
> > > >  
> > > > I've managed to implement the first part where X's columns are looked 
> > > > up so that `d1[d2, id1]` doesn't result in error. However, I'd like to 
> > > > ensure that my understanding of the FAQ is right (and that the FAQ 
> > > > makes sense - it does to me).  
> > > >  
> > > > Please let me know what you all think so that I can implement the 
> > > > second part and commit. This, I believe, will get us a step closer to 
> > > > consistency in the DT syntax.
> > > >  
> > > > Arun  
> > > >  
> > > >  
> > > > _______________________________________________
> > > > datatable-help mailing list
> > > > [email protected]
> > > > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
> >  
>

