Thank you for your reply.

Yes, it's best to enforce a unique ordering with an additional key, as you 
said.  This works as expected, but it does not seem to be in line with what 
the help page says.

Example:
> DT = data.table(index1=c(1,2,2),index2=c(1,2,3),values=c("a","b","c"))
> key(DT) <- c("index1","index2")
> DT
     index1 index2 values
[1,]      1      1      a
[2,]      2      2      b
[3,]      2      3      c

> DT[J(1:3),roll=TRUE]
     index1 index2 values
[1,]      1      1      a
[2,]      2      2      b
[3,]      2      3      c
[4,]      3      3      c

The "rolling index" is index1 here.  Isn't index1 considered the first column 
of DT's key?  

In the help page (help(data.table)), the "roll" option is described as 
follows:
    Applies to the last column of x's key, which is generally a date but can
    be any ordered variable, with gaps. When roll=TRUE if i's row matches to
    all but the last column of x's key, and the value of the last column
    falls in a gap (including after the last observation for that group),
    the prevailing value in x is rolled forward.
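One way to reconcile the output above with that text (an assumption, not something the help page states) is that when i supplies fewer columns than the key, the roll acts on the last key column that i actually joins on, which is index1 here. The base-R sketch below mimics that reading. roll_join() is a hypothetical helper, not part of data.table, and it assumes x is already sorted by the join column.

```r
# Hypothetical helper, NOT data.table's implementation: a base-R mimic of
# DT[J(values), roll=TRUE] when i joins on a single key column.
# Assumes x is already sorted by key_col.
roll_join <- function(x, key_col, values) {
  keys <- x[[key_col]]
  rows <- lapply(values, function(v) {
    exact <- which(keys == v)
    if (length(exact) > 0) return(exact)         # exact match: all matching rows
    before <- which(keys < v)
    if (length(before) == 0) return(NA_integer_) # nothing earlier to roll forward
    max(before)                                  # in a gap: last prevailing row
  })
  out <- x[unlist(rows), , drop = FALSE]
  out[[key_col]] <- rep(values, lengths(rows))   # report i's values, as a join does
  rownames(out) <- NULL
  out
}

DT <- data.frame(index1 = c(1, 2, 2), index2 = c(1, 2, 3),
                 values = c("a", "b", "c"), stringsAsFactors = FALSE)

# Rolling on index1 (the only joined column): 3 falls after the last
# observation, so the row (2, 3, "c") is carried forward as index1 = 3.
roll_join(DT, "index1", 1:3)
```

This reproduces the four rows printed by DT[J(1:3),roll=TRUE] above, which is why the "last column" in the help text may be better read as the last column of the join.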

-Alex



-----Original Message-----
From: Steve Lianoglou [mailto:[email protected]] 
Sent: Thursday, July 21, 2011 11:24 AM
To: Alexander Peterhansl
Cc: [email protected]
Subject: Re: [datatable-help] Setting key when resulting order of table is not 
unique

Hi,

On Thu, Jul 21, 2011 at 11:02 AM, Alexander Peterhansl 
<[email protected]> wrote:
> Dear Data Table Help List,
>
> I am using data.table version 1.6 (with R version 2.12.2, 64-bit on 
> Windows 7).  Suppose I have a table whose key does not give me a unique 
> ordering.
> Then the output of the "roll" option will be arbitrary (i.e., it will 
> depend on what one does between the two executions).  Is this something 
> noteworthy?
>
> Please see output of the following:
>
>> DT = data.table(A=c(1,2,2),B=c("b1","b3","b2"),key="A")
>
>> DT[J(1:3),roll=TRUE]  # output 1
>
>      A  B
> [1,] 1 b1
> [2,] 2 b3
> [3,] 2 b2
> [4,] 3 b2
>
>> key(DT)="B"           # change keys to do other stuff...
>> key(DT)="A"           # get back to key A
>> DT[J(1:3),roll=TRUE]  # output 2 does not match output 1
>      A  B
> [1,] 1 b1
> [2,] 2 b2
> [3,] 2 b3
> [4,] 3 b3
>
> (Also, as an aside, I get identical output in the two executions of 
> DT[J(1:3),roll=TRUE] when I start with the table DT =
> data.table(A=c(1,2,2),B=c("b1","b2","b3"),key="A") instead.)
>
> I'm sure there must also be other reverberations, beyond the effect on 
> the roll option.
>
> Any insight would be of interest.  Thank you.

I don't think it's all that surprising in this case.

The original "keying" on A does not take your B column into consideration here:

R> DT = data.table(A=c(1,2,2),B=c("b1","b3","b2"),key="A")

But then when you set the key on "B", of course "b2" will have to be rearranged 
to come before "b3".

After you set the key on your DT back to A, A itself is already in order 
((1,2,2) == (1,2,2)), so no moving around happens. You should note that the 
reordering in data.table is "stable" (I'm 95% sure on that; Matthew can 
verify), so "ties" will appear in the same order as they did in the input to 
that sort, which here is the order left behind by the key on B.
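To illustrate the stable-sort point, here is a base-R analogy. It is an assumption that keying behaves like this: the sketch uses order(), which R documents as a stable sort, as a stand-in for setting the key, and does not call data.table itself. The last row of the A == 2 group is the one a roll to A = 3 carries forward, and it changes after sorting on B and then back on A.

```r
# Base-R analogy of re-keying: order() leaves ties in their existing order.
DT <- data.frame(A = c(1, 2, 2), B = c("b1", "b3", "b2"),
                 stringsAsFactors = FALSE)

byA      <- DT[order(DT$A), ]    # A already sorted: nothing moves
byB      <- DT[order(DT$B), ]    # "b2" moves ahead of "b3"
byBthenA <- byB[order(byB$A), ]  # ties on A keep the order left by the sort on B

byA$B       # "b1" "b3" "b2"
byBthenA$B  # "b1" "b2" "b3"

# Last row of the A == 2 group, i.e. the row a roll to A = 3 would pick up
tail(byA$B[byA$A == 2], 1)            # "b2"
tail(byBthenA$B[byBthenA$A == 2], 1)  # "b3"
```

The two tail() results line up with the last rows of output 1 and output 2 in the original question.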

If it is important in your scenario that this doesn't change when you "roll", 
you can always set a compound key on DT prior to doing that
calculation:

R> key(DT) <- c('A', 'B')

Any way you shake it, if you run your code, then set the key to just "B", 
then back to c("A", "B") and "roll" again, your results will be the same.
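The same base-R analogy (again using order() as a stand-in for a compound key, not data.table itself) shows why this helps: once B breaks the ties on A, the sorted result no longer depends on the order in which the rows arrived.

```r
# Two inputs holding the same rows, with the A == 2 ties in different orders
DT1 <- data.frame(A = c(1, 2, 2), B = c("b1", "b3", "b2"),
                  stringsAsFactors = FALSE)
DT2 <- DT1[c(1, 3, 2), ]

# Analogue of a compound key c("A", "B"): sort on A, break ties on B
sorted1 <- DT1[order(DT1$A, DT1$B), ]
sorted2 <- DT2[order(DT2$A, DT2$B), ]

sorted1$B                        # "b1" "b2" "b3"
identical(sorted1$B, sorted2$B)  # TRUE
tail(sorted1$B[sorted1$A == 2], 1)  # "b3", whichever input was used
```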

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
