Well, I don't know which is faster, but I have to agree with Farrel's 
philosophy. In addition, I KNOW that if I were to look at the code for the $i 
'trick' a few weeks later, I would have uneasy feelings about it. I would 
prefer to be able to read my code easily: the less thought, the better.

Here are two ways I would think about doing it. I have no idea how they compare on speed.

dt <- data.table(a=c('a','a','a','a','b','b','b','b'),
  b=c('a','b','a','b','a','b','b','a'),c=1:8,key=c('a','b'))

# method 1. Just include all possible values of the first key in J. To me, this
# is conceptually the simplest
dt[J(unique(a),"b")]

# method 2. Swap the keys, twice
setkey(dt,b,a)
dt<-dt[J("b")]
setkey(dt,a,b)
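
As a quick sanity check (a sketch only, not a benchmark), both methods above can be compared against a plain vector scan on the same toy table. The names scan, m1, m2 and tmp are made up for this illustration, and method 2 is run on a copy so that dt itself is not overwritten.

```r
library(data.table)

dt <- data.table(a = c('a','a','a','a','b','b','b','b'),
                 b = c('a','b','a','b','a','b','b','a'),
                 c = 1:8, key = c('a','b'))

# Reference answer: ordinary vector scan on the second key column
scan <- dt[b == "b"]

# Method 1: supply every value of the first key, paired with the wanted "b"
m1 <- dt[J(unique(a), "b")]

# Method 2: swap the keys, subset, swap back (done on a copy of dt)
tmp <- copy(dt)
setkey(tmp, b, a)
m2 <- tmp[J("b")]
setkey(m2, a, b)

# Both methods should pick out the same rows as the vector scan
stopifnot(identical(m1$c, scan$c), identical(m2$c, scan$c))
```

Comparing only the c column keeps the check independent of column order or key attributes on the results.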


From: [email protected] 
[mailto:[email protected]] On Behalf Of Farrel 
Buchinsky
Sent: Thursday, January 19, 2012 8:59 PM
To: Steve Lianoglou
Cc: [email protected]
Subject: Re: [datatable-help] using J() to select for a value that is in 
something other than the first key

I do not know if that is how all indexes work. I am not really a card-carrying 
database manager or programmer. I just play one in my spare time. The price I 
pay is not remembering how to write syntax when I need to do something. That, to 
me, is a higher price than slow subsetting. If the syntax is not easy I would 
rather just use the traditional vector scan methods that one sees in 
conventional data.frame subset commands.

Notwithstanding my idiosyncratic needs, I thank you very much for your 
explanation.
Farrel



On Thu, Jan 19, 2012 at 18:26, Steve Lianoglou <[email protected]> wrote:
On Thu, Jan 19, 2012 at 6:09 PM, Farrel Buchinsky <[email protected]> wrote:
> Oy gevalt! Am I correct to believe that the technique is rearranging the
> data.table so that J can accept the input as pertaining to a secondary key?
> That seems as if it is too much work for me and my computer. I would rather
> stick to the vector scan methods for now.
Not the entire data.table, just the key columns.

Depending on how many queries you're going to make against the 2nd key
only, the pay off for your troubles could be anywhere from zero to
mucho. Of course if you simply don't have the RAM to make the idx
data.table in the first place, then that's that.
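
For readers without the earlier message at hand, here is one possible shape such an idx table could take. This is an assumption made for illustration, not necessarily the construction proposed earlier in the thread: a small lookup table keyed on the second column that stores row positions in the main table. The names idx, row, rows and res are hypothetical.

```r
library(data.table)

dt <- data.table(a = c('a','a','a','a','b','b','b','b'),
                 b = c('a','b','a','b','a','b','b','a'),
                 c = 1:8, key = c('a','b'))

# Hypothetical secondary index: only the column b plus the row position
# of each record in dt (as sorted by its key), keyed on b
idx <- data.table(b = dt$b, row = seq_len(nrow(dt)), key = "b")

# Binary search on idx, then a plain row-number subset of dt
rows <- idx[J("b")]$row
res  <- dt[rows]
```

The up-front cost is building and sorting idx once; whether that pays off depends on how many lookups against b follow.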

That's how all indexes work though, no? In a database, for instance, if
you have a compound key/index over two or more columns, the index will
only help queries that use a prefix (or the whole) of the key, and
not just any subset of its columns (as you want to do here), right?
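
The prefix behaviour can be seen on the toy table from this thread. This is a minimal sketch; by_prefix and by_scan are names made up for the illustration. With the key set to (a, b), a one-column J() is matched against the first key column only, so J("b") finds rows where a is "b", not rows where b is "b".

```r
library(data.table)

dt <- data.table(a = c('a','a','a','a','b','b','b','b'),
                 b = c('a','b','a','b','a','b','b','a'),
                 c = 1:8, key = c('a','b'))

# Prefix query: "b" is looked up in the first key column, a
by_prefix <- dt[J("b")]

# What was actually wanted: rows where the second key column b is "b",
# which the key cannot serve directly, so this is a vector scan
by_scan <- dt[b == "b"]

# The two selections differ
stopifnot(all(by_prefix$a == "b"), all(by_scan$b == "b"))
stopifnot(!identical(by_prefix$c, by_scan$c))
```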

HTH,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
