Matt,

Two things
1) I noticed that in Juliet's example, Y's first column is not named at all, so defaults to a name of "V1", yet the join carries through. How does the underlying implementation work here? I believe this was the original question, and I'm curious as well now. 2) Having re-read the paragraph you sent, I now realize that X[Y] is equivalent to X right join Y. It may be helpful to explicitly write this in. I certainly had this flipped (I thought it was X left join Y), and the column ordering is flipped from standard SQL.

On 6/7/2012 6:00 AM, [email protected] wrote:

----------------------------------------------------------------------

Message: 1
Date: Wed, 6 Jun 2012 13:47:00 -0400
From: Juliet Hannah<[email protected]>
To: [email protected]
Subject: [datatable-help] join example from faq
Message-ID:
        <CALzuZRQw2QaR=uAfPbmt_R8qnZEhLF7-=_zsoxahxsuhcdc...@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

All,

I am not understanding a few basic things. I am looking at pg 5 of the faq.

X = data.table(grp=c("a","a","b","b","b","c","c"), foo=1:7)
setkey(X,grp)
  Y = data.table(c("b","c"), bar=c(4,2))
  X[Y]

The faq says X[Y] is a join looking up X's rows using Y.

Does this mean data.table looks up X's key using Y?

Y has two columns. How does it know to use the first column in this
example? Y's key has not been set.

Hope my question is not too obvious. :)

Thanks,

Juliet

P.S. Thanks for the wonderful package.  I had to do some aggregations
the other day and my other solutions were
running for hours, but data.table finished in a couple of minutes!


------------------------------

Message: 2
Date: Wed, 06 Jun 2012 23:28:08 +0100
From: Matthew Dowle<[email protected]>
To: Juliet Hannah<[email protected]>
Cc: [email protected]
Subject: Re: [datatable-help] join example from faq
Message-ID:<1339021688.2573.16.camel@netbook>
Content-Type: text/plain; charset="UTF-8"

Hi,

Hopefully this paragraph from ?data.table sheds some light :

   "When i is a data.table, x must have a key. i is joined to x using the
key and the rows in x that match are returned. An equi-join is performed
between each column in i to each column in x's key. The match is a
binary search in compiled C in O(log n) time. If i has less columns than
x's key then many rows of x may match to each row of i. If i has more
columns than x's key, the columns of i not involved in the join are
included in the result. If i also has a key, it is i's key columns that
are used to match to x's key columns and a binary merge of the two
tables is carried out."

The critical sentence is "If i also has a key ..."; i.e., i doesn't have
to be keyed. Only x must have a key. It's often faster if i is keyed
too, though. There have been some speed improvements in 1.8.1, too.

In the example you highlighted I think it goes on to show 'join
inherited scope', which is this paragraph :

   "Advanced: In the X[Y,j] form of grouping, the j expression sees
variables in X first, then Y. We call this join inherited scope. If the
variable is not in X or Y then the calling frame is searched, its
calling frame, and so on in the usual way up to and including the global
environment."

Encouraging to hear you've reduced hours to minutes. At least someone
knows not to use benchmark(...,replications=100) for tasks that take
under 0.01 seconds and then conclude data.table is slow!

Matthew

On Wed, 2012-06-06 at 13:47 -0400, Juliet Hannah wrote:
All,

I am not understanding a few basic things. I am looking at pg 5 of the faq.

X = data.table(grp=c("a","a","b","b","b","c","c"), foo=1:7)
setkey(X,grp)
  Y = data.table(c("b","c"), bar=c(4,2))
  X[Y]

The faq says X[Y] is a join looking up X's rows using Y.

Does this mean data.table looks up X's key using Y?

Y has two columns. How does it know to use the first column in this
example? Y's key has not been set.

Hope my question is not too obvious. :)

Thanks,

Juliet

P.S. Thanks for the wonderful package.  I had to do some aggregations
the other day and my other solutions were
running for hours, but data.table finished in a couple of minutes!



------------------------------

_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

End of datatable-help Digest, Vol 28, Issue 2
*********************************************
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

Reply via email to