Hi Julian,

Thank you for your response. I have a few follow-up questions:

You wrote: “Yes. Remember it should return only the correlating variables it
sets, not those it inherits.”
What do you mean by “inherit”? Could you kindly provide an example to
elaborate?

You wrote: “No it shouldn’t necessarily. The id must be unique within the
whole query.”
If the id is unique, how is a correlated variable in the inner query bound to
the outer query? That is, how does Calcite figure out which variable in the
outer query a particular correlated variable refers to?

Vineet

From: Julian Hyde <jh...@apache.org>
Date: Thursday, September 22, 2016 at 3:05 PM
To: default <vg...@hortonworks.com>
Cc: "dev@calcite.apache.org" <dev@calcite.apache.org>
Subject: Re: Subquery de-correlation

Vineet,

Thanks for your message. See my responses inline.

On Sep 21, 2016, at 5:11 PM, Vineet Garg <vg...@hortonworks.com> wrote:

Hello Julian/Calcite community,

I am working on adding subquery support to Hive using Calcite. From what I
have read and understood so far, Calcite requires Hive to create a
RexSubQuery node for each subquery and then apply SubQueryRemoveRule, which
replaces the RexSubQuery with a join. This seems to work for uncorrelated
queries, where SubQueryRemoveRule creates an Aggregate + Join to get rid of
the RexSubQuery. But I am running into the following issues with correlated
queries (note that I am using the FILTER rule):

  *   Looking at the SubQueryRemoveRule code, it should create a Correlate
node if it finds any correlation in the given filter. To find out whether the
filter has a correlation, the rule calls getVariablesSet on the filter, which
should return the set of correlated variables. However, it always returns an
empty set, because Filter does not implement this method. Shouldn’t Filter
implement this method to return the appropriate correlated variables?

Yes. Remember it should return only the correlating variables it sets, not 
those it inherits.
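
One way to read the “sets, not inherits” distinction is with a nested query.
The toy model below is plain Java with hypothetical classes, not Calcite’s API:
integers stand in for CorrelationId, and the `introduced` field is an assumed
record of which ids were created for a filter’s own input row. It contrasts
collecting every id found in the condition’s sub-queries with keeping only the
ids the filter itself sets.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/**
 * Toy model (hypothetical classes, not Calcite's API) of "sets" versus
 * "inherits" for correlation variables, using this nested query:
 *
 *   SELECT * FROM emp e                       -- filter F1 sets $cor0 (= e)
 *   WHERE EXISTS (
 *     SELECT * FROM dept d                    -- filter F2 sets $cor1 (= d)
 *     WHERE EXISTS (
 *       SELECT * FROM bonus b
 *       WHERE b.ename = $cor0.ename           -- $cor0 is inherited by F2
 *         AND b.deptno = $cor1.deptno))
 */
public class VariablesSetSketch {
    /**
     * A filter reduced to two facts: the correlation ids introduced for its own
     * input row (an assumption of this sketch), and every id referenced anywhere
     * inside the sub-queries of its condition, however deeply nested.
     */
    record FilterNode(String name, Set<Integer> introduced, Set<Integer> referencedInSubQueries) {

        /** Naive: report every id seen in the condition's sub-queries. */
        Set<Integer> naiveVariablesSet() {
            return new TreeSet<>(referencedInSubQueries);
        }

        /** Intended: report only ids this filter sets; the others are inherited. */
        Set<Integer> variablesSet() {
            Set<Integer> result = new TreeSet<>(referencedInSubQueries);
            result.retainAll(introduced);
            return result;
        }
    }

    public static void main(String[] args) {
        FilterNode f1 = new FilterNode("F1", Set.of(0), Set.of(0, 1));
        FilterNode f2 = new FilterNode("F2", Set.of(1), Set.of(0, 1));
        // Prints:
        // F1 naive=[0, 1] sets=[0]
        // F2 naive=[0, 1] sets=[1]
        for (FilterNode f : List.of(f1, f2)) {
            System.out.println(f.name() + " naive=" + f.naiveVariablesSet()
                + " sets=" + f.variablesSet());
        }
    }
}
```

For F2, the naive approach reports both ids, but only $cor1 belongs to it;
$cor0 is inherited from F1, which is the filter that sets it.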

  *   Comments in SubQueryRemoveRule mention that “The correlate can be
removed using RelDecorrelator”. But I don’t see SubQueryRemoveRule using
RelDecorrelator to de-correlate the given query. Should SubQueryRemoveRule
call it? If not, is it appropriate to run de-correlation immediately after
SubQueryRemoveRule?

I would tend to invoke RelDecorrelator on the whole tree. But I see no reason 
in principle why it can’t be called on a section of the tree, as long as that 
section is self-contained (i.e. no unbound correlating variables).
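
A minimal sketch of what “self-contained” means here, assuming a toy plan
representation (hypothetical classes, not Calcite’s RelNode API): walk the
section top-down, track which correlation ids have been set on the way down,
and report any reference to an id that was not. A section with no such
references could, in principle, be decorrelated on its own.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/**
 * Toy check (hypothetical classes, not Calcite's API) for whether a section of
 * a plan has no unbound correlating variables.
 */
public class SelfContainedSketch {
    /**
     * A plan node: ids it sets, ids its own expressions reference, and its
     * children (inputs and sub-query plans).
     */
    record Node(String name, Set<Integer> sets, Set<Integer> references, List<Node> children) {}

    /** Ids referenced under root that are not set by the node or one of its ancestors within root. */
    static Set<Integer> unboundIds(Node root) {
        Set<Integer> unbound = new TreeSet<>();
        walk(root, Set.of(), unbound);
        return unbound;
    }

    private static void walk(Node node, Set<Integer> boundAbove, Set<Integer> unbound) {
        Set<Integer> bound = new HashSet<>(boundAbove);
        bound.addAll(node.sets());
        for (int id : node.references()) {
            if (!bound.contains(id)) {
                unbound.add(id);
            }
        }
        for (Node child : node.children()) {
            walk(child, bound, unbound);
        }
    }

    static boolean isSelfContained(Node root) {
        return unboundIds(root).isEmpty();
    }

    public static void main(String[] args) {
        Node empScan = new Node("Scan(emp)", Set.of(), Set.of(), List.of());
        // Sub-query plan: a filter on dept whose condition references $cor0.
        Node subQuery = new Node("Filter(dept.deptno = $cor0.deptno)", Set.of(), Set.of(0), List.of());
        // Outer filter on emp that sets $cor0 for its sub-query.
        Node outer = new Node("Filter(EXISTS ...)", Set.of(0), Set.of(), List.of(empScan, subQuery));
        System.out.println(isSelfContained(outer));    // true
        System.out.println(isSelfContained(subQuery)); // false: $cor0 is set outside it
    }
}
```

Decorrelating the sub-query plan by itself would leave $cor0 dangling, which
is consistent with the suggestion to run the decorrelator on the whole tree.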

Here is what I have done so far for correlated queries. Could you please
comment on whether this is right?

  *   While creating the RexSubQuery and the RelNode for the subquery, I am
creating a RexCorrelVariable. RexCorrelVariable needs a correlation id, and
CorrelationId requires an integer id. Should this id be the same as the index
of the correlated column in the outer table?

No it shouldn’t necessarily. The id must be unique within the whole query.
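
To show how a query-wide unique id can still identify the right outer column,
here is a toy model in plain Java (hypothetical classes, not Calcite’s API).
My understanding is that in Calcite the operator that sets a variable (for
example a Correlate, or a filter’s variablesSet) carries the same
CorrelationId as the RexCorrelVariable inside the sub-query, and the column is
chosen by a field access on that variable (shown as $cor0.DEPTNO in plans).
Binding is then by matching ids, and the column index lives in the field
access rather than in the id. Calcite’s RelOptCluster.createCorrel() appears
to hand out ids this way; the generator below only imitates that.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Toy model (hypothetical classes, not Calcite's API) of how a unique
 * correlation id, plus a field index, binds an inner reference to an outer row.
 */
public class CorrelationBindingSketch {
    /** Stand-in for a correlation id; equality is by the integer alone. */
    record CorrelationId(int id) {
        @Override
        public String toString() {
            return "$cor" + id;
        }
    }

    /** One generator per query, so every id is unique within the whole query. */
    static final class CorrelIdGenerator {
        private final AtomicInteger next = new AtomicInteger();

        CorrelationId create() {
            return new CorrelationId(next.getAndIncrement());
        }
    }

    /**
     * A correlated column reference such as "$cor0.deptno": the id says which
     * outer row, the field index says which column of that row.
     */
    record CorrelFieldRef(CorrelationId id, int fieldIndex) {}

    /** Looks up the row currently bound to the reference's id, then the field. */
    static Object resolve(CorrelFieldRef ref, Map<CorrelationId, Object[]> boundRows) {
        Object[] row = boundRows.get(ref.id());
        if (row == null) {
            throw new IllegalStateException("unbound correlation variable " + ref.id());
        }
        return row[ref.fieldIndex()];
    }

    public static void main(String[] args) {
        CorrelIdGenerator generator = new CorrelIdGenerator();
        CorrelationId empVar = generator.create();  // $cor0, set by the filter on emp
        CorrelationId deptVar = generator.create(); // $cor1, set by a filter on dept
        // While the outer operator processes one emp row (name, deptno), it binds $cor0 to it.
        Map<CorrelationId, Object[]> boundRows = Map.of(empVar, new Object[] {"alice", 10});
        // "$cor0.deptno": field 1 of the row bound to $cor0.
        System.out.println(resolve(new CorrelFieldRef(empVar, 1), boundRows)); // 10
        System.out.println(deptVar); // $cor1 (not bound in this example)
    }
}
```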

  *   Hive has a HiveFilter, which extends Filter. I implemented the
getVariablesSet method to look at the condition and return all correlated
variables in the condition’s RelNode. Does this sound correct?

Yes, sounds right.

  *   I am calling RelDecorrelator’s decorrelateQuery immediately after
applying SubQueryRemoveRule. After implementing getVariablesSet in HiveFilter,
SubQueryRemoveRule seems to create the appropriate LogicalCorrelate for
correlated queries, but decorrelateQuery throws an exception.

I can’t help too much if you are getting errors in Hive-land. This stuff is so
complicated that I strongly suggest unit tests. Don’t do anything “new” in
Hive; make sure that it all works on Calcite logical nodes. Write tests in
RelOptRulesTest.

Julian
