Vineet,

Thanks for your message. See my responses inline.

> On Sep 21, 2016, at 5:11 PM, Vineet Garg <vg...@hortonworks.com> wrote:
> 
> Hello Julian/Calcite community,
> 
> I am working on adding subquery support in Hive using Calcite. From what I
> have read and understood so far, Calcite requires Hive to create a
> RexSubqueryNode corresponding to a subquery and then call
> SubQueryRemoveRule to get rid of the RexSubqueryNode and change it to a
> join. This seems to work for uncorrelated queries, where
> SubQueryRemoveRule creates Aggregate + Join to get rid of the
> RexSubqueryNode. But I am running into the following issues with
> correlated queries (note that I am using the FILTER rule):
> Looking at SubQueryRemoveRule code it should be creating Correlate node if it 
> finds any correlation in given filter. To find if given filter has 
> correlation getVariablesSet is called on filter, which supposedly should be 
> returning set of correlated variables, but it is always returning empty set 
> as filter does not implement this method. Shouldn’t Filter implement this 
> method to return appropriate correlated variables ?

Yes. Remember it should return only the correlating variables it sets, not 
those it inherits.

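
To make "sets" versus "inherits" concrete, here is a toy sketch in plain Java. 
It is not Calcite code: the class, the method names and the string variable 
names are all made up for the illustration, and the real logic in a Filter 
would work on CorrelationId values rather than strings.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Toy model in plain Java, not Calcite's API: every name here is hypothetical.
final class VariablesSetSketch {
  /** Small helper to build an ordered set of variable names. */
  static Set<String> vars(String... names) {
    return new LinkedHashSet<>(Arrays.asList(names));
  }

  /**
   * Variables a filter sets: those referenced by the sub-queries in its
   * condition, minus those already bound by an enclosing node (which the
   * filter merely inherits).
   */
  static Set<String> variablesSet(Set<String> referencedBySubQueries,
      Set<String> boundByAncestors) {
    Set<String> set = new LinkedHashSet<>(referencedBySubQueries);
    set.removeAll(boundByAncestors);
    return set;
  }
}
```

For an inner filter whose sub-query references both $cor0 (bound further out) 
and $cor1, variablesSet(vars("$cor0", "$cor1"), vars("$cor0")) yields only 
$cor1.
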
> Comments in SubQueryRemoveRule mention that “The correlate can be removed
> using RelDecorrelator”. But I don’t see SubQueryRemoveRule using
> RelDecorrelator to decorrelate the given query. Should SubQueryRemoveRule
> call it? If not, is decorrelating immediately after SubQueryRemoveRule
> appropriate?

I would tend to invoke RelDecorrelator on the whole tree. But I see no reason 
in principle why it can’t be called on a section of the tree, as long as that 
section is self-contained (i.e. no unbound correlating variables).
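
A rough way to picture the "self-contained" check, again as a toy model in 
plain Java rather than Calcite classes (Node, freeVariables and 
isSelfContained are hypothetical names): a section of the tree is 
self-contained when every correlating variable read inside it is also bound 
inside it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model in plain Java, not Calcite's API: every name here is hypothetical.
final class SelfContainedSketch {
  /** Small helper to build a set of variable names. */
  static Set<String> vars(String... names) {
    return new HashSet<>(Arrays.asList(names));
  }

  static final class Node {
    final Set<String> variablesSet;   // correlating variables this node binds
    final Set<String> variablesUsed;  // correlating variables this node reads
    final List<Node> inputs = new ArrayList<>();

    Node(Set<String> variablesSet, Set<String> variablesUsed, Node... inputs) {
      this.variablesSet = variablesSet;
      this.variablesUsed = variablesUsed;
      this.inputs.addAll(Arrays.asList(inputs));
    }
  }

  /** Variables read in the subtree rooted at node but not bound within it. */
  static Set<String> freeVariables(Node node) {
    Set<String> free = new HashSet<>(node.variablesUsed);
    for (Node input : node.inputs) {
      free.addAll(freeVariables(input));
    }
    // Anything this node binds is no longer free above it.
    free.removeAll(node.variablesSet);
    return free;
  }

  /** True if the subtree has no unbound correlating variables. */
  static boolean isSelfContained(Node root) {
    return freeVariables(root).isEmpty();
  }
}
```

In this model a node that binds $cor0 above an input that reads $cor0 is 
self-contained, whereas that input taken on its own is not.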

> Here is what I have done so far for correlated queries. Could you please
> comment on whether this is right?
> While creating the RexSubqueryNode and RelNode for the subquery, I am
> creating a RexCorrelVariable. RexCorrelVariable needs a correlation id, and
> CorrelationId requires an integer id. Should this id be the same as the
> index of the correlated column in the outer table?

No, not necessarily. The id must be unique within the whole query.

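
In Calcite the cluster can hand these ids out for you (see 
RelOptCluster.createCorrel(), if memory serves). The idea is just a 
per-query counter, which the toy sketch below mimics in plain Java. 
CorrelIdSketch and its members are made-up names, not Calcite classes.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy model in plain Java, not Calcite's API: every name here is hypothetical.
final class CorrelIdSketch {
  // One allocator per query, so ids are unique across the whole query.
  private final AtomicInteger nextId = new AtomicInteger(0);

  /** Returns an id this allocator has not handed out before. */
  int createCorrelId() {
    return nextId.getAndIncrement();
  }

  /** Display name for an id, in the "$cor" + number style. */
  static String name(int id) {
    return "$cor" + id;
  }
}
```

Note that the id has nothing to do with the column index: two correlated 
references to column 0 of different outer tables still get different ids.
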
> Hive has a HiveFilter, which extends Filter. I implemented the
> getVariablesSet method to look at the condition and return all correlated
> variables in the condition’s RelNode. Does this sound correct?

Yes, sounds right.

> I am calling RelDecorrelator’s decorrelateQuery immediately after calling
> SubQueryRemoveRule. After implementing getVariablesSet in HiveFilter,
> SubQueryRemoveRule seems to create the appropriate LogicalCorrelate for
> correlated queries, but decorrelateQuery throws an exception.

I can’t help much if you are getting errors in Hive-land. This stuff is so 
complicated that I strongly suggest unit tests. Don’t do anything “new” in 
Hive; first make sure that it all works on Calcite logical nodes. Write 
tests in RelOptRulesTest.

Julian
