Vineet,

Thanks for your message. See my responses inline.

> On Sep 21, 2016, at 5:11 PM, Vineet Garg <vg...@hortonworks.com> wrote:
>
> Hello Julian/Calcite community,
>
> I am working on adding subquery support in Hive using Calcite. From what I
> have read/understood so far, Calcite requires Hive to create a
> RexSubqueryNode corresponding to a subquery and then call SubQueryRemoveRule
> to get rid of the RexSubqueryNode and change it to a join. This seems to be
> working for uncorrelated queries, where SubQueryRemoveRule creates
> Aggregate + Join to get rid of the RexSubqueryNode. But I am running into the
> following issues with correlated queries (note that I am using the FILTER
> rule):
>
> Looking at the SubQueryRemoveRule code, it should be creating a Correlate
> node if it finds any correlation in the given filter. To find out whether
> the given filter has correlation, getVariablesSet is called on the filter,
> which supposedly should return the set of correlated variables, but it
> always returns an empty set because Filter does not implement this method.
> Shouldn’t Filter implement this method to return the appropriate correlated
> variables?

Yes. Remember it should return only the correlating variables it sets, not
those it inherits.

> Comments in SubQueryRemoveRule mention that “The correlate can be removed
> using RelDecorrelator”. But I don’t see SubQueryRemoveRule using
> RelDecorrelator to decorrelate the given query. Should SubQueryRemoveRule
> call this? If not, is doing decorrelation immediately after
> SubQueryRemoveRule appropriate?

I would tend to invoke RelDecorrelator on the whole tree. But I see no reason
in principle why it can’t be called on a section of the tree, as long as that
section is self-contained (i.e. no unbound correlating variables).

> Here is what I have done so far for correlated queries. Could you please
> comment on whether this is right?
>
> While creating the RexSubqueryNode and RelNode for the subquery I am
> creating a RexCorrelVariable. RexCorrelVariable needs a correlation id.
> CorrelationId requires an integer id. Should this id be the same as the
> index of the correlated column in the outer table?

No, not necessarily. The id must be unique within the whole query.

> Hive has a HiveFilter, which extends Filter. I implemented the
> getVariablesSet method to look at the condition and return all correlated
> variables in the condition’s RelNode. Does this sound correct?

Yes, sounds right.

> I am calling RelDecorrelator’s decorrelateQuery immediately after calling
> SubQueryRemoveRule. After implementing getVariablesSet in HiveFilter,
> SubQueryRemoveRule seems to be creating the appropriate LogicalCorrelate for
> correlated queries, but decorrelateQuery is throwing an exception.

I can’t help too much if you are getting errors in Hive-land. This stuff is
so complicated that I strongly suggest unit tests. Don’t do anything “new” in
Hive; make sure that it all works on Calcite logical nodes. Write tests in
RelOptRulesTest.

Julian
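
Three points from the answers above can be illustrated with a toy model: a node reports only the correlating variables it sets, ids are unique within the whole query, and a subtree is self-contained when it has no unbound correlating variables. The sketch below is plain Java and does not use Calcite's classes. `CorrelSketch`, `IdAllocator`, `Node`, `variablesSet`, `unbound` and `isSelfContained` are hypothetical names made up for illustration, and treating the subquery as an ordinary input of the filter is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model only; none of these types are Calcite classes. */
final class CorrelSketch {

  /** Hands out correlation ids; use one allocator per query so ids never repeat. */
  static final class IdAllocator {
    private int next = 0;

    int create() {
      return next++;
    }
  }

  /** A plan node that sets some correlation ids and references others. */
  static final class Node {
    final String name;
    final Set<Integer> sets = new LinkedHashSet<>();  // ids this node binds
    final Set<Integer> uses = new LinkedHashSet<>();  // ids this node references
    final List<Node> inputs = new ArrayList<>();

    Node(String name, Node... inputs) {
      this.name = name;
      this.inputs.addAll(Arrays.asList(inputs));
    }

    /** Only the ids this node itself sets, never ids inherited from elsewhere. */
    Set<Integer> variablesSet() {
      return new LinkedHashSet<>(sets);
    }

    /** Ids referenced somewhere in this subtree that nothing in the subtree sets. */
    Set<Integer> unbound() {
      Set<Integer> free = new LinkedHashSet<>(uses);
      for (Node input : inputs) {
        free.addAll(input.unbound());
      }
      free.removeAll(sets);
      return free;
    }

    /** True when the subtree could be handled on its own. */
    boolean isSelfContained() {
      return unbound().isEmpty();
    }
  }

  public static void main(String[] args) {
    IdAllocator ids = new IdAllocator();
    int cor = ids.create();

    // The subquery side references the variable; the outer filter sets it.
    Node subquery = new Node("subquery filter");
    subquery.uses.add(cor);
    Node outer = new Node("outer filter", new Node("scan"), subquery);
    outer.sets.add(cor);

    System.out.println("outer sets: " + outer.variablesSet());
    System.out.println("subquery sets: " + subquery.variablesSet());
    System.out.println("subquery self-contained: " + subquery.isSelfContained());
    System.out.println("outer self-contained: " + outer.isSelfContained());
  }
}
```

In this model the subquery subtree on its own still has an unbound variable, so it would not be a safe section to decorrelate in isolation, while the subtree rooted at the outer filter would be.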