[jira] [Commented] (CALCITE-794) Detect cycles when computing statistics

Julian Hyde (JIRA) Tue, 29 Dec 2015 23:40:31 -0800

    [ 
https://issues.apache.org/jira/browse/CALCITE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15074737#comment-15074737
 ]


Julian Hyde commented on CALCITE-794:
-------------------------------------

[~jni], Thanks for reviewing. I didn't expect you to be able to compile/run 
against my branch; I just wanted a review to confirm that we are heading in a 
direction that would be useful to Drill, and you've done that.

I'll reply to your comments here.

bq. What's the reasoning that we use the min of rowCount from the Rels in a 
RelSubset?

I figured that if we'd found a "clever" algorithm (say if an implementation of 
a join has noticed that one of the inputs has distinct rows and therefore it 
can use a semi-join) then it would have both a lower estimated row count and a 
better estimate. But it's just a hunch, not backed up by any empirical evidence.

bq. Why do we still need this flag? I assume the patch CALCITE_794 would fix 
the problem.

Good point. I will log a follow-up JIRA case for all issues not fixed in 
CALCITE-794, and will rename the flag to match.

bq. If the logic to detect cycle belongs to RelMetadataQuery, does it make more 
sense to catch CyclicMetadataException in RelMetadataQuery, and return default 
value there?

You might be right. We aren't dealing with cyclic metadata very systematically 
yet. I think it will evolve as we learn more. Now we have the mechanism to 
detect cycles, it seems to be very easy to fix them when they arise. My guess 
is that the provider of the specific kind of metadata is in a better position 
to provide a sensible default than RelMetadataQuery, but I don't feel strongly 
about it. I'll look into your idea when I finalize this patch.

bq. Could you please add comments here to explain why the code is applied for 
these types of Rel nodes? What about other Rel nodes like Sort?

That method, {{public Boolean areColumnsUnique(RelSubset rel, RelMetadataQuery 
mq, ImmutableBitSet columns, boolean ignoreNulls)}}, is still somewhat of a 
hack, of a similar nature to the other problems with the 
{{Bug.CALCITE_794_FIXED}} flag. We found that if we went too deep into a 
RelSubset (and its descendants) then things blew up. There is still work to be 
done there.

> Detect cycles when computing statistics
> ---------------------------------------
>
>                 Key: CALCITE-794
>                 URL: https://issues.apache.org/jira/browse/CALCITE-794
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Julian Hyde
>            Assignee: Julian Hyde
>             Fix For: next
>
>
> The graph of RelNodes is allowed to be cyclic. This causes problems when 
> evaluating certain metadata, for example RelMetataQuery.areColumnsUnique. 
> While computing the value for RelNode r, it might recurse through say a 
> Project and hit r again. This causes a stack overflow.
> We solve this by adding a map or set of active RelNodes. The map is stored 
> within RelMetadataQuery, which can now be instantiated, and its methods are 
> no longer static. The first call should instantiate a RelMetadataQuery, but 
> all subsequent calls for metadata (perhaps several kinds of metadata) will 
> use the same RelMetadataQuery instance, hence the same map.
> Also add a RelMetadataQuery argument to the static "handler" methods in 
> RelMdColumnUniqueness and similar classes.
> This is a breaking change for people who have written a metadata handler, and 
> might be subtle to detect, because the methods are invoked via reflection.
> For code that is just using RelMetadataQuery methods, the change is still 
> breaking, but the break points and remedy will be obvious: the methods are no 
> longer static, so they need to change  RelMetadataQuery.foo() to  
> RelMetadataQuery.instance().foo().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CALCITE-794) Detect cycles when computing statistics

Reply via email to