> Is there a good rationale for attaching one to RelOptCluster Customized RelMetadataQuery with code generated meta handler for customized metadata, also can provide convenient way to get metadata.
- Haisheng ------------------------------------------------------------------ 发件人:Julian Hyde<jh...@apache.org> 日 期:2019年10月16日 06:52:39 收件人:<dev@calcite.apache.org> 主 题:Re: [DISCUSSION] Extension of Metadata Query Having reviewed https://issues.apache.org/jira/browse/CALCITE-3403 <https://issues.apache.org/jira/browse/CALCITE-3403> and https://issues.apache.org/jira/browse/CALCITE-2018 <https://issues.apache.org/jira/browse/CALCITE-2018> today I am worried about the direction RelMetadataQuery is taking. Is there a good rationale for attaching one to RelOptCluster? (Besides “it seems to work”?) The natural lifespan of a RelMetadataQuery is a RelOptCall (because the structure of the tree is constant for the duration of a call). As for sub-classing RelMetadataQuery, I really don’t know. It’s probably harmless, possibly disastrous, and it takes a lot of effort to determine which. We have to be really cautious with RelMetadataQuery because it is one of the few components in Calcite that mutate. Julian > On Oct 15, 2019, at 1:42 PM, Haisheng Yuan <h.y...@alibaba-inc.com> wrote: > > I would like to bring this discussion up again. > > We created our own metadata provider, and set it to RelMetadataQuery by > calling > RelMetadataQuery.THREAD_PROVIDERS.set() > If we just use all the types of metadata provided by Calcite, it should be > fine. But given that we have created new types of metadata, and we want to > get them during rule transformation, we want to have the rule call return our > own RelMetadataQuery. In order to achieve that, we sub-classed > RelMetadataQuery, and sub-classed RelOptCluster to make metadata query > overridable, and return the instance of subclass of RelMetadataQuery when the > mq is null. Indeed, we can achieve that goal by just call rel.metadata(), but > we want a efficient and convenient way to do that. That is why we want > RelMetadataQuery in RelOptCluster customizable. > Is it a good practice? If not, any other better approach to suggest? > Thanks. > Haisheng > > ------------------------------------------------------------------ > 发件人:Stamatis Zampetakis<zabe...@gmail.com> > 日 期:2019年06月15日 15:47:58 > 收件人:<dev@calcite.apache.org> > 主 题:Re: [DISCUSSION] Extension of Metadata Query > > I had a look on CALCITE-1147 (which added support for custom providers and > metadata) and based on the added tests [1], it seems that the main > alternative to RelMetadataQuery is indeed through the MetadataFactory. One > other alternative I can think of, is to pass a custom RelMetadataQuery (or > a query factory) through the context of the planner [2]. > > Given that we have already methods to obtain a RelMetadataQuery in > RelOptRuleCall and RelOptCluster it may make sense to allow the type of > query to be pluggable. However, I have not really used this mechanism in > the past so I cannot be sure what's the best way to proceed. > > Best, > Stamatis > > [1] > https://github.com/apache/calcite/blob/4e89fddab415a1e04b82c7d69960e399f608949f/core/src/test/java/org/apache/calcite/test/RelMetadataTest.java#L1001 > [2] > https://github.com/apache/calcite/blob/4e89fddab415a1e04b82c7d69960e399f608949f/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoPlanner.java#L256 > > On Wed, Jun 12, 2019 at 6:07 AM Yuzhao Chen <yuzhao....@gmail.com> wrote: > >> We have an interface to fetch the RelMetadataQuery in RelOptRuleCall [1], >> which I believe is the most common routine to query metadata during >> planning, I’m a little confused for your saying >> >>> The methods in RelMetadataQuery are for convenience only. >> >> If these methods in RelMetadataQuery are only for convenience, what is the >> other way I can query a metadata ? The MedataFactory from the RelOptCluster >> ? (Users would never expect to use this because it’s based on Java >> Reflection and has performance problem) >> >> [1] >> https://github.com/apache/calcite/blob/a3c56be7bccc58859524ba39e5b30b7078f97d00/core/src/main/java/org/apache/calcite/plan/RelOptRuleCall.java#L200 >> >> Best, >> Danny Chan >> 在 2019年6月12日 +0800 AM6:27,Julian Hyde <jh...@apache.org>,写道: >>> The goal is that it should be possible to add a new kind of metadata. >> The methods in RelMetadataQuery are for convenience only. So you should be >> able to use your new kind of metadata, and existing ones, without modifying >> RelMetadataQuery. >>> >>> If that is not possible, it’s a bug, and you should log a JIRA case. >>> >>> Given how the methods re-generate handlers (by catching exceptions and >> modifying mutable private members) I can see that new kinds of metadata >> might be difficult to add. >>> >>> Julian >>> >>> >>>> On Jun 9, 2019, at 7:54 PM, Yuzhao Chen <yuzhao....@gmail.com> wrote: >>>> >>>> Thanks, Stamatis >>>> >>>> You are right, this discussion came up due to CALCITE-2885, it is not >> about the performance problem, it is the extension of RelMetadataQuery, >> because we add all kinds of top interfaces in RelMetadataQuery, e.g. the >> row count query [1]. >>>> >>>> When we add a new RelMetadataProvider, a corresponding >> interface/method should be added into RelMetadataQuery, but for current >> RelOpeCluster impl, we could not do that(with a always default instance) >>>> >>>> As for the methods in RelMetadataProvider, i only saw the usage of >> #handlers in RelMetadataQuery, and saw the reference you pase, it seems not >> that relevant with this topic. What do you think ? >>>> >>>> [1] >> https://github.com/apache/calcite/blob/941cd4e9540e3ef9b7c15daee42831a0c63da8b9/core/src/main/java/org/apache/calcite/rel/metadata/RelMetadataQuery.java#L222 >>>> >>>> Best, >>>> Danny Chan >>>> 在 2019年6月5日 +0800 AM6:44,Stamatis Zampetakis <zabe...@gmail.com>,写道: >>>>> Thanks for bringing this up Danny. >>>>> >>>>> I guess the discussion came up due to CALCITE-2885 [1]. Looking back >> into >>>>> it, I am not sure that the intention there is to make the >> RelMetadataQuery >>>>> pluggable. We could possibly solve the performance problem without >>>>> extending the RelMetadataQuery. We have to look again into this case. >>>>> >>>>> For more details regarding the existence of the two methods in >>>>> RelMetadataProvider have a look in CALCITE-604 [2]. More general for >> the >>>>> design of RelMetadataProvider you may find useful the description in >> [3]. >>>>> >>>>> Best, >>>>> Stamatis >>>>> >>>>> [1] https://issues.apache.org/jira/browse/CALCITE-2885 >>>>> [2] https://issues.apache.org/jira/browse/CALCITE-604 >>>>> [3] >>>>> >> https://web.archive.org/web/20140624040836/www.hydromatic.net/wiki/RelationalExpressionMetadata/ >>>>> >>>>> On Tue, Jun 4, 2019 at 7:48 PM Julian Hyde <jh...@apache.org> wrote: >>>>> >>>>>>> 1. Why we have 2 methods in RelMetadataProvider? >>>>>> >>>>>> The metadata system is complicated. We need to allow multiple >> handlers >>>>>> for any given call. So, making a metadata call involves multiple >>>>>> dispatch [1] based on the metadata method being called, the type of >>>>>> RelNode, and the handlers that are registered. Also it needs to >> cache >>>>>> results, and detect cycles. And the dispatch needs to be >> efficient, so >>>>>> we generate janino code to do the dispatch, and re-generate when >> new >>>>>> handlers or sub-classes of RelNode are added. >>>>>> >>>>>> I forget details, the two methods are basically required to allow >> us >>>>>> to generate code to do the dispatch. >>>>>> >>>>>>> 2. We should make the RelMetadataQuery in RelOptCluster >> pluggable. >>>>>> >>>>>> I disagree. RelMetadataQuery is not an extension point. Its sole >>>>>> purpose is to to keep state (the cache and cycle-checking). >>>>>> RelMetadataProvider is the extension point. (By analogy, if you are >>>>>> un-parsing an AST, you let each AST sub-class handle unparsing >> itself, >>>>>> but the unparsed text goes to a simple StringBuilder. >> RelMetadataQuery >>>>>> is in the role of the StringBuilder. In a complex system, it is >> nice >>>>>> to keep some of the components simple, or at least keep them to >>>>>> prescribed roles.) >>>>>> >>>>>> Julian >>>>>> >>>>>> [1] https://en.wikipedia.org/wiki/Multiple_dispatch >>>>>> >>>>>> On Sun, Jun 2, 2019 at 11:19 PM Yuzhao Chen <yuzhao....@gmail.com> >> wrote: >>>>>>> >>>>>>> Currently we provide answer to metadata query through >>>>>> RelMetadataProvider [1], there are some sub-classes of it: >>>>>>> >>>>>>> RelMetadataProvider >>>>>>> | >>>>>>> |- VolcanoRelMetadataProvider >>>>>>> |- ChainedRelMetadataProvider/DefaultRelMetadataProvider >>>>>>> |- HepRelMetadataProvider >>>>>>> |- CachingRelMetadataProvider >>>>>>> |- ReflectiveRelMetadataProvider >>>>>>> |- JaninoRelMetadataProvider >>>>>>> >>>>>>> The RelMetadataProvider has two methods: #apply and #handlers, >> the >>>>>> #apply method seems a programming interface and there is a demo >> code how we >>>>>> can use it: >>>>>>> >>>>>>> RelMetadataProvider provider; >>>>>>> LogicalFilter filter; >>>>>>> RexNode predicate; >>>>>>> Function<RelNode, Metadata> function = >>>>>>> provider.apply(LogicalFilter.class, Selectivity.class}; >>>>>>> Selectivity selectivity = function.apply(filter); >>>>>>> Double d = selectivity.selectivity(predicate); >>>>>>> >>>>>>> But let's see our RelOptCluster's member variables[2], there are >>>>>> MetadataFactory and RelMetadataQuery which all can be used to >> query the >>>>>> metadata, for MetadataFactory, there is a default impl named >>>>>> MetadataFactoryImpl which will invoke RelMetadataProvider#apply >> internally, >>>>>> for RelMetadataQuery, it will invoke RelMetadataProvider#handlers >> (finally >>>>>> composed and codeden by JaninoRelMetadataProvider). >>>>>>> >>>>>>> In our planning phrase, we can invoke >> RelOptRuleCall#getMetadataQuery to >>>>>> get the MQ and query the metadata. >>>>>>> >>>>>>> For extension of metadata handlers, we can set our customized >>>>>> RelMetadataProvider in RelOptCluster[3]. But for RelMetadataQuery, >> we have >>>>>> no way to extend it now, because the RelOptCluster always has a >> singleton >>>>>> instance [4] which is only the default implementation. >>>>>>> >>>>>>> >>>>>>> My question is as follows: >>>>>>> >>>>>>> 1. Why we have 2 methods in RelMetadataProvider, and why we need >> the >>>>>> MetadataFactory and RelMetadataProvider#apply ? It seems that it's >> function >>>>>> is already been overriden by RelMetadataQuery(The difference is >> that >>>>>> MetadataFactory use Reflection and RelMetadataQuery use gened >> bytes code). >>>>>>> 2. We should make the RelMetadataQuery in RelOptCluster >> pluggable. >>>>>>> >>>>>>> >>>>>>> [1] >>>>>> >> https://github.com/apache/calcite/blob/b0e83c469ff57257c1ea621ff943ca76f626a9b7/core/src/main/java/org/apache/calcite/rel/metadata/RelMetadataProvider.java#L38 >>>>>>> [2] >>>>>> >> https://github.com/apache/calcite/blob/b0e83c469ff57257c1ea621ff943ca76f626a9b7/core/src/main/java/org/apache/calcite/plan/RelOptCluster.java#L49 >>>>>>> [3] >>>>>> >> https://github.com/apache/calcite/blob/b0e83c469ff57257c1ea621ff943ca76f626a9b7/core/src/main/java/org/apache/calcite/plan/RelOptCluster.java#L135 >>>>>>> [4] >>>>>> >> https://github.com/apache/calcite/blob/b0e83c469ff57257c1ea621ff943ca76f626a9b7/core/src/main/java/org/apache/calcite/plan/RelOptCluster.java#L151 >>>>>>> >>>>>>> >>>>>>> >>>>>>> Best, >>>>>>> Danny Chan >>>>>> >>> >> >