Yes, but why RelOptCluster? Besides the fact that there happened to be a method there.
> On Oct 15, 2019, at 7:30 PM, Haisheng Yuan <h.y...@alibaba-inc.com> wrote: > > > Is there a good rationale for attaching one to RelOptCluster > Customized RelMetadataQuery with code generated meta handler for customized > metadata, also can provide convenient way to get metadata. > > - Haisheng > > ------------------------------------------------------------------ > 发件人:Julian Hyde<jh...@apache.org> > 日 期:2019年10月16日 06:52:39 > 收件人:<dev@calcite.apache.org> > 主 题:Re: [DISCUSSION] Extension of Metadata Query > > Having reviewed https://issues.apache.org/jira/browse/CALCITE-3403 > <https://issues.apache.org/jira/browse/CALCITE-3403> and > https://issues.apache.org/jira/browse/CALCITE-2018 > <https://issues.apache.org/jira/browse/CALCITE-2018> today I am worried about > the direction RelMetadataQuery is taking. Is there a good rationale for > attaching one to RelOptCluster? (Besides “it seems to work”?) > > The natural lifespan of a RelMetadataQuery is a RelOptCall (because the > structure of the tree is constant for the duration of a call). > > As for sub-classing RelMetadataQuery, I really don’t know. It’s probably > harmless, possibly disastrous, and it takes a lot of effort to determine > which. We have to be really cautious with RelMetadataQuery because it is one > of the few components in Calcite that mutate. > > Julian > > > > > > On Oct 15, 2019, at 1:42 PM, Haisheng Yuan <h.y...@alibaba-inc.com> wrote: > > > > I would like to bring this discussion up again. > > > > We created our own metadata provider, and set it to RelMetadataQuery by > > calling > > RelMetadataQuery.THREAD_PROVIDERS.set() > > If we just use all the types of metadata provided by Calcite, it should be > > fine. But given that we have created new types of metadata, and we want to > > get them during rule transformation, we want to have the rule call return > > our own RelMetadataQuery. In order to achieve that, we sub-classed > > RelMetadataQuery, and sub-classed RelOptCluster to make metadata query > > overridable, and return the instance of subclass of RelMetadataQuery when > > the mq is null. Indeed, we can achieve that goal by just call > > rel.metadata(), but we want a efficient and convenient way to do that. That > > is why we want RelMetadataQuery in RelOptCluster customizable. > > Is it a good practice? If not, any other better approach to suggest? > > Thanks. > > Haisheng > > > > ------------------------------------------------------------------ > > 发件人:Stamatis Zampetakis<zabe...@gmail.com> > > 日 期:2019年06月15日 15:47:58 > > 收件人:<dev@calcite.apache.org> > > 主 题:Re: [DISCUSSION] Extension of Metadata Query > > > > I had a look on CALCITE-1147 (which added support for custom providers and > > metadata) and based on the added tests [1], it seems that the main > > alternative to RelMetadataQuery is indeed through the MetadataFactory. One > > other alternative I can think of, is to pass a custom RelMetadataQuery (or > > a query factory) through the context of the planner [2]. > > > > Given that we have already methods to obtain a RelMetadataQuery in > > RelOptRuleCall and RelOptCluster it may make sense to allow the type of > > query to be pluggable. However, I have not really used this mechanism in > > the past so I cannot be sure what's the best way to proceed. > > > > Best, > > Stamatis > > > > [1] > > https://github.com/apache/calcite/blob/4e89fddab415a1e04b82c7d69960e399f608949f/core/src/test/java/org/apache/calcite/test/RelMetadataTest.java#L1001 > > [2] > > https://github.com/apache/calcite/blob/4e89fddab415a1e04b82c7d69960e399f608949f/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoPlanner.java#L256 > > > > On Wed, Jun 12, 2019 at 6:07 AM Yuzhao Chen <yuzhao....@gmail.com> wrote: > > > >> We have an interface to fetch the RelMetadataQuery in RelOptRuleCall [1], > >> which I believe is the most common routine to query metadata during > >> planning, I’m a little confused for your saying > >> > >>> The methods in RelMetadataQuery are for convenience only. > >> > >> If these methods in RelMetadataQuery are only for convenience, what is the > >> other way I can query a metadata ? The MedataFactory from the RelOptCluster > >> ? (Users would never expect to use this because it’s based on Java > >> Reflection and has performance problem) > >> > >> [1] > >> https://github.com/apache/calcite/blob/a3c56be7bccc58859524ba39e5b30b7078f97d00/core/src/main/java/org/apache/calcite/plan/RelOptRuleCall.java#L200 > >> > >> Best, > >> Danny Chan > >> 在 2019年6月12日 +0800 AM6:27,Julian Hyde <jh...@apache.org>,写道: > >>> The goal is that it should be possible to add a new kind of metadata. > >> The methods in RelMetadataQuery are for convenience only. So you should be > >> able to use your new kind of metadata, and existing ones, without modifying > >> RelMetadataQuery. > >>> > >>> If that is not possible, it’s a bug, and you should log a JIRA case. > >>> > >>> Given how the methods re-generate handlers (by catching exceptions and > >> modifying mutable private members) I can see that new kinds of metadata > >> might be difficult to add. > >>> > >>> Julian > >>> > >>> > >>>> On Jun 9, 2019, at 7:54 PM, Yuzhao Chen <yuzhao....@gmail.com> wrote: > >>>> > >>>> Thanks, Stamatis > >>>> > >>>> You are right, this discussion came up due to CALCITE-2885, it is not > >> about the performance problem, it is the extension of RelMetadataQuery, > >> because we add all kinds of top interfaces in RelMetadataQuery, e.g. the > >> row count query [1]. > >>>> > >>>> When we add a new RelMetadataProvider, a corresponding > >> interface/method should be added into RelMetadataQuery, but for current > >> RelOpeCluster impl, we could not do that(with a always default instance) > >>>> > >>>> As for the methods in RelMetadataProvider, i only saw the usage of > >> #handlers in RelMetadataQuery, and saw the reference you pase, it seems not > >> that relevant with this topic. What do you think ? > >>>> > >>>> [1] > >> https://github.com/apache/calcite/blob/941cd4e9540e3ef9b7c15daee42831a0c63da8b9/core/src/main/java/org/apache/calcite/rel/metadata/RelMetadataQuery.java#L222 > >>>> > >>>> Best, > >>>> Danny Chan > >>>> 在 2019年6月5日 +0800 AM6:44,Stamatis Zampetakis <zabe...@gmail.com>,写道: > >>>>> Thanks for bringing this up Danny. > >>>>> > >>>>> I guess the discussion came up due to CALCITE-2885 [1]. Looking back > >> into > >>>>> it, I am not sure that the intention there is to make the > >> RelMetadataQuery > >>>>> pluggable. We could possibly solve the performance problem without > >>>>> extending the RelMetadataQuery. We have to look again into this case. > >>>>> > >>>>> For more details regarding the existence of the two methods in > >>>>> RelMetadataProvider have a look in CALCITE-604 [2]. More general for > >> the > >>>>> design of RelMetadataProvider you may find useful the description in > >> [3]. > >>>>> > >>>>> Best, > >>>>> Stamatis > >>>>> > >>>>> [1] https://issues.apache.org/jira/browse/CALCITE-2885 > >>>>> [2] https://issues.apache.org/jira/browse/CALCITE-604 > >>>>> [3] > >>>>> > >> https://web.archive.org/web/20140624040836/www.hydromatic.net/wiki/RelationalExpressionMetadata/ > >>>>> > >>>>> On Tue, Jun 4, 2019 at 7:48 PM Julian Hyde <jh...@apache.org> wrote: > >>>>> > >>>>>>> 1. Why we have 2 methods in RelMetadataProvider? > >>>>>> > >>>>>> The metadata system is complicated. We need to allow multiple > >> handlers > >>>>>> for any given call. So, making a metadata call involves multiple > >>>>>> dispatch [1] based on the metadata method being called, the type of > >>>>>> RelNode, and the handlers that are registered. Also it needs to > >> cache > >>>>>> results, and detect cycles. And the dispatch needs to be > >> efficient, so > >>>>>> we generate janino code to do the dispatch, and re-generate when > >> new > >>>>>> handlers or sub-classes of RelNode are added. > >>>>>> > >>>>>> I forget details, the two methods are basically required to allow > >> us > >>>>>> to generate code to do the dispatch. > >>>>>> > >>>>>>> 2. We should make the RelMetadataQuery in RelOptCluster > >> pluggable. > >>>>>> > >>>>>> I disagree. RelMetadataQuery is not an extension point. Its sole > >>>>>> purpose is to to keep state (the cache and cycle-checking). > >>>>>> RelMetadataProvider is the extension point. (By analogy, if you are > >>>>>> un-parsing an AST, you let each AST sub-class handle unparsing > >> itself, > >>>>>> but the unparsed text goes to a simple StringBuilder. > >> RelMetadataQuery > >>>>>> is in the role of the StringBuilder. In a complex system, it is > >> nice > >>>>>> to keep some of the components simple, or at least keep them to > >>>>>> prescribed roles.) > >>>>>> > >>>>>> Julian > >>>>>> > >>>>>> [1] https://en.wikipedia.org/wiki/Multiple_dispatch > >>>>>> > >>>>>> On Sun, Jun 2, 2019 at 11:19 PM Yuzhao Chen <yuzhao....@gmail.com> > >> wrote: > >>>>>>> > >>>>>>> Currently we provide answer to metadata query through > >>>>>> RelMetadataProvider [1], there are some sub-classes of it: > >>>>>>> > >>>>>>> RelMetadataProvider > >>>>>>> | > >>>>>>> |- VolcanoRelMetadataProvider > >>>>>>> |- ChainedRelMetadataProvider/DefaultRelMetadataProvider > >>>>>>> |- HepRelMetadataProvider > >>>>>>> |- CachingRelMetadataProvider > >>>>>>> |- ReflectiveRelMetadataProvider > >>>>>>> |- JaninoRelMetadataProvider > >>>>>>> > >>>>>>> The RelMetadataProvider has two methods: #apply and #handlers, > >> the > >>>>>> #apply method seems a programming interface and there is a demo > >> code how we > >>>>>> can use it: > >>>>>>> > >>>>>>> RelMetadataProvider provider; > >>>>>>> LogicalFilter filter; > >>>>>>> RexNode predicate; > >>>>>>> Function<RelNode, Metadata> function = > >>>>>>> provider.apply(LogicalFilter.class, Selectivity.class}; > >>>>>>> Selectivity selectivity = function.apply(filter); > >>>>>>> Double d = selectivity.selectivity(predicate); > >>>>>>> > >>>>>>> But let's see our RelOptCluster's member variables[2], there are > >>>>>> MetadataFactory and RelMetadataQuery which all can be used to > >> query the > >>>>>> metadata, for MetadataFactory, there is a default impl named > >>>>>> MetadataFactoryImpl which will invoke RelMetadataProvider#apply > >> internally, > >>>>>> for RelMetadataQuery, it will invoke RelMetadataProvider#handlers > >> (finally > >>>>>> composed and codeden by JaninoRelMetadataProvider). > >>>>>>> > >>>>>>> In our planning phrase, we can invoke > >> RelOptRuleCall#getMetadataQuery to > >>>>>> get the MQ and query the metadata. > >>>>>>> > >>>>>>> For extension of metadata handlers, we can set our customized > >>>>>> RelMetadataProvider in RelOptCluster[3]. But for RelMetadataQuery, > >> we have > >>>>>> no way to extend it now, because the RelOptCluster always has a > >> singleton > >>>>>> instance [4] which is only the default implementation. > >>>>>>> > >>>>>>> > >>>>>>> My question is as follows: > >>>>>>> > >>>>>>> 1. Why we have 2 methods in RelMetadataProvider, and why we need > >> the > >>>>>> MetadataFactory and RelMetadataProvider#apply ? It seems that it's > >> function > >>>>>> is already been overriden by RelMetadataQuery(The difference is > >> that > >>>>>> MetadataFactory use Reflection and RelMetadataQuery use gened > >> bytes code). > >>>>>>> 2. We should make the RelMetadataQuery in RelOptCluster > >> pluggable. > >>>>>>> > >>>>>>> > >>>>>>> [1] > >>>>>> > >> https://github.com/apache/calcite/blob/b0e83c469ff57257c1ea621ff943ca76f626a9b7/core/src/main/java/org/apache/calcite/rel/metadata/RelMetadataProvider.java#L38 > >>>>>>> [2] > >>>>>> > >> https://github.com/apache/calcite/blob/b0e83c469ff57257c1ea621ff943ca76f626a9b7/core/src/main/java/org/apache/calcite/plan/RelOptCluster.java#L49 > >>>>>>> [3] > >>>>>> > >> https://github.com/apache/calcite/blob/b0e83c469ff57257c1ea621ff943ca76f626a9b7/core/src/main/java/org/apache/calcite/plan/RelOptCluster.java#L135 > >>>>>>> [4] > >>>>>> > >> https://github.com/apache/calcite/blob/b0e83c469ff57257c1ea621ff943ca76f626a9b7/core/src/main/java/org/apache/calcite/plan/RelOptCluster.java#L151 > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> Best, > >>>>>>> Danny Chan > >>>>>> > >>> > >> > > > >