Thanks for sharing your thoughts and goals. I'm very engaged in this but
would like some time to review the code and really think through the
problem. Can we keep this open for a while? I also think that as Vladimir
asked, we should come up with some concrete ways to evaluate the impact of
various techniques.

On Thu, Jan 21, 2016 at 4:56 PM, Julian Hyde <[email protected]> wrote:

> I can’t see any requirements that we can drop. Here are the requirements:
>
> * We need multiple providers (e.g. built-in and Drill),
> * It’s not essential but it’s very convenient to find providers by looking
> for methods that are like the implemented signature except that the first
> argument can be a sub-class
> * We need to detect cycles and cache previously computed results
> * We need to be able to flush the cache
> * We need to allow kinds of metadata to be defined at run time (although
> it’s certainly more convenient if we can add an accessor method to
> RelMetadataQuery)
> * We need to be able to create composite providers.
> * We need to be able to add providers at run time (and indeed use
> different combinations of providers during the planning of a single query).
> * We can’t assume that we know ahead of time the full set of RelNode
> sub-classes for which we will ask for metadata.
> * We need to be able to unwind or see through HepRelVertex or RelSubset
> nodes that are introduced by the Hep and Volcano planning engines.
> * For RelSubset we need to combine the constituent rels’ metadata (e.g.
> take the min of the MaxRowCount, or the union of the inferred predicates).
>
> I’ve dropped the requirement that you create an instance of the metadata
> class by “binding” a RelNode and a RelMetadataQuery. Now for each metadata
> interface you define a handler interface that has an extra 2 parameters. So
> we no longer need to create temporary “bound” objects.
>
> Given all those requirements we have a complex dispatch problem.
> Reflection solved it but it was difficult to understand AND inefficient.
> Code generation is difficult to understand but yields code that is easy to
> understand (if you turn on -Dcalcite.debug) and is efficient.
>
> My dev branch is complete and ready for review. Take a look at
> https://issues.apache.org/jira/browse/CALCITE-604 for an example of the
> generated code; I think it clarifies things a lot.
>
> Julian
>
>
>
> > On Jan 18, 2016, at 9:31 PM, Jacques Nadeau <[email protected]> wrote:
> >
> > I read your email.  My thought was that we should evaluate the real
> > requirements. Maybe we need to shed some capabilities to achieve better
> > performance. Can you break down the issue into a set of example patterns?
> > In general, it seems like we may be trying to be too flexible. I
> understand
> > the desire to maintain as much backwards compatibility as possible but
> > think we may want to come up with a simpler metaphor. My goal is to come
> up
> > with the right design rather than trying to optimize what might be the
> > wrong design. Maybe you've done all the thinking and believe the existing
> > metaphors are exactly right but I think we should get consensus around
> this
> > the requirements before jumping to implementation approaches such as code
> > generation.
> > On Jan 18, 2016 5:40 PM, "Julian Hyde" <[email protected]> wrote:
> >
> >> I describe in paragraph 3 why we CURRENTLY use reflection. In short:
> very
> >> complex dispatch requirements. The rest of the message is how I plan to
> >> phase it out. Plain java only has one dispatch mechanism (virtual
> methods)
> >> so isn’t going to cut it.
> >>
> >>> On Jan 18, 2016, at 5:23 PM, Jacques Nadeau <[email protected]>
> wrote:
> >>>
> >>> Can you go into more detail to why reflection is needed? It seems like
> we
> >>> could get away from reflection by sharing interfaces, etc.
> >>>
> >>> On Mon, Jan 18, 2016 at 5:12 PM, Julian Hyde <[email protected]> wrote:
> >>>
> >>>> In https://issues.apache.org/jira/browse/CALCITE-794 <
> >>>> https://issues.apache.org/jira/browse/CALCITE-794> we added an extra
> >>>> parameter to each metadata call so that we could detect cyclic
> metadata
> >>>> calls, and potentially to cache results so that a given statistic is
> >> never
> >>>> computed more than once during a metadata call. But the overhead of
> >> making
> >>>> calls into the metadata framework is still very high. It shows up as a
> >> big
> >>>> fraction of the time spent optimizing complex queries. I am working on
> >>>> https://issues.apache.org/jira/browse/CALCITE-604 <
> >>>> https://issues.apache.org/jira/browse/CALCITE-604>, which aims to fix
> >>>> that.
> >>>>
> >>>> I am working on 604 while the release is closing, and I thought some
> of
> >>>> you would be be interested to know where I am going.
> >>>>
> >>>> We use reflection to make calls. This is necessary because the types
> of
> >>>> metadata (e.g. selectivity, row count, unique keys, predicates) are
> >>>> extensible, you can have multiple providers for each kind of metadata,
> >> each
> >>>> provider has different methods for various RelNode sub-types, and we
> >> want
> >>>> to be able to inherit handler methods (e.g. getUniqueKeys(Aggregate,
> >>>> boolean) handles getUniqueKeys(LogicalAggregate, boolean) because
> there
> >> is
> >>>> no handler method for LogicalAggregate.
> >>>>
> >>>> Initially I thought we’d use MethodHandle, which is a lot faster than
> >>>> method invocation by reflection. MethodHandle.invoke has some
> >> flexibility
> >>>> based on the types of its arguments, but I realized we’d still have to
> >>>> dispatch to multiple underlying providers (e.g. the built-in provider
> >> and
> >>>> the Hive provider). And we have other inefficiencies such as calling
> >>>> UnboundMetadata.bind(RelNode, RelMetadataQuery) to create a
> short-lived
> >>>> object every single call.
> >>>>
> >>>> So, now I am looking at using Janino to generate a dispatcher.
> Consider
> >>>> just one kind of metadata, UniqueKeys. We already have a “signature”
> >>>> interface:
> >>>>
> >>>> public interface UniqueKeys extends Metadata {
> >>>> Set<ImmutableBitSet> getUniqueKeys(boolean ignoreNulls);
> >>>> }
> >>>>
> >>>> I have added a handler interface:
> >>>>
> >>>> interface UniqueKeysHandler {
> >>>> Set<ImmutableBitSet> getUniqueKeys(RelNode r, RelMetadataQuery mq,
> >>>> boolean ignoreNulls);
> >>>> }
> >>>>
> >>>> Now, given a set of metadata providers and the set of all known
> RelNode
> >>>> sub-type, I can use Janino to generate a handler at run time:
> >>>>
> >>>> class UniqueKeysHandlerImpl implements UniqueKeysHandlerImpl {
> >>>> final RelMdUniqueKeys provider0;
> >>>> final HiveUniqueKeys provider1;
> >>>>
> >>>> UniqueKeysHandlerImpl(RelMdUniqueKeys provider0, HiveUniqueKeys
> >>>> provider1) {
> >>>>   this.provider0 = provider0;
> >>>>   this.provider1 = provider1;
> >>>> }
> >>>>
> >>>> public Set<ImmutableBitSet> getUniqueKeys(RelNode r,
> >>>>     RelMetadataQuery mq, boolean ignoreNulls) {
> >>>>   switch (r.getClass().getName()) {
> >>>>   case "org.apache.calcite.rel.logical.LogicalAggregate":
> >>>>     return provider0.getUniqueKeys((Aggregate) r, mq, ignoreNulls);
> >>>>   case "org.apache.calcite.rel.core.Aggregate":
> >>>>     return provider0.getUniqueKeys(r, mq, ignoreNulls);
> >>>>   case “
> >>>>
> org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveAggregate":
> >>>>     return provider1.getUniqueKeys(r, mq, ignoreNulls);
> >>>>   default:
> >>>>     throw NoHandler.INSTANCE;
> >>>>   }
> >>>> }
> >>>> }
> >>>>
> >>>> The entry point in RelMetadataQuery changes from
> >>>>
> >>>> public Set<ImmutableBitSet> getUniqueKeys(RelNode rel,
> >>>>   boolean ignoreNulls) {
> >>>> final BuiltInMetadata.UniqueKeys metadata =
> >>>>     rel.metadata(BuiltInMetadata.UniqueKeys.class, this);
> >>>> return metadata.getUniqueKeys(ignoreNulls);
> >>>> }
> >>>>
> >>>> to
> >>>>
> >>>> public Set<ImmutableBitSet> getUniqueKeys(RelNode rel,
> >>>>   boolean ignoreNulls) {
> >>>> for (;;) {
> >>>>   try {
> >>>>     return uniqueKeysHandler.getUniqueKeys(rel, this, ignoreNulls);
> >>>>   } catch (NoHandler e) {
> >>>>     uniqueKeysHandler = metadataProvider.revise(rel.getClass(),
> >>>>         BuiltInMetadata.UniqueKeys.Handler.class);
> >>>>   }
> >>>> }
> >>>> }
> >>>>
> >>>> The “NoHandler” exception occurs very rarely — only when a kind of
> >> RelNode
> >>>> is seen that hasn’t been seen before in this JVM instance — but gives
> >> the
> >>>> handler chance to regenerate itself.
> >>>>
> >>>> The result is a very direct path from the caller (generally a
> >> RelOptRule)
> >>>> to the provider: two calls, and we don’t even need to box the
> arguments.
> >>>>
> >>>> I don’t think there will be any API changes, but note that the
> metadata
> >>>> interfaces (eg. UniqueKeys) and RelNode.metadata(Class<M>
> metadataClass,
> >>>> RelMetadataQuery mq) are not used anymore.
> >>>>
> >>>> Julian
> >>>>
> >>>>
> >>
> >>
>
>

Reply via email to