Can you go into more detail as to why reflection is needed? It seems like we
could get away from reflection by sharing interfaces, etc.

On Mon, Jan 18, 2016 at 5:12 PM, Julian Hyde <[email protected]> wrote:

> In https://issues.apache.org/jira/browse/CALCITE-794 we added an extra
> parameter to each metadata call so that we could detect cyclic metadata
> calls, and potentially to cache results so that a given statistic is never
> computed more than once during a metadata call. But the overhead of making
> calls into the metadata framework is still very high. It shows up as a big
> fraction of the time spent optimizing complex queries. I am working on
> https://issues.apache.org/jira/browse/CALCITE-604, which aims to fix
> that.
>
> I am working on 604 while the release is closing, and I thought some of
> you would be interested to know where I am going.
>
> We use reflection to make calls. This is necessary because the types of
> metadata (e.g. selectivity, row count, unique keys, predicates) are
> extensible, you can have multiple providers for each kind of metadata, each
> provider has different methods for various RelNode sub-types, and we want
> to be able to inherit handler methods (e.g. getUniqueKeys(Aggregate,
> boolean) handles getUniqueKeys(LogicalAggregate, boolean) because there is
> no handler method for LogicalAggregate).
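
To make the inherited-handler lookup concrete, here is a minimal sketch of
reflective dispatch that walks up the superclass chain. RelNode, Aggregate
and LogicalAggregate below are simplified stand-ins, not the Calcite
classes, and Provider, findHandler and dispatch are hypothetical names
chosen for illustration; the real reflective provider is more involved.

```java
import java.lang.reflect.Method;

class ReflectiveDispatchSketch {
  // Simplified stand-ins for the RelNode hierarchy (assumption, not Calcite's classes).
  static class RelNode {}
  static class Aggregate extends RelNode {}
  static class LogicalAggregate extends Aggregate {}

  // Hypothetical provider: handles Aggregate, has no method for LogicalAggregate.
  public static class Provider {
    public String getUniqueKeys(Aggregate r, boolean ignoreNulls) {
      return "Aggregate handler, ignoreNulls=" + ignoreNulls;
    }
  }

  // Walk from the concrete class up through its superclasses until the
  // provider has a public method whose first parameter is exactly that class.
  static Method findHandler(Class<?> providerClass, String name, Class<?> relClass) {
    for (Class<?> c = relClass; c != null; c = c.getSuperclass()) {
      try {
        return providerClass.getMethod(name, c, boolean.class);
      } catch (NoSuchMethodException e) {
        // No handler declared for this class; try its superclass.
      }
    }
    return null;
  }

  static Object dispatch(Object provider, RelNode r, boolean ignoreNulls) {
    Method m = findHandler(provider.getClass(), "getUniqueKeys", r.getClass());
    if (m == null) {
      throw new IllegalStateException("no handler for " + r.getClass().getName());
    }
    try {
      // Method.invoke takes Object arguments, so the boolean is boxed here.
      return m.invoke(provider, r, ignoreNulls);
    } catch (ReflectiveOperationException e) {
      throw new RuntimeException(e);
    }
  }
}
```

Calling dispatch(new Provider(), new LogicalAggregate(), true) lands in the
Aggregate method. The lookup and the boxed invoke happen on every call
unless something caches them, which is the kind of overhead in question.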
>
> Initially I thought we’d use MethodHandle, which is a lot faster than
> method invocation by reflection. MethodHandle.invoke has some flexibility
> based on the types of its arguments, but I realized we’d still have to
> dispatch to multiple underlying providers (e.g. the built-in provider and
> the Hive provider). And we have other inefficiencies such as calling
> UnboundMetadata.bind(RelNode, RelMetadataQuery) to create a short-lived
> object every single call.
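
As an illustration of the MethodHandle option, the sketch below looks up a
handle once and calls it with MethodHandle.invoke, which adapts the static
argument types (here RelNode is cast to Aggregate) at the call site. The
classes are simplified stand-ins and the names Provider and call are
hypothetical. Note that the handle is bound to one method of one provider,
so choosing between providers would still need separate dispatch logic.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class MethodHandleSketch {
  // Simplified stand-ins for the RelNode hierarchy (assumption, not Calcite's classes).
  static class RelNode {}
  static class Aggregate extends RelNode {}
  static class LogicalAggregate extends Aggregate {}

  // Hypothetical provider with a single handler method.
  static class Provider {
    String getUniqueKeys(Aggregate r, boolean ignoreNulls) {
      return "Aggregate handler, ignoreNulls=" + ignoreNulls;
    }
  }

  // Resolved once; later calls skip the name-based lookup entirely.
  static final MethodHandle GET_UNIQUE_KEYS;
  static {
    try {
      GET_UNIQUE_KEYS = MethodHandles.lookup().findVirtual(
          Provider.class, "getUniqueKeys",
          MethodType.methodType(String.class, Aggregate.class, boolean.class));
    } catch (ReflectiveOperationException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  static String call(Provider p, RelNode r, boolean ignoreNulls) {
    try {
      // invoke (unlike invokeExact) converts argument types: r is cast to
      // Aggregate, and the primitive boolean is passed without boxing.
      return (String) GET_UNIQUE_KEYS.invoke(p, r, ignoreNulls);
    } catch (Throwable t) {
      throw new RuntimeException(t);
    }
  }
}
```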
>
> So, now I am looking at using Janino to generate a dispatcher. Consider
> just one kind of metadata, UniqueKeys. We already have a “signature”
> interface:
>
> public interface UniqueKeys extends Metadata {
>   Set<ImmutableBitSet> getUniqueKeys(boolean ignoreNulls);
> }
>
> I have added a handler interface:
>
> interface UniqueKeysHandler {
>   Set<ImmutableBitSet> getUniqueKeys(RelNode r, RelMetadataQuery mq,
>       boolean ignoreNulls);
> }
>
> Now, given a set of metadata providers and the set of all known RelNode
> sub-types, I can use Janino to generate a handler at run time:
>
> class UniqueKeysHandlerImpl implements UniqueKeysHandler {
>   final RelMdUniqueKeys provider0;
>   final HiveUniqueKeys provider1;
>
>   UniqueKeysHandlerImpl(RelMdUniqueKeys provider0,
>       HiveUniqueKeys provider1) {
>     this.provider0 = provider0;
>     this.provider1 = provider1;
>   }
>
>   public Set<ImmutableBitSet> getUniqueKeys(RelNode r,
>       RelMetadataQuery mq, boolean ignoreNulls) {
>     switch (r.getClass().getName()) {
>     case "org.apache.calcite.rel.logical.LogicalAggregate":
>       return provider0.getUniqueKeys((Aggregate) r, mq, ignoreNulls);
>     case "org.apache.calcite.rel.core.Aggregate":
>       return provider0.getUniqueKeys((Aggregate) r, mq, ignoreNulls);
>     case "org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveAggregate":
>       return provider1.getUniqueKeys(r, mq, ignoreNulls);
>     default:
>       throw NoHandler.INSTANCE;
>     }
>   }
> }
>
> The entry point in RelMetadataQuery changes from
>
> public Set<ImmutableBitSet> getUniqueKeys(RelNode rel,
>     boolean ignoreNulls) {
>   final BuiltInMetadata.UniqueKeys metadata =
>       rel.metadata(BuiltInMetadata.UniqueKeys.class, this);
>   return metadata.getUniqueKeys(ignoreNulls);
> }
>
> to
>
> public Set<ImmutableBitSet> getUniqueKeys(RelNode rel,
>     boolean ignoreNulls) {
>   for (;;) {
>     try {
>       return uniqueKeysHandler.getUniqueKeys(rel, this, ignoreNulls);
>     } catch (NoHandler e) {
>       uniqueKeysHandler = metadataProvider.revise(rel.getClass(),
>           BuiltInMetadata.UniqueKeys.Handler.class);
>     }
>   }
> }
>
> The “NoHandler” exception occurs very rarely (only when a kind of RelNode
> is seen that hasn’t been seen before in this JVM instance) but gives the
> handler a chance to regenerate itself.
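
The retry-and-revise loop can be sketched in a self-contained form as
follows. This is only an approximation under stated assumptions: a map
snapshot stands in for the Janino-generated switch, the classes are
simplified stand-ins, and build, revise and revisions are hypothetical
names. Preallocating the exception without a stack trace is also an
assumption of this sketch, made so that the rare throw stays cheap.

```java
import java.util.HashMap;
import java.util.Map;

class ReviseOnMissSketch {
  // Simplified stand-ins for the RelNode hierarchy (assumption, not Calcite's classes).
  static class RelNode {}
  static class Aggregate extends RelNode {}
  static class LogicalAggregate extends Aggregate {}

  // Control-flow exception: one shared instance, no stack trace captured.
  static class NoHandler extends RuntimeException {
    static final NoHandler INSTANCE = new NoHandler();
    private NoHandler() {
      super(null, null, false, false);
    }
  }

  interface Handler {
    String getUniqueKeys(RelNode r, boolean ignoreNulls);
  }

  // Classes seen so far, and how often the dispatcher has been rebuilt.
  static final Map<Class<?>, Handler> known = new HashMap<>();
  static int revisions = 0;
  static Handler handler = build(known);

  // Builds a dispatcher over a snapshot of the known classes; this plays
  // the role of the generated handler and never changes once built.
  static Handler build(Map<Class<?>, Handler> classes) {
    final Map<Class<?>, Handler> snapshot = new HashMap<>(classes);
    return (r, ignoreNulls) -> {
      Handler h = snapshot.get(r.getClass());
      if (h == null) {
        throw NoHandler.INSTANCE; // class not known when this dispatcher was built
      }
      return h.getUniqueKeys(r, ignoreNulls);
    };
  }

  // Registers the new class (resolving its handler by inheritance) and
  // returns a freshly built dispatcher that covers it.
  static Handler revise(Class<?> relClass) {
    Handler h = Aggregate.class.isAssignableFrom(relClass)
        ? (r, ignoreNulls) -> "aggregate keys"
        : (r, ignoreNulls) -> "no keys";
    known.put(relClass, h);
    revisions++;
    return build(known);
  }

  static String getUniqueKeys(RelNode rel, boolean ignoreNulls) {
    for (;;) {
      try {
        return handler.getUniqueKeys(rel, ignoreNulls);
      } catch (NoHandler e) {
        handler = revise(rel.getClass());
      }
    }
  }
}
```

The first call for a LogicalAggregate misses, triggers one revision and
then succeeds on the retry; later calls for the same class go straight
through the rebuilt dispatcher without throwing.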
>
> The result is a very direct path from the caller (generally a RelOptRule)
> to the provider: two calls, and we don’t even need to box the arguments.
>
> I don’t think there will be any API changes, but note that the metadata
> interfaces (e.g. UniqueKeys) and RelNode.metadata(Class<M> metadataClass,
> RelMetadataQuery mq) are not used anymore.
>
> Julian
>
>
