Can you go into more detail about why reflection is needed? It seems like we could avoid reflection by sharing interfaces, etc.
On Mon, Jan 18, 2016 at 5:12 PM, Julian Hyde <[email protected]> wrote:

> In https://issues.apache.org/jira/browse/CALCITE-794 we added an extra
> parameter to each metadata call so that we could detect cyclic metadata
> calls, and potentially cache results so that a given statistic is never
> computed more than once during a metadata call. But the overhead of making
> calls into the metadata framework is still very high. It shows up as a big
> fraction of the time spent optimizing complex queries. I am working on
> https://issues.apache.org/jira/browse/CALCITE-604, which aims to fix
> that.
>
> I am working on 604 while the release is closing, and I thought some of
> you would be interested to know where I am going.
>
> We use reflection to make calls. This is necessary because the types of
> metadata (e.g. selectivity, row count, unique keys, predicates) are
> extensible, you can have multiple providers for each kind of metadata, each
> provider has different methods for various RelNode sub-types, and we want
> to be able to inherit handler methods (e.g. getUniqueKeys(Aggregate,
> boolean) handles getUniqueKeys(LogicalAggregate, boolean) because there is
> no handler method for LogicalAggregate).
>
> Initially I thought we’d use MethodHandle, which is a lot faster than
> method invocation by reflection. MethodHandle.invoke has some flexibility
> based on the types of its arguments, but I realized we’d still have to
> dispatch to multiple underlying providers (e.g. the built-in provider and
> the Hive provider). And we have other inefficiencies such as calling
> UnboundMetadata.bind(RelNode, RelMetadataQuery) to create a short-lived
> object on every single call.
>
> So, now I am looking at using Janino to generate a dispatcher. Consider
> just one kind of metadata, UniqueKeys. We already have a “signature”
> interface:
>
>   public interface UniqueKeys extends Metadata {
>     Set<ImmutableBitSet> getUniqueKeys(boolean ignoreNulls);
>   }
>
> I have added a handler interface:
>
>   interface UniqueKeysHandler {
>     Set<ImmutableBitSet> getUniqueKeys(RelNode r, RelMetadataQuery mq,
>         boolean ignoreNulls);
>   }
>
> Now, given a set of metadata providers and the set of all known RelNode
> sub-types, I can use Janino to generate a handler at run time:
>
>   class UniqueKeysHandlerImpl implements UniqueKeysHandler {
>     final RelMdUniqueKeys provider0;
>     final HiveUniqueKeys provider1;
>
>     UniqueKeysHandlerImpl(RelMdUniqueKeys provider0,
>         HiveUniqueKeys provider1) {
>       this.provider0 = provider0;
>       this.provider1 = provider1;
>     }
>
>     public Set<ImmutableBitSet> getUniqueKeys(RelNode r,
>         RelMetadataQuery mq, boolean ignoreNulls) {
>       switch (r.getClass().getName()) {
>       case "org.apache.calcite.rel.logical.LogicalAggregate":
>         return provider0.getUniqueKeys((Aggregate) r, mq, ignoreNulls);
>       case "org.apache.calcite.rel.core.Aggregate":
>         return provider0.getUniqueKeys((Aggregate) r, mq, ignoreNulls);
>       case "org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveAggregate":
>         return provider1.getUniqueKeys(r, mq, ignoreNulls);
>       default:
>         throw NoHandler.INSTANCE;
>       }
>     }
>   }
>
> The entry point in RelMetadataQuery changes from
>
>   public Set<ImmutableBitSet> getUniqueKeys(RelNode rel,
>       boolean ignoreNulls) {
>     final BuiltInMetadata.UniqueKeys metadata =
>         rel.metadata(BuiltInMetadata.UniqueKeys.class, this);
>     return metadata.getUniqueKeys(ignoreNulls);
>   }
>
> to
>
>   public Set<ImmutableBitSet> getUniqueKeys(RelNode rel,
>       boolean ignoreNulls) {
>     for (;;) {
>       try {
>         return uniqueKeysHandler.getUniqueKeys(rel, this, ignoreNulls);
>       } catch (NoHandler e) {
>         uniqueKeysHandler = metadataProvider.revise(rel.getClass(),
>             BuiltInMetadata.UniqueKeys.Handler.class);
>       }
>     }
>   }
>
> The “NoHandler” exception occurs very rarely, only when a kind of RelNode
> is seen that hasn’t been seen before in this JVM instance, but it gives the
> handler a chance to regenerate itself.
>
> The result is a very direct path from the caller (generally a RelOptRule)
> to the provider: two calls, and we don’t even need to box the arguments.
>
> I don’t think there will be any API changes, but note that the metadata
> interfaces (e.g. UniqueKeys) and RelNode.metadata(Class<M> metadataClass,
> RelMetadataQuery mq) are not used anymore.
>
> Julian
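
For readers following the thread, the retry-on-NoHandler loop described in the quoted message can be sketched in plain Java roughly as follows. This is only an illustration, not Calcite code: the Node, Aggregate, LogicalAggregate, Filter and Project classes, the Dispatcher class and the PROVIDERS map are all hypothetical stand-ins, and a HashMap lookup takes the place of the Janino-generated switch. The point it tries to show is that the superclass walk (the "inherited handler" lookup) happens only once per node class, inside revise, so the common path is a single exact-class lookup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class DispatchSketch {
  // Hypothetical stand-ins for RelNode sub-types; these are not Calcite classes.
  static class Node {}
  static class Aggregate extends Node {}
  static class LogicalAggregate extends Aggregate {}
  static class Filter extends Node {}
  static class Project extends Node {}

  /** Signals that the current dispatcher has no entry for a node class. */
  static final class NoHandler extends RuntimeException {
    static final NoHandler INSTANCE = new NoHandler();
    private NoHandler() {
      // No message, no cause, no suppression, no stack trace: cheap to throw.
      super(null, null, false, false);
    }
  }

  /** Dispatcher keyed by exact class; plays the role of the generated handler. */
  static final class Dispatcher {
    private final Map<Class<?>, Function<Node, String>> byClass;

    Dispatcher(Map<Class<?>, Function<Node, String>> byClass) {
      this.byClass = byClass;
    }

    String describe(Node n) {
      Function<Node, String> f = byClass.get(n.getClass());
      if (f == null) {
        throw NoHandler.INSTANCE;
      }
      return f.apply(n);
    }

    /** Returns a new dispatcher that also covers nodeClass, using the
     * handler registered for its nearest superclass. */
    Dispatcher revise(Class<?> nodeClass,
        Map<Class<?>, Function<Node, String>> providers) {
      Map<Class<?>, Function<Node, String>> next = new HashMap<>(byClass);
      for (Class<?> c = nodeClass; c != null; c = c.getSuperclass()) {
        Function<Node, String> f = providers.get(c);
        if (f != null) {
          next.put(nodeClass, f);
          return new Dispatcher(next);
        }
      }
      throw new IllegalArgumentException("no provider for " + nodeClass);
    }
  }

  // Hypothetical provider registry: one handler for Aggregate, one fallback.
  static final Map<Class<?>, Function<Node, String>> PROVIDERS = new HashMap<>();
  static {
    PROVIDERS.put(Aggregate.class, n -> "aggregate handler");
    PROVIDERS.put(Node.class, n -> "default handler");
  }

  static Dispatcher dispatcher = new Dispatcher(new HashMap<>());
  static int revisions = 0;

  /** Entry point: try the current dispatcher, rebuild it on NoHandler, retry. */
  static String describe(Node n) {
    for (;;) {
      try {
        return dispatcher.describe(n);
      } catch (NoHandler e) {
        dispatcher = dispatcher.revise(n.getClass(), PROVIDERS);
        revisions++;
      }
    }
  }

  public static void main(String[] args) {
    System.out.println(describe(new LogicalAggregate())); // first sight: revise, then dispatch
    System.out.println(describe(new LogicalAggregate())); // already known: direct dispatch
    System.out.println(describe(new Filter()));           // falls back to the Node handler
  }
}
```

In this sketch the exception is a preallocated singleton without a stack trace, which keeps the rare miss path cheap; whether the real implementation does the same is an assumption on my part.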
