I describe in paragraph 3 why we CURRENTLY use reflection. In short: very complex dispatch requirements. The rest of the message is how I plan to phase it out. Plain Java only has one dispatch mechanism (virtual methods), so it isn’t going to cut it.
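
To illustrate the point about dispatch, here is a minimal sketch with toy classes. All names in it (ToyNode, ToyAggregate, ToyLogicalAggregate, ToyProvider) are hypothetical stand-ins, not Calcite classes. It shows that Java picks an overloaded method from the static type of the argument, so choosing a handler by the run-time type of a node needs explicit dispatch code.

```java
// Hypothetical stand-ins for RelNode sub-types; not the real Calcite classes.
class ToyNode {}
class ToyAggregate extends ToyNode {}
class ToyLogicalAggregate extends ToyAggregate {}

// Hypothetical provider with one overloaded handler method per node type.
class ToyProvider {
  String handle(ToyNode n) { return "generic"; }
  String handle(ToyAggregate a) { return "aggregate"; }
}

class DispatchDemo {
  // The overload is chosen at compile time from the static type (ToyNode),
  // so the ToyAggregate handler is never selected here.
  static String viaOverload(ToyProvider p, ToyNode n) {
    return p.handle(n);
  }

  // Dispatching on the run-time type has to be written by hand. The
  // instanceof test also lets ToyLogicalAggregate inherit the handler
  // declared for ToyAggregate.
  static String viaRuntimeType(ToyProvider p, ToyNode n) {
    if (n instanceof ToyAggregate) {
      return p.handle((ToyAggregate) n);
    }
    return p.handle(n);
  }

  public static void main(String[] args) {
    ToyProvider p = new ToyProvider();
    ToyNode n = new ToyLogicalAggregate();
    System.out.println(viaOverload(p, n));    // prints "generic"
    System.out.println(viaRuntimeType(p, n)); // prints "aggregate"
  }
}
```

Virtual methods would only help if the handler lived on the node class itself, which does not fit extensible metadata kinds with multiple providers.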

> On Jan 18, 2016, at 5:23 PM, Jacques Nadeau <[email protected]> wrote:
>
> Can you go into more detail on why reflection is needed? It seems like we
> could get away from reflection by sharing interfaces, etc.
>
> On Mon, Jan 18, 2016 at 5:12 PM, Julian Hyde <[email protected]> wrote:
>
>> In https://issues.apache.org/jira/browse/CALCITE-794 we added an extra
>> parameter to each metadata call so that we could detect cyclic metadata
>> calls, and potentially to cache results so that a given statistic is never
>> computed more than once during a metadata call. But the overhead of making
>> calls into the metadata framework is still very high. It shows up as a big
>> fraction of the time spent optimizing complex queries. I am working on
>> https://issues.apache.org/jira/browse/CALCITE-604, which aims to fix
>> that.
>>
>> I am working on 604 while the release is closing, and I thought some of
>> you would be interested to know where I am going.
>>
>> We use reflection to make calls. This is necessary because the types of
>> metadata (e.g. selectivity, row count, unique keys, predicates) are
>> extensible, you can have multiple providers for each kind of metadata, each
>> provider has different methods for various RelNode sub-types, and we want
>> to be able to inherit handler methods (e.g. getUniqueKeys(Aggregate,
>> boolean) handles getUniqueKeys(LogicalAggregate, boolean) because there is
>> no handler method for LogicalAggregate).
>>
>> Initially I thought we’d use MethodHandle, which is a lot faster than
>> method invocation by reflection. MethodHandle.invoke has some flexibility
>> based on the types of its arguments, but I realized we’d still have to
>> dispatch to multiple underlying providers (e.g. the built-in provider and
>> the Hive provider). And we have other inefficiencies such as calling
>> UnboundMetadata.bind(RelNode, RelMetadataQuery) to create a short-lived
>> object on every single call.
>>
>> So, now I am looking at using Janino to generate a dispatcher. Consider
>> just one kind of metadata, UniqueKeys. We already have a “signature”
>> interface:
>>
>>   public interface UniqueKeys extends Metadata {
>>     Set<ImmutableBitSet> getUniqueKeys(boolean ignoreNulls);
>>   }
>>
>> I have added a handler interface:
>>
>>   interface UniqueKeysHandler {
>>     Set<ImmutableBitSet> getUniqueKeys(RelNode r, RelMetadataQuery mq,
>>         boolean ignoreNulls);
>>   }
>>
>> Now, given a set of metadata providers and the set of all known RelNode
>> sub-types, I can use Janino to generate a handler at run time:
>>
>>   class UniqueKeysHandlerImpl implements UniqueKeysHandler {
>>     final RelMdUniqueKeys provider0;
>>     final HiveUniqueKeys provider1;
>>
>>     UniqueKeysHandlerImpl(RelMdUniqueKeys provider0,
>>         HiveUniqueKeys provider1) {
>>       this.provider0 = provider0;
>>       this.provider1 = provider1;
>>     }
>>
>>     public Set<ImmutableBitSet> getUniqueKeys(RelNode r,
>>         RelMetadataQuery mq, boolean ignoreNulls) {
>>       switch (r.getClass().getName()) {
>>       case "org.apache.calcite.rel.logical.LogicalAggregate":
>>         return provider0.getUniqueKeys((Aggregate) r, mq, ignoreNulls);
>>       case "org.apache.calcite.rel.core.Aggregate":
>>         return provider0.getUniqueKeys((Aggregate) r, mq, ignoreNulls);
>>       case "org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveAggregate":
>>         return provider1.getUniqueKeys(r, mq, ignoreNulls);
>>       default:
>>         throw NoHandler.INSTANCE;
>>       }
>>     }
>>   }
>>
>> The entry point in RelMetadataQuery changes from
>>
>>   public Set<ImmutableBitSet> getUniqueKeys(RelNode rel,
>>       boolean ignoreNulls) {
>>     final BuiltInMetadata.UniqueKeys metadata =
>>         rel.metadata(BuiltInMetadata.UniqueKeys.class, this);
>>     return metadata.getUniqueKeys(ignoreNulls);
>>   }
>>
>> to
>>
>>   public Set<ImmutableBitSet> getUniqueKeys(RelNode rel,
>>       boolean ignoreNulls) {
>>     for (;;) {
>>       try {
>>         return uniqueKeysHandler.getUniqueKeys(rel, this, ignoreNulls);
>>       } catch (NoHandler e) {
>>         uniqueKeysHandler = metadataProvider.revise(rel.getClass(),
>>             BuiltInMetadata.UniqueKeys.Handler.class);
>>       }
>>     }
>>   }
>>
>> The “NoHandler” exception occurs very rarely (only when a kind of RelNode
>> is seen that hasn’t been seen before in this JVM instance) but gives the
>> handler a chance to regenerate itself.
>>
>> The result is a very direct path from the caller (generally a RelOptRule)
>> to the provider: two calls, and we don’t even need to box the arguments.
>>
>> I don’t think there will be any API changes, but note that the metadata
>> interfaces (e.g. UniqueKeys) and RelNode.metadata(Class<M> metadataClass,
>> RelMetadataQuery mq) are not used anymore.
>>
>> Julian
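
The retry loop in the quoted message can be tried out in isolation with a rough, self-contained sketch. This is not the Calcite implementation: every Sketch* name is hypothetical, and a map lookup on the exact class stands in for the switch statement that Janino would generate. It only assumes what the message describes: the dispatcher throws a preallocated NoHandler for a class it has not seen, and the caller revises the dispatcher and retries.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-ins for RelNode sub-types; not the real Calcite classes.
class SketchRel {}
class SketchAggregate extends SketchRel {}
class SketchLogicalAggregate extends SketchAggregate {}

// Thrown when the current dispatcher has never seen a node class.
class SketchNoHandler extends RuntimeException {
  static final SketchNoHandler INSTANCE = new SketchNoHandler();
  private SketchNoHandler() {}
}

// Dispatcher keyed on the exact run-time class (a map in place of the
// generated switch). It is never mutated; revise() builds a new one.
class SketchDispatcher {
  final Map<Class<?>, Function<SketchRel, String>> byClass;

  SketchDispatcher(Map<Class<?>, Function<SketchRel, String>> byClass) {
    this.byClass = byClass;
  }

  String describe(SketchRel r) {
    Function<SketchRel, String> h = byClass.get(r.getClass());
    if (h == null) {
      throw SketchNoHandler.INSTANCE;
    }
    return h.apply(r);
  }

  // Returns a dispatcher that also covers relClass, inheriting the handler
  // declared for its nearest superclass.
  SketchDispatcher revise(Class<?> relClass,
      Map<Class<?>, Function<SketchRel, String>> declared) {
    for (Class<?> c = relClass; c != null; c = c.getSuperclass()) {
      Function<SketchRel, String> h = declared.get(c);
      if (h != null) {
        Map<Class<?>, Function<SketchRel, String>> next = new HashMap<>(byClass);
        next.put(relClass, h);
        return new SketchDispatcher(next);
      }
    }
    throw new IllegalArgumentException("no handler for " + relClass);
  }
}

// Entry point holding the current dispatcher, mirroring the retry loop.
class SketchQuery {
  final Map<Class<?>, Function<SketchRel, String>> declared = new HashMap<>();
  SketchDispatcher dispatcher = new SketchDispatcher(new HashMap<>());
  int revisions = 0; // counts how often the dispatcher was regenerated

  SketchQuery() {
    // Only the base aggregate class declares a handler.
    declared.put(SketchAggregate.class, r -> "aggregate handler");
  }

  String describe(SketchRel rel) {
    for (;;) {
      try {
        return dispatcher.describe(rel);
      } catch (SketchNoHandler e) {
        dispatcher = dispatcher.revise(rel.getClass(), declared);
        revisions++;
      }
    }
  }
}
```

Calling describe twice with a SketchLogicalAggregate regenerates the dispatcher only on the first call; the second call goes straight through the lookup, which is the property the exception-driven design relies on.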
