Re: Re-working the metadata framework

Julian Hyde Mon, 18 Jan 2016 17:41:51 -0800

I describe in paragraph 3 why we CURRENTLY use reflection. In short: very 
complex dispatch requirements. The rest of the message is how I plan to phase 
it out. Plain java only has one dispatch mechanism (virtual methods) so isn’t 
going to cut it.


> On Jan 18, 2016, at 5:23 PM, Jacques Nadeau <[email protected]> wrote:
> 
> Can you go into more detail to why reflection is needed? It seems like we
> could get away from reflection by sharing interfaces, etc.
> 
> On Mon, Jan 18, 2016 at 5:12 PM, Julian Hyde <[email protected]> wrote:
> 
>> In https://issues.apache.org/jira/browse/CALCITE-794 <
>> https://issues.apache.org/jira/browse/CALCITE-794> we added an extra
>> parameter to each metadata call so that we could detect cyclic metadata
>> calls, and potentially to cache results so that a given statistic is never
>> computed more than once during a metadata call. But the overhead of making
>> calls into the metadata framework is still very high. It shows up as a big
>> fraction of the time spent optimizing complex queries. I am working on
>> https://issues.apache.org/jira/browse/CALCITE-604 <
>> https://issues.apache.org/jira/browse/CALCITE-604>, which aims to fix
>> that.
>> 
>> I am working on 604 while the release is closing, and I thought some of
>> you would be be interested to know where I am going.
>> 
>> We use reflection to make calls. This is necessary because the types of
>> metadata (e.g. selectivity, row count, unique keys, predicates) are
>> extensible, you can have multiple providers for each kind of metadata, each
>> provider has different methods for various RelNode sub-types, and we want
>> to be able to inherit handler methods (e.g. getUniqueKeys(Aggregate,
>> boolean) handles getUniqueKeys(LogicalAggregate, boolean) because there is
>> no handler method for LogicalAggregate.
>> 
>> Initially I thought we’d use MethodHandle, which is a lot faster than
>> method invocation by reflection. MethodHandle.invoke has some flexibility
>> based on the types of its arguments, but I realized we’d still have to
>> dispatch to multiple underlying providers (e.g. the built-in provider and
>> the Hive provider). And we have other inefficiencies such as calling
>> UnboundMetadata.bind(RelNode, RelMetadataQuery) to create a short-lived
>> object every single call.
>> 
>> So, now I am looking at using Janino to generate a dispatcher. Consider
>> just one kind of metadata, UniqueKeys. We already have a “signature”
>> interface:
>> 
>> public interface UniqueKeys extends Metadata {
>>  Set<ImmutableBitSet> getUniqueKeys(boolean ignoreNulls);
>> }
>> 
>> I have added a handler interface:
>> 
>> interface UniqueKeysHandler {
>>  Set<ImmutableBitSet> getUniqueKeys(RelNode r, RelMetadataQuery mq,
>> boolean ignoreNulls);
>> }
>> 
>> Now, given a set of metadata providers and the set of all known RelNode
>> sub-type, I can use Janino to generate a handler at run time:
>> 
>> class UniqueKeysHandlerImpl implements UniqueKeysHandlerImpl {
>>  final RelMdUniqueKeys provider0;
>>  final HiveUniqueKeys provider1;
>> 
>>  UniqueKeysHandlerImpl(RelMdUniqueKeys provider0, HiveUniqueKeys
>> provider1) {
>>    this.provider0 = provider0;
>>    this.provider1 = provider1;
>>  }
>> 
>>  public Set<ImmutableBitSet> getUniqueKeys(RelNode r,
>>      RelMetadataQuery mq, boolean ignoreNulls) {
>>    switch (r.getClass().getName()) {
>>    case "org.apache.calcite.rel.logical.LogicalAggregate":
>>      return provider0.getUniqueKeys((Aggregate) r, mq, ignoreNulls);
>>    case "org.apache.calcite.rel.core.Aggregate":
>>      return provider0.getUniqueKeys(r, mq, ignoreNulls);
>>    case “
>> org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveAggregate":
>>      return provider1.getUniqueKeys(r, mq, ignoreNulls);
>>    default:
>>      throw NoHandler.INSTANCE;
>>    }
>>  }
>> }
>> 
>> The entry point in RelMetadataQuery changes from
>> 
>> public Set<ImmutableBitSet> getUniqueKeys(RelNode rel,
>>    boolean ignoreNulls) {
>>  final BuiltInMetadata.UniqueKeys metadata =
>>      rel.metadata(BuiltInMetadata.UniqueKeys.class, this);
>>  return metadata.getUniqueKeys(ignoreNulls);
>> }
>> 
>> to
>> 
>> public Set<ImmutableBitSet> getUniqueKeys(RelNode rel,
>>    boolean ignoreNulls) {
>>  for (;;) {
>>    try {
>>      return uniqueKeysHandler.getUniqueKeys(rel, this, ignoreNulls);
>>    } catch (NoHandler e) {
>>      uniqueKeysHandler = metadataProvider.revise(rel.getClass(),
>>          BuiltInMetadata.UniqueKeys.Handler.class);
>>    }
>>  }
>> }
>> 
>> The “NoHandler” exception occurs very rarely — only when a kind of RelNode
>> is seen that hasn’t been seen before in this JVM instance — but gives the
>> handler chance to regenerate itself.
>> 
>> The result is a very direct path from the caller (generally a RelOptRule)
>> to the provider: two calls, and we don’t even need to box the arguments.
>> 
>> I don’t think there will be any API changes, but note that the metadata
>> interfaces (eg. UniqueKeys) and RelNode.metadata(Class<M> metadataClass,
>> RelMetadataQuery mq) are not used anymore.
>> 
>> Julian
>> 
>>

Re: Re-working the metadata framework

Reply via email to