In https://issues.apache.org/jira/browse/CALCITE-794 we added an extra parameter 
to each metadata call so that we could detect cyclic metadata calls, and 
potentially to cache results so that a given statistic is never computed more 
than once during a metadata call. But the overhead of making calls into the 
metadata framework is still very high. It shows up as a big fraction of the 
time spent optimizing complex queries. I am working on 
https://issues.apache.org/jira/browse/CALCITE-604, which aims to fix that.

I am working on 604 while the release is closing, and I thought some of you 
would be interested to know where I am going.

We use reflection to make calls. This is necessary because the types of 
metadata (e.g. selectivity, row count, unique keys, predicates) are extensible; 
you can have multiple providers for each kind of metadata; each provider has 
different methods for various RelNode sub-types; and we want to be able to 
inherit handler methods (e.g. getUniqueKeys(Aggregate, boolean) handles 
getUniqueKeys(LogicalAggregate, boolean) because there is no handler method for 
LogicalAggregate).
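To make the inheritance point concrete, here is a minimal sketch of reflective dispatch that walks up the class hierarchy until it finds a handler method. It is not Calcite's actual code: the nested RelNode classes, UniqueKeysProvider, findHandler and call are hypothetical stand-ins, and the handler returns a String rather than Set<ImmutableBitSet> to keep the example self-contained.

```java
import java.lang.reflect.Method;

class ReflectiveDispatchSketch {
  // Hypothetical stand-ins for the RelNode hierarchy.
  static class RelNode {}
  static class Aggregate extends RelNode {}
  static class LogicalAggregate extends Aggregate {}
  static class Filter extends RelNode {}

  // Hypothetical provider: it handles Aggregate but has no method
  // for LogicalAggregate.
  static class UniqueKeysProvider {
    public String getUniqueKeys(Aggregate rel, boolean ignoreNulls) {
      return "Aggregate handler, ignoreNulls=" + ignoreNulls;
    }
  }

  /** Walks up from relClass until the provider has a matching method;
   * returns null if no superclass is handled. */
  static Method findHandler(Class<?> providerClass, Class<?> relClass) {
    for (Class<?> c = relClass; c != null; c = c.getSuperclass()) {
      try {
        return providerClass.getMethod("getUniqueKeys", c, boolean.class);
      } catch (NoSuchMethodException e) {
        // No handler for this exact class; try its superclass.
      }
    }
    return null;
  }

  /** Looks up the handler and invokes it reflectively. */
  static String call(Object provider, RelNode rel, boolean ignoreNulls) {
    Method m = findHandler(provider.getClass(), rel.getClass());
    if (m == null) {
      throw new IllegalStateException("no handler for " + rel.getClass());
    }
    try {
      // Method.invoke takes Object arguments, so ignoreNulls is boxed.
      return (String) m.invoke(provider, rel, ignoreNulls);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(
        call(new UniqueKeysProvider(), new LogicalAggregate(), true));
  }
}
```

The lookup and the boxed Method.invoke call on every request are the kind of per-call overhead described here.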

Initially I thought we’d use MethodHandle, which is a lot faster than method 
invocation by reflection. MethodHandle.invoke has some flexibility based on the 
types of its arguments, but I realized we’d still have to dispatch to multiple 
underlying providers (e.g. the built-in provider and the Hive provider). And we 
have other inefficiencies such as calling UnboundMetadata.bind(RelNode, 
RelMetadataQuery) to create a short-lived object every single call.
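For readers who have not used MethodHandle, the following is a small sketch of the flexibility mentioned above: invoke adapts a RelNode-typed argument to the Aggregate parameter with a cast. The nested classes, Provider, lookupHandler and call are hypothetical names for illustration, and the handler returns a String to keep the example short. Note that it binds to a single provider method, which is why dispatching across several providers would still need extra logic.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class MethodHandleSketch {
  // Hypothetical stand-ins for the RelNode hierarchy.
  static class RelNode {}
  static class Aggregate extends RelNode {}
  static class LogicalAggregate extends Aggregate {}

  // Hypothetical provider with a single handler method.
  static class Provider {
    String getUniqueKeys(Aggregate rel, boolean ignoreNulls) {
      return "Aggregate handler, ignoreNulls=" + ignoreNulls;
    }
  }

  /** Resolves the handler method once; the handle can then be reused. */
  static MethodHandle lookupHandler() {
    try {
      return MethodHandles.lookup().findVirtual(Provider.class,
          "getUniqueKeys",
          MethodType.methodType(String.class, Aggregate.class,
              boolean.class));
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Calls the handle with an argument whose static type is RelNode. */
  static String call(MethodHandle handle, Provider provider, RelNode rel,
      boolean ignoreNulls) {
    try {
      // invoke() converts the call-site types to the handle's types:
      // RelNode is cast to Aggregate, and the boolean is passed unboxed.
      return (String) handle.invoke(provider, rel, ignoreNulls);
    } catch (Throwable t) {
      throw new IllegalStateException(t);
    }
  }

  public static void main(String[] args) {
    MethodHandle handle = lookupHandler();
    System.out.println(
        call(handle, new Provider(), new LogicalAggregate(), false));
  }
}
```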

So, now I am looking at using Janino to generate a dispatcher. Consider just 
one kind of metadata, UniqueKeys. We already have a “signature” interface:

public interface UniqueKeys extends Metadata {
  Set<ImmutableBitSet> getUniqueKeys(boolean ignoreNulls);
}

I have added a handler interface:

interface UniqueKeysHandler {
  Set<ImmutableBitSet> getUniqueKeys(RelNode r, RelMetadataQuery mq,
      boolean ignoreNulls);
}

Now, given a set of metadata providers and the set of all known RelNode 
sub-types, I can use Janino to generate a handler at run time:

class UniqueKeysHandlerImpl implements UniqueKeysHandler {
  final RelMdUniqueKeys provider0;
  final HiveUniqueKeys provider1;

  UniqueKeysHandlerImpl(RelMdUniqueKeys provider0, HiveUniqueKeys provider1) {
    this.provider0 = provider0;
    this.provider1 = provider1;
  }

  public Set<ImmutableBitSet> getUniqueKeys(RelNode r,
      RelMetadataQuery mq, boolean ignoreNulls) {
    switch (r.getClass().getName()) {
    case "org.apache.calcite.rel.logical.LogicalAggregate":
      return provider0.getUniqueKeys((Aggregate) r, mq, ignoreNulls);
    case "org.apache.calcite.rel.core.Aggregate":
      return provider0.getUniqueKeys((Aggregate) r, mq, ignoreNulls);
    case "org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveAggregate":
      return provider1.getUniqueKeys(r, mq, ignoreNulls);
    default:
      throw NoHandler.INSTANCE;
    }
  }
}

The entry point in RelMetadataQuery changes from

public Set<ImmutableBitSet> getUniqueKeys(RelNode rel,
    boolean ignoreNulls) {
  final BuiltInMetadata.UniqueKeys metadata =
      rel.metadata(BuiltInMetadata.UniqueKeys.class, this);
  return metadata.getUniqueKeys(ignoreNulls);
}

to

public Set<ImmutableBitSet> getUniqueKeys(RelNode rel,
    boolean ignoreNulls) {
  for (;;) {
    try {
      return uniqueKeysHandler.getUniqueKeys(rel, this, ignoreNulls);
    } catch (NoHandler e) {
      uniqueKeysHandler = metadataProvider.revise(rel.getClass(),
          BuiltInMetadata.UniqueKeys.Handler.class);
    }
  }
}

The “NoHandler” exception occurs very rarely (only when a kind of RelNode is 
seen that hasn’t been seen before in this JVM instance) but gives the handler 
a chance to regenerate itself.
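The retry loop can be tried in isolation with the sketch below. It makes several assumptions: revise here rebuilds a lambda over a snapshot of known classes instead of generating code with Janino, the handler returns a String, the revisions counter exists only to show how often regeneration happens, and making NoHandler a preallocated exception without a stack trace is my guess at how to keep the throw cheap, not something stated above.

```java
import java.util.HashSet;
import java.util.Set;

class NoHandlerRetrySketch {
  // Hypothetical stand-ins for the RelNode hierarchy.
  static class RelNode {}
  static class Aggregate extends RelNode {}
  static class LogicalAggregate extends Aggregate {}

  /** Control-flow exception; a single instance with no stack trace
   * (an assumption made for this sketch). */
  static final class NoHandler extends RuntimeException {
    static final NoHandler INSTANCE = new NoHandler();
    private NoHandler() {
      super(null, null, false, false);
    }
  }

  interface Handler {
    String getUniqueKeys(RelNode r, boolean ignoreNulls);
  }

  static final Set<Class<?>> knownClasses = new HashSet<>();
  static int revisions = 0;

  /** Stands in for code generation: builds a new handler that knows
   * every class seen so far, plus newClass if it is not null. */
  static Handler revise(Class<?> newClass) {
    if (newClass != null) {
      knownClasses.add(newClass);
    }
    revisions++;
    final Set<Class<?>> snapshot = new HashSet<>(knownClasses);
    return (r, ignoreNulls) -> {
      if (!snapshot.contains(r.getClass())) {
        throw NoHandler.INSTANCE; // unseen class: ask the caller to revise
      }
      return "keys of " + r.getClass().getSimpleName();
    };
  }

  static Handler handler = revise(null);

  /** Entry point: retries after regenerating the handler. */
  static String getUniqueKeys(RelNode rel, boolean ignoreNulls) {
    for (;;) {
      try {
        return handler.getUniqueKeys(rel, ignoreNulls);
      } catch (NoHandler e) {
        handler = revise(rel.getClass());
      }
    }
  }

  public static void main(String[] args) {
    System.out.println(getUniqueKeys(new LogicalAggregate(), true));
    System.out.println(getUniqueKeys(new LogicalAggregate(), true));
    System.out.println("revisions: " + revisions);
  }
}
```

In this sketch the first call for a new class pays for one regeneration, and every later call for that class goes straight through the existing handler.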

The result is a very direct path from the caller (generally a RelOptRule) to 
the provider: two calls, and we don’t even need to box the arguments.

I don’t think there will be any API changes, but note that the metadata 
interfaces (e.g. UniqueKeys) and RelNode.metadata(Class<M> metadataClass, 
RelMetadataQuery mq) are not used anymore.

Julian
