In https://issues.apache.org/jira/browse/CALCITE-794 we added an extra parameter
to each metadata call so that we could detect cyclic metadata calls, and
potentially to cache results so that a given statistic is never computed more
than once during a metadata call. But the overhead of making calls into the
metadata framework is still very high. It shows up as a big fraction of the
time spent optimizing complex queries. I am working on
https://issues.apache.org/jira/browse/CALCITE-604, which aims to fix that.
I am working on 604 while the release is closing, and I thought some of you
would be interested to know where I am going.
We use reflection to make calls. This is necessary because the types of
metadata (e.g. selectivity, row count, unique keys, predicates) are extensible,
you can have multiple providers for each kind of metadata, each provider has
different methods for various RelNode sub-types, and we want to be able to
inherit handler methods (e.g. getUniqueKeys(Aggregate, boolean) handles
getUniqueKeys(LogicalAggregate, boolean) because there is no handler method for
LogicalAggregate).
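The inherited-handler lookup described above can be sketched roughly as follows. This is a minimal illustration, not Calcite's actual code: SketchNode, SketchAggregate, SketchLogicalAggregate, SketchProvider and ReflectiveDispatchSketch are hypothetical stand-ins, and the handler returns a String rather than a set of keys to keep the example small.

```java
import java.lang.reflect.Method;

// Hypothetical stand-ins for RelNode and its sub-types (not the Calcite classes).
class SketchNode {}
class SketchAggregate extends SketchNode {}
class SketchLogicalAggregate extends SketchAggregate {}

// Hypothetical provider: it declares a handler for SketchAggregate only.
class SketchProvider {
  public String getUniqueKeys(SketchAggregate r, boolean ignoreNulls) {
    return "aggregate handler, ignoreNulls=" + ignoreNulls;
  }
}

class ReflectiveDispatchSketch {
  // Looks for a handler on the node's own class, then on each superclass.
  static Method findHandler(Object provider, String name, Class<?> nodeClass) {
    for (Class<?> c = nodeClass; c != null; c = c.getSuperclass()) {
      try {
        return provider.getClass().getMethod(name, c, boolean.class);
      } catch (NoSuchMethodException e) {
        // No handler declared for this class; try the superclass.
      }
    }
    throw new IllegalArgumentException("no handler for " + nodeClass.getName());
  }

  static Object call(Object provider, String name, SketchNode node,
      boolean ignoreNulls) {
    try {
      // Each reflective call boxes the boolean and packs the arguments
      // into an Object[] before invoking the handler.
      return findHandler(provider, name, node.getClass())
          .invoke(provider, node, ignoreNulls);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

Calling ReflectiveDispatchSketch.call with a SketchLogicalAggregate lands on the SketchAggregate handler, which is the inheritance behavior wanted here; the lookup and the boxing on every call are the kind of overhead at issue.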
Initially I thought we’d use MethodHandle, which is a lot faster than method
invocation by reflection. MethodHandle.invoke has some flexibility based on the
types of its arguments, but I realized we’d still have to dispatch to multiple
underlying providers (e.g. the built-in provider and the Hive provider). And we
have other inefficiencies such as calling UnboundMetadata.bind(RelNode,
RelMetadataQuery) to create a short-lived object every single call.
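For comparison, here is a small sketch of what a MethodHandle-based call looks like, using only java.lang.invoke from the JDK. The classes MhNode, MhAggregate, MhProvider and MethodHandleSketch are hypothetical and exist only for this illustration. It shows the argument-type flexibility of invoke (the MhNode argument is cast to MhAggregate at the call site), and also that a single handle still targets a single provider method.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical stand-ins for a RelNode hierarchy and one provider.
class MhNode {}
class MhAggregate extends MhNode {}

class MhProvider {
  public String getUniqueKeys(MhAggregate r, boolean ignoreNulls) {
    return "keys(ignoreNulls=" + ignoreNulls + ")";
  }
}

class MethodHandleSketch {
  // Resolved once; later calls skip the reflective lookup.
  static final MethodHandle HANDLE;
  static {
    try {
      HANDLE = MethodHandles.lookup().findVirtual(MhProvider.class,
          "getUniqueKeys",
          MethodType.methodType(String.class, MhAggregate.class, boolean.class));
    } catch (ReflectiveOperationException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  static String call(MhProvider provider, MhNode node, boolean ignoreNulls) {
    try {
      // invoke() adapts the call-site types to the handle's type:
      // 'node' is cast from MhNode to MhAggregate, and the boolean
      // is passed without boxing.
      return (String) HANDLE.invoke(provider, node, ignoreNulls);
    } catch (Throwable t) {
      throw new RuntimeException(t);
    }
  }
}
```

Choosing which handle to call for a given RelNode class and provider would still need its own dispatch step, which is the limitation noted above.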
So, now I am looking at using Janino to generate a dispatcher. Consider just
one kind of metadata, UniqueKeys. We already have a “signature” interface:
  public interface UniqueKeys extends Metadata {
    Set<ImmutableBitSet> getUniqueKeys(boolean ignoreNulls);
  }
I have added a handler interface:
  interface UniqueKeysHandler {
    Set<ImmutableBitSet> getUniqueKeys(RelNode r, RelMetadataQuery mq,
        boolean ignoreNulls);
  }
Now, given a set of metadata providers and the set of all known RelNode
sub-types, I can use Janino to generate a handler at run time:
  class UniqueKeysHandlerImpl implements UniqueKeysHandler {
    final RelMdUniqueKeys provider0;
    final HiveUniqueKeys provider1;
    UniqueKeysHandlerImpl(RelMdUniqueKeys provider0, HiveUniqueKeys provider1) {
      this.provider0 = provider0;
      this.provider1 = provider1;
    }
    public Set<ImmutableBitSet> getUniqueKeys(RelNode r,
        RelMetadataQuery mq, boolean ignoreNulls) {
      // Dispatch on the concrete class name; each case calls one provider.
      switch (r.getClass().getName()) {
      case "org.apache.calcite.rel.logical.LogicalAggregate":
        return provider0.getUniqueKeys((Aggregate) r, mq, ignoreNulls);
      case "org.apache.calcite.rel.core.Aggregate":
        return provider0.getUniqueKeys((Aggregate) r, mq, ignoreNulls);
      case "org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveAggregate":
        return provider1.getUniqueKeys(r, mq, ignoreNulls);
      default:
        // Unknown RelNode class: signal the caller to regenerate the handler.
        throw NoHandler.INSTANCE;
      }
    }
  }
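To give a feel for the generation step, here is a rough sketch of building the text of such a switch from a table of known classes. It only produces a String, which would then be handed to Janino to compile; the class DispatcherSourceSketch, its generateSwitch method, and the layout of the input map (class name to a pair of provider field name and cast type) are assumptions made for illustration, not the real generator.

```java
import java.util.Map;

class DispatcherSourceSketch {
  // Builds the source of a switch that dispatches on the RelNode class name.
  // Each map value is {providerFieldName, castTypeName} (hypothetical layout).
  static String generateSwitch(Map<String, String[]> handlers) {
    StringBuilder b = new StringBuilder();
    b.append("switch (r.getClass().getName()) {\n");
    for (Map.Entry<String, String[]> e : handlers.entrySet()) {
      String providerField = e.getValue()[0];
      String castType = e.getValue()[1];
      b.append("case \"").append(e.getKey()).append("\":\n")
          .append("  return ").append(providerField)
          .append(".getUniqueKeys((").append(castType)
          .append(") r, mq, ignoreNulls);\n");
    }
    // Classes not in the table fall through to the regenerate signal.
    b.append("default:\n")
        .append("  throw NoHandler.INSTANCE;\n")
        .append("}\n");
    return b.toString();
  }
}
```

Passing a LinkedHashMap keeps the cases in insertion order, which makes the generated source easier to read when debugging.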
The entry point in RelMetadataQuery changes from
  public Set<ImmutableBitSet> getUniqueKeys(RelNode rel,
      boolean ignoreNulls) {
    final BuiltInMetadata.UniqueKeys metadata =
        rel.metadata(BuiltInMetadata.UniqueKeys.class, this);
    return metadata.getUniqueKeys(ignoreNulls);
  }
to
  public Set<ImmutableBitSet> getUniqueKeys(RelNode rel,
      boolean ignoreNulls) {
    for (;;) {
      try {
        return uniqueKeysHandler.getUniqueKeys(rel, this, ignoreNulls);
      } catch (NoHandler e) {
        uniqueKeysHandler = metadataProvider.revise(rel.getClass(),
            BuiltInMetadata.UniqueKeys.Handler.class);
      }
    }
  }
The “NoHandler” exception occurs very rarely (only when a kind of RelNode is
seen that hasn’t been seen before in this JVM instance) but gives the handler
a chance to regenerate itself.
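The retry-and-regenerate pattern can be tried out in isolation with a small self-contained sketch. Everything here is hypothetical (NoHandlerSketch, KeysHandlerSketch, RevisingQuerySketch and its revisions counter), and the "regenerated" handler is just a lambda over a set of known classes rather than Janino-generated code. The exception is preallocated and created without a stack trace, on the assumption that a control-flow signal like this should be cheap to throw.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Preallocated signal; stack trace disabled so that throwing it is cheap.
class NoHandlerSketch extends RuntimeException {
  static final NoHandlerSketch INSTANCE = new NoHandlerSketch();
  private NoHandlerSketch() {
    super("no handler", null, false, false);
  }
}

interface KeysHandlerSketch {
  String getUniqueKeys(Object node);
}

class RevisingQuerySketch {
  private final Set<Class<?>> knownClasses = new LinkedHashSet<>();
  // The initial handler knows no classes, so the first call triggers a revise.
  private KeysHandlerSketch handler = node -> {
    throw NoHandlerSketch.INSTANCE;
  };
  int revisions = 0; // counts how often the handler was rebuilt

  // Stands in for regenerating the dispatcher with one more known class.
  private KeysHandlerSketch revise(Class<?> newClass) {
    revisions++;
    knownClasses.add(newClass);
    final Set<Class<?>> snapshot = new LinkedHashSet<>(knownClasses);
    return node -> {
      if (!snapshot.contains(node.getClass())) {
        throw NoHandlerSketch.INSTANCE;
      }
      return "keys of " + node.getClass().getSimpleName();
    };
  }

  String getUniqueKeys(Object node) {
    for (;;) {
      try {
        return handler.getUniqueKeys(node);
      } catch (NoHandlerSketch e) {
        // First sighting of this class: rebuild the handler, then retry.
        handler = revise(node.getClass());
      }
    }
  }
}
```

In this sketch, repeated calls with nodes of an already-seen class never reach the catch block, so the revise cost is paid once per class.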
The result is a very direct path from the caller (generally a RelOptRule) to
the provider: two calls, and we don’t even need to box the arguments.
I don’t think there will be any API changes, but note that the metadata
interfaces (e.g. UniqueKeys) and RelNode.metadata(Class<M> metadataClass,
RelMetadataQuery mq) are no longer used.
Julian