Hi colleagues,

We are building a Calcite-based optimizer for Hazelcast, and I have some
problems understanding Calcite's logic with respect to converters. Let me
briefly explain the problem.

We have an execution backend, so we do not need Bindable or Enumerable.
Instead, we would like to use Calcite to convert original SQL to a tree
with our own convention, then convert it to our internal representation,
and finally, execute.

We started by looking at other Calcite integrations and eventually settled
on a classical two-phase optimization approach. We have two internal
conventions - LOGICAL and PHYSICAL. The goal is to optimize the tree as
follows:
1) NONE -> LOGICAL - heuristic optimizations
2) LOGICAL -> PHYSICAL - cost-based planning

Suppose that after the first phase I have the following tree of our own
operators:
HZLogicalRoot
-> HZLogicalProject
  -> HZLogicalScan

For this specific case, there is not much to optimize, so we only need to
transition to physical nodes and do some boilerplate trait propagation:
HZPhysicalRoot
-> HZPhysicalProject
  -> HZPhysicalScan

To achieve this, I defined three rules, each of which simply converts the
relevant node. The Volcano optimizer is used.

Now, the problem: somehow it works only when I override
Convention.Impl.canConvertConvention to return true for our PHYSICAL
convention, but that blows up the search space, and the same rules are
called many times. A lot of time is spent on endless PHYSICAL -> LOGICAL
conversions, which are of no use.

If I change canConvertConvention to false, then rules are called a sensible
number of times, but cannot produce a complete PHYSICAL tree. Here is how
it works:
1) The "Root" rule is invoked, which converts "HZLogicalRoot" to
"HZPhysicalRoot".
2) The "Project" rule is invoked, but does not produce any transformations,
since it needs the Scan distribution, which is not known yet. This is the
desired behavior at this point.
3) The "Scan" rule is invoked, and "HZLogicalScan" is converted to
"HZPhysicalScan". The distribution is resolved.
4) At this point, we have the sets [LogicalRoot, PhysicalRoot] ->
[LogicalProject] -> [LogicalScan, PhysicalScan]. I expect that, since a new
scan was installed, the "Project" rule will be fired again. This time we
know the distribution, so the transformation is possible. But the rule is
not called and we fail with an error.
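
To make the sequence concrete, here is a toy model of the four steps in plain
Java. It is not Calcite code: ToyPlanner, run, tryProject and the
refireParentOnChildChange flag are hypothetical names invented for this
illustration, and the flag merely stands in for whatever mechanism would make
the planner call the parent rule again.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the four steps above. Not Calcite code: every name is hypothetical.
class ToyPlanner {

    // Models the "Project" rule: it can only convert once its input has a
    // PHYSICAL node, and therefore a known distribution.
    static boolean tryProject(boolean inputIsPhysical, List<String> log) {
        if (!inputIsPhysical) {
            log.add("Project: no transformation (input distribution unknown)");
            return false;
        }
        log.add("Project: HZLogicalProject -> HZPhysicalProject");
        return true;
    }

    // refireParentOnChildChange is an assumption standing in for whatever
    // would cause the parent rule to be called a second time.
    static List<String> run(boolean refireParentOnChildChange) {
        List<String> log = new ArrayList<>();
        boolean scanIsPhysical = false;

        // 1) The Root rule converts unconditionally.
        log.add("Root: HZLogicalRoot -> HZPhysicalRoot");
        // 2) The Project rule fires before the scan has been converted.
        boolean projectIsPhysical = tryProject(scanIsPhysical, log);
        // 3) The Scan rule converts the scan; its distribution is now known.
        scanIsPhysical = true;
        log.add("Scan: HZLogicalScan -> HZPhysicalScan");
        // 4) The project is converted only if the parent rule fires again.
        if (refireParentOnChildChange && !projectIsPhysical) {
            projectIsPhysical = tryProject(scanIsPhysical, log);
        }
        log.add(projectIsPhysical
                ? "Result: complete PHYSICAL tree"
                : "Result: no PHYSICAL project, planning fails");
        return log;
    }

    public static void main(String[] args) {
        System.out.println("Observed (parent rule not fired again):");
        run(false).forEach(line -> System.out.println("  " + line));
        System.out.println("Expected (parent rule fired again):");
        run(true).forEach(line -> System.out.println("  " + line));
    }
}
```

run(false) corresponds to what I observe with canConvertConvention set to
false, and run(true) to what I expected to happen.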

So my questions are:
1) What is the real role of converters in this process? For some reason,
when the (logically unnecessary) PHYSICAL -> LOGICAL conversion is allowed,
even complex plans can be built. Drill does this as well. But it costs
multiple additional invocations of the same rules. Are there any docs or
presentations explaining the mechanics behind this?
2) What are the minimum requirements that will allow a rule on the parent
to be fired again after its child node has changed?

I can provide any additional information, source code, or even a working
example of this problem if needed. I don't want to bother you with it at
the moment, because it feels like I am missing something very simple.

Would appreciate your help.

Regards,
Vladimir.
