Re: Allow Cascades driver invoking "derive" on the nodes produced by "passThrough"

Roman Kondakov Sun, 13 Feb 2022 01:40:23 -0800

Hi Alessandro,

This problem was already discussed on the dev list [1], and we have a ticket for it [2].

My concern is that many projects use Calcite as a Lego kit: they take internal components of Calcite and combine them to build a custom planning and execution pipeline. Sometimes downstream projects need to change the default behavior of internal components to fit their requirements or to work around a bug. So, from my point of view, keeping even the internal components of Calcite "more public" is a good thing rather than a bad one.


Thank you.

[1] https://lists.apache.org/thread/cykl74dcphgow4790fwoc8frsjglz7n1

[2] https://issues.apache.org/jira/browse/CALCITE-4542

--
Roman Kondakov


On 11.02.2022 19:15, Alessandro Solimando wrote:

Hello everyone,
@Vladimir, +1 on the change introducing "enforceDerive()".

@Roman, could you walk us through the limitations you found that forced you
to copy-paste the whole class?

Maybe there is some middle ground for your problem(s) too, similar in
spirit to what Vladimir proposed for the other limitation.

I am not against making the class more public if necessary, but it would be
nice to have a discussion here before going down that path.
If the discussion leads to a better design of the original class, all
projects would benefit from that.

Best regards,
Alessandro

On Fri, 11 Feb 2022 at 04:14, Roman Kondakov <[email protected]>
wrote:

Hi Vladimir,

+1 for making the rule driver more public. We've faced similar problems
in a downstream project. The solution was to copy and paste the
TopDownRuleDriver code with small fixes, since it was not possible to
override the default behavior.

--
Roman Kondakov


On 11.02.2022 02:50, Vladimir Ozerov wrote:

Hi,

In the Cascades driver, it is possible to propagate the requests top-down
using the "passThrough" method, and then notify parents bottom-up about the
concrete physical implementations of inputs using the "derive" method.

In some optimizers, a valid parent node cannot be created before the
trait sets of its inputs are known. An example is a custom distribution trait
that includes the number of shards in the system. The parent operator alone
may guess the distribution keys, but cannot know the number of input
shards. To mitigate this, you may create a "template" node with an infinite
cost from within the optimization rule. Such a node will propagate the
passThrough/derive calls but will never participate in the final plan.
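
To make the "template" idea concrete, here is a minimal sketch in plain Java.
It does not use any Calcite API: ShardedDistribution, its fields, and its
methods are hypothetical names chosen for illustration. It only shows a trait
whose shard count is unknown until an input is seen, and a cost that stays
infinite while the trait is still a template.

```java
import java.util.List;
import java.util.OptionalInt;

// Hypothetical distribution trait: hash keys plus the number of shards.
// This is an illustration only, not a Calcite RelTrait implementation.
final class ShardedDistribution {
    final List<String> keys;
    final OptionalInt shardCount; // empty while the input shard count is unknown

    private ShardedDistribution(List<String> keys, OptionalInt shardCount) {
        this.keys = keys;
        this.shardCount = shardCount;
    }

    // The parent can guess the keys, but not the number of shards.
    static ShardedDistribution template(List<String> keys) {
        return new ShardedDistribution(keys, OptionalInt.empty());
    }

    static ShardedDistribution concrete(List<String> keys, int shards) {
        return new ShardedDistribution(keys, OptionalInt.of(shards));
    }

    boolean isTemplate() {
        return !shardCount.isPresent();
    }

    // A template is given infinite cost, so it can never win in the final plan.
    double cost(double baseCost) {
        return isTemplate() ? Double.POSITIVE_INFINITY : baseCost;
    }

    // "derive" step: once a concrete input is known, copy its shard count.
    ShardedDistribution deriveFrom(ShardedDistribution input) {
        if (input.isTemplate()) {
            return this; // nothing concrete to learn from yet
        }
        return concrete(keys, input.shardCount.getAsInt());
    }
}
```

With this model, a template created by a rule becomes usable only after
deriveFrom() is called with a concrete input, which is why a node that never
receives "derive" stays at infinite cost.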

Currently, the top-down driver is designed in such a way that the nodes
created from the "passThrough" method are not notified at the "derive" stage.
This leads to incomplete exploration of the search space. For example, the rule
may produce the node "A1.template" that will be converted into a normal
"A1" node in the derive phase. However, if the parent operator produced
"A2.template" from "A1.template" using pass-through mechanics, the
"A2.template" will never be notified about the concrete input traits,
possibly losing the optimal plan. This is especially painful in distributed
engines, where the number of shards is important for the placement of
Shuffle operators.

It seems that the problem could be solved with relatively low effort. The
"derive" is not invoked on the nodes created from the "passThrough" method,
because such nodes are placed in the "passThroughCache" collection. Instead
of doing this unconditionally, we may introduce an additional predicate
that would selectively enforce "derive" on such nodes. For example, this
could be a default method in the PhysicalNode interface, like:

interface PhysicalNode {
    default boolean enforceDerive() { return false; }
}

If there are no objections, I'll proceed with this change.
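
To illustrate the intended effect, here is a simplified sketch. It is not the
actual TopDownRuleDriver code: SketchNode, PlainNode, TemplateNode, and
SketchDriver are hypothetical stand-ins, and only the cache check plus the
proposed enforceDerive() predicate are modeled.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Stand-in for PhysicalNode; only the proposed hook is modeled here.
interface SketchNode {
    String name();

    // Proposed predicate: false keeps today's behavior (no "derive").
    default boolean enforceDerive() { return false; }
}

final class PlainNode implements SketchNode {
    private final String name;
    PlainNode(String name) { this.name = name; }
    @Override public String name() { return name; }
}

// A "template" node opts in to "derive" even when created by passThrough.
final class TemplateNode implements SketchNode {
    private final String name;
    TemplateNode(String name) { this.name = name; }
    @Override public String name() { return name; }
    @Override public boolean enforceDerive() { return true; }
}

// Hypothetical miniature of the driver's bookkeeping.
final class SketchDriver {
    private final Set<SketchNode> passThroughCache = new HashSet<>();

    // Called for every node produced by "passThrough".
    void registerPassThroughResult(SketchNode node) {
        passThroughCache.add(node);
    }

    // Cached nodes are skipped unless they explicitly ask for "derive".
    boolean shouldDerive(SketchNode node) {
        return !passThroughCache.contains(node) || node.enforceDerive();
    }

    // Names of the nodes that would be notified at the "derive" stage.
    List<String> deriveCandidates(List<SketchNode> nodes) {
        List<String> result = new ArrayList<>();
        for (SketchNode node : nodes) {
            if (shouldDerive(node)) {
                result.add(node.name());
            }
        }
        return result;
    }
}
```

In this model, an "A2.template" node registered through passThrough is still
offered to "derive" because it returns true from enforceDerive(), while an
ordinary cached node keeps the current behavior and is skipped.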

Alternatively, we may make the TopDownRuleDriver more "public", so that the
user can extend it and decide within the driver whether to cache a
particular node or not.

I would appreciate your feedback on the matter.

Regards,
Vladimir.
