Hi,
Currently, the order of attributes is used to define operators equivalence.
This leads to several interesting problems, such as possible duplication of
expressions (e.g., "a,a,b") or additional work in rules to detect trivial
projects and/or input permutations (e.g. "a,b -> b,a").
But the biggest problem is the join order planning. In Calcite, AxB is not
equivalent to BxA:
1. It makes the ruleset [JoinCommuteRule, JoinAssociateRule]
insufficient to explore all bushy trees because the commute operation adds
a project on top of the new join (AxB -> project(BxA)), thus not
allowing for the associate rule to be executed on the upper join. The
solution is to add the ProjectJoinTransposeRule to the ruleset, but this
increases the planning time dramatically, making Apache Calcite unsuitable
for the cost-based join planning even for relatively simple join graphs.
2. It increases the number of join-related rule calls, which complicates
the implementation of the new join enumeration planning rules (e.g.
top-down enumerators) because duplicate derivations compromise performance.
My question is - has the community considered an idea to make the order of
columns a common property of all operators, somewhat similar to the trait,
but without an explicit enforcer?
For example, consider the following MEMO which the planner creates when
trying to transform the join AxB to BxA:
G1: { Scan1[table=t1, cols=a,b] }
G2: { Scan2[table=t2, cols=c,d] }
G3: { AxB[G1, G2], Project[G4, cols=$2,$3,$0,$1] }
G4: { BxA[G2, G1] }
However, if we make the column order part of the MEMO, we may potentially
have something like this:
G1: { Scan1[table=t1, cols=a,b] }
G2: { Scan2[table=t2, cols=c,d] }
G3 [cols=G1.$0, G1.$1, G2.$0, G2.$1]: { AxB[G1, G2], BxA[G2, G1] }
Notice, how we were able to put AxB and BxA to the same equivalence group.
To my knowledge, CockroachDB uses a somewhat similar design.
I realize that this is rather a radical idea, and more design work is
required to come with a proper proposal. At this point, I just would like
to kindly ask the community to share high-level feedback on that. Were
similar ideas proposed before?
Thank you,
Vladimir.