Hi,

I've tried to review CALCITE-2593, and it looks like the case not that
trivial.
The issue there is the following SQL fails with "Node ... could not be
implemented"

SQL:  select y from (values (1, 1), (2, 2)) as t(x, y) order by y

TL;DR: Julian seems to decline my fix (PR#989), and I fail to see technical
reasons there.
Unfortunately, the reasoning boils down to (see CALCITE-2764):
Julian>You're inventing requirements to match what you thought was the
design. That's wrong.

I'm suggesting to improve RelCompositeTrait#satisfy to handle
RelCompositeTraits arguments as well.
The only problem I see is that removal of RelTraitSet#simplify might
increase the number of ReSubsets a bit, however multi-collated relations
are rare in practice, so it should not impact planning much.


The interesting bit with SQL is that VALUES is sorted by both x and y
columns at once (it suits both "order by x" and "order by y").
Note: Calcite supports "composite collations" not for VALUES only, so other
relations might have those properties as well.

We have a on-going discussion with Julian, however I would like to hear
more opinions.
Especially I would like to hear something regarding CALCITE-2593.

My analysis is as follows:

Logical plan is fine:
LogicalSort(sort0=[$0], dir0=[ASC]) # collation=[0] since it is just a sort
  LogicalProject(Y=[$1]) # collation=[0]  (it projects Y, and there's a
collation on Y, so the result of Project is supposed to have collation=[0])
    LogicalValues(tuples=[[{ 1, 1 }, { 2, 2 }]])  # collation=[[0, 1], [1]]

However, when Volcano "registers" that logical plan, it creates subsets for
all the relations (RelSet / RelSubset entities).
So far so good, however the RelSubset for LogicalValues is awkward:
 rel#9:Subset#0.NONE.[]
[] means "no collation". In other words, it puts "multi-collated" values
into a subset that has "no" collation.

<side note>
Subset#0.NONE.[] is allowed to have relations with non-empty collations. It
is just fine.
Subset#0.NONE.[] is used by the planner like "give me a relation that has
Logical convention and WHATEVER collation you have there".
The subset then iterates over all the relations in the set and fetches the
matching ones. Of course the relation that is sorted suits the request of
"whatever",
so (Subset#0.NONE.[]).getRels() would include non-empty collated relations
as well.
</side note>

It results in the following "plan for optimization":
LogicalSort(subset=[rel#16:Subset#2.ENUMERABLE.[0]],
input=rel#11:Subset#1.NONE.[0])
  LogicalProject(subset=[rel#11:Subset#1.NONE.[0]],
input=rel#9:Subset#0.NONE.[], Y=[$1])
    LogicalValues(subset=[rel#9:Subset#0.NONE.[]], tuples=[[{ 1, 1 }, { 2,
2 }]])

The invalid state here is LogicalProject that has input with empty
collation and it somehow has [0] collation. In other words, technically
speaking, this LogicalProject acts as if it were a Sort node.
Apparently, Project does not sort rows. The collation of Project is just
inherited from the original plan, and rel#9:Subset#0.NONE.[] is created at
"register" time, while the collation of Subset is set to EMPTY by
RelTraitSet#simplify (I've no idea why planner discards composite
collations, but it is the implementation of #simplify).

Frankly speaking, I find LogicalProject(input=rel#9:Subset#0.NONE.[]) a
very very strange thing. I don't really think physical implementation of
Project are supposed to somehow magically sort rows.

Of course, when LogicalProject is converted to Enumerable convention,
EnumerableProject gets [] collation since the collation of input is [], and
EnumerableProject just keeps collation.
EnumerableProject ends up into ENUMERABLE.[] subset, while top-level
"LogicalSort" is trying to consume ENUMERABLE.[0] which can never happen,
thus planning fails.


Suggested fix:
1) Avoid use RelTraitSet#simplify for RelSubsets. In other words, register
all the collations as is, even the one with composite traits.
2) Enhance RelCompositeTrait#satisfies to "compare" properly
composite.satisfies(composite) case (CALCITE-2764)

Julian claims that composite.satisfies(composite) must never be allowed.
Julian did not comment CALCITE-2593 (values(...) order by x not planned)
issue though.

Just in case, here's the PR: https://github.com/apache/calcite/pull/989
The PR fixes original "unable to plan" issue, and it adds tests for
composite.satisfies(composite) case. It adds no new failures.

The interesting bit is Julian seems to decline the PR (at least
composite.satisfies(composite) bit of it), however I fail to see technical
reasoning there.
Especially, I've seen zero pointers to the necessity of
RelTraitSet#simplify (which just drops collations and converts to EMPTY for
unknown reason).

The other way to approach the issue is to completely disable use of
composite trait for collation, however it does not look very appealing. It
is more like setting a time bomb for the subsequent use of a composite
trait.

PS. The whole CALCITE-2593 / CALCITE-2764 story for me started from
reviewing PR#984, and I start losing all the interest in pushing that
forward. The sad thing is it looks like the discussion bothers Julian, and
we seem to make no progress.

Vladimir

Reply via email to