I don’t think the [] means “no collation”. It means “order by zero columns”. 
The list of columns is present, but it is empty. 

Of course, this “empty collation” provides no guarantees as to the ordering of 
rows. It is a collation; it’s just the weakest possible collation.

> On Jan 7, 2019, at 1:23 AM, Stamatis Zampetakis <[email protected]> wrote:
> 
> Hi Vladimir,
> 
> From the analysis so far, the following part caught my attention:
> 
> Vladimir>So far so good, however the RelSubset for LogicalValues is awkward:
> rel#9:Subset#0.NONE.[]
> [] means "no collation". In other words, it puts "multi-collated" values
> into a subset that has "no" collation.
> ...
> The collation of Project is just
> inherited from the original plan, and rel#9:Subset#0.NONE.[] is created at
> "register" time, while the collation of Subset is set to EMPTY by
> RelTraitSet#simplify (I've no idea why planner discards composite
> collations, but it is the implementation of #simplify).
> 
> Indeed the behavior seems a bit weird and it makes me think that maybe
> there is a bug in #simplify method it self.
> 
> According to commit XXX
> 
> Julian> When creating a RelSubset make sure that its traits are simple.
> (Occurs when creating a subset to convert an existing rel that has multiple
> collations to a different calling convention. The subset can only have one
> collation.)
> 
> the subset must have only one collation.
> 
> Looking into the implementation of #simplify it seems that when the
> composite collation trait is of size one then the simplification simply
> returns that trait.
> When the composite collation trait is of size different than one we are
> obtaining the default collation trait which in this case is the EMPTY
> collation.
> This behavior is a bit suspect.
> 
> Moreover, the javadoc of #simplify method indicates the following:
> /** Returns a trait set similar to this one but with all composite traits
> flattened. */
> 
> I was thinking that flattening refers to unnesting some information but the
> code does more than that by pruning also some traits. This is another
> reason that makes me think that maybe there is a bug in #simplify.
> 
> 
> Στις Κυρ, 6 Ιαν 2019 στις 9:24 π.μ., ο/η Vladimir Sitnikov <
> [email protected]> έγραψε:
> 
>> Hi,
>> 
>> I've tried to review CALCITE-2593, and it looks like the case not that
>> trivial.
>> The issue there is the following SQL fails with "Node ... could not be
>> implemented"
>> 
>> SQL:  select y from (values (1, 1), (2, 2)) as t(x, y) order by y
>> 
>> TL;DR: Julian seems to decline my fix (PR#989), and I fail to see technical
>> reasons there.
>> Unfortunately, the reasoning boils down to (see CALCITE-2764):
>> Julian>You're inventing requirements to match what you thought was the
>> design. That's wrong.
>> 
>> I'm suggesting to improve RelCompositeTrait#satisfy to handle
>> RelCompositeTraits arguments as well.
>> The only problem I see is that removal of RelTraitSet#simplify might
>> increase the number of ReSubsets a bit, however multi-collated relations
>> are rare in practice, so it should not impact planning much.
>> 
>> 
>> The interesting bit with SQL is that VALUES is sorted by both x and y
>> columns at once (it suits both "order by x" and "order by y").
>> Note: Calcite supports "composite collations" not for VALUES only, so other
>> relations might have those properties as well.
>> 
>> We have a on-going discussion with Julian, however I would like to hear
>> more opinions.
>> Especially I would like to hear something regarding CALCITE-2593.
>> 
>> My analysis is as follows:
>> 
>> Logical plan is fine:
>> LogicalSort(sort0=[$0], dir0=[ASC]) # collation=[0] since it is just a sort
>>  LogicalProject(Y=[$1]) # collation=[0]  (it projects Y, and there's a
>> collation on Y, so the result of Project is supposed to have collation=[0])
>>    LogicalValues(tuples=[[{ 1, 1 }, { 2, 2 }]])  # collation=[[0, 1], [1]]
>> 
>> However, when Volcano "registers" that logical plan, it creates subsets for
>> all the relations (RelSet / RelSubset entities).
>> So far so good, however the RelSubset for LogicalValues is awkward:
>> rel#9:Subset#0.NONE.[]
>> [] means "no collation". In other words, it puts "multi-collated" values
>> into a subset that has "no" collation.
>> 
>> <side note>
>> Subset#0.NONE.[] is allowed to have relations with non-empty collations. It
>> is just fine.
>> Subset#0.NONE.[] is used by the planner like "give me a relation that has
>> Logical convention and WHATEVER collation you have there".
>> The subset then iterates over all the relations in the set and fetches the
>> matching ones. Of course the relation that is sorted suits the request of
>> "whatever",
>> so (Subset#0.NONE.[]).getRels() would include non-empty collated relations
>> as well.
>> </side note>
>> 
>> It results in the following "plan for optimization":
>> LogicalSort(subset=[rel#16:Subset#2.ENUMERABLE.[0]],
>> input=rel#11:Subset#1.NONE.[0])
>>  LogicalProject(subset=[rel#11:Subset#1.NONE.[0]],
>> input=rel#9:Subset#0.NONE.[], Y=[$1])
>>    LogicalValues(subset=[rel#9:Subset#0.NONE.[]], tuples=[[{ 1, 1 }, { 2,
>> 2 }]])
>> 
>> The invalid state here is LogicalProject that has input with empty
>> collation and it somehow has [0] collation. In other words, technically
>> speaking, this LogicalProject acts as if it were a Sort node.
>> Apparently, Project does not sort rows. The collation of Project is just
>> inherited from the original plan, and rel#9:Subset#0.NONE.[] is created at
>> "register" time, while the collation of Subset is set to EMPTY by
>> RelTraitSet#simplify (I've no idea why planner discards composite
>> collations, but it is the implementation of #simplify).
>> 
>> Frankly speaking, I find LogicalProject(input=rel#9:Subset#0.NONE.[]) a
>> very very strange thing. I don't really think physical implementation of
>> Project are supposed to somehow magically sort rows.
>> 
>> Of course, when LogicalProject is converted to Enumerable convention,
>> EnumerableProject gets [] collation since the collation of input is [], and
>> EnumerableProject just keeps collation.
>> EnumerableProject ends up into ENUMERABLE.[] subset, while top-level
>> "LogicalSort" is trying to consume ENUMERABLE.[0] which can never happen,
>> thus planning fails.
>> 
>> 
>> Suggested fix:
>> 1) Avoid use RelTraitSet#simplify for RelSubsets. In other words, register
>> all the collations as is, even the one with composite traits.
>> 2) Enhance RelCompositeTrait#satisfies to "compare" properly
>> composite.satisfies(composite) case (CALCITE-2764)
>> 
>> Julian claims that composite.satisfies(composite) must never be allowed.
>> Julian did not comment CALCITE-2593 (values(...) order by x not planned)
>> issue though.
>> 
>> Just in case, here's the PR: https://github.com/apache/calcite/pull/989
>> The PR fixes original "unable to plan" issue, and it adds tests for
>> composite.satisfies(composite) case. It adds no new failures.
>> 
>> The interesting bit is Julian seems to decline the PR (at least
>> composite.satisfies(composite) bit of it), however I fail to see technical
>> reasoning there.
>> Especially, I've seen zero pointers to the necessity of
>> RelTraitSet#simplify (which just drops collations and converts to EMPTY for
>> unknown reason).
>> 
>> The other way to approach the issue is to completely disable use of
>> composite trait for collation, however it does not look very appealing. It
>> is more like setting a time bomb for the subsequent use of a composite
>> trait.
>> 
>> PS. The whole CALCITE-2593 / CALCITE-2764 story for me started from
>> reviewing PR#984, and I start losing all the interest in pushing that
>> forward. The sad thing is it looks like the discussion bothers Julian, and
>> we seem to make no progress.
>> 
>> Vladimir
>> 

Reply via email to