Hi Vladimir,

> There are two problems here. First, the project operator potentially
> destroys any trait which depends on column order, such as distribution or
> collation. Therefore, EnumerableProject has an incorrect value of the
> distribution trait.

The enumerable convention is intended for in-memory, non-distributed 
environment.
Therefore, we only consider 2 traits: collation and convention. Other traits 
are not
guaranteed to work correctly. If you want it work with distribution, you have 
to create
your own operators, rules, either by extending or overriding, in which case, 
you will need
to remap distribution columns to get the correct distribution trait, just like 
how collation does.

> Second, which distribution should I assign to the CustomToEnumerable node?
> As I know that parent convention cannot handle the distribution properly,
> my natural thought is to set it to ANY.

You can assume CustomToEnumerable to be an Enforcer operator, like Sort, 
Exchange.
Sort only changes data collation, Exchange changes data distribution and 
collation, similarly 
CustomToEnumerable only change convention, but retains collation and 
distribution, I assume.
But in practice, it should be decided by the operator inventor and the 
underlying physical
implementation.

Hope that answers your question. Feel free to ask if you have more questions.

Thanks,
Haisheng Yuan

On 2021/03/27 08:43:15, Vladimir Ozerov <ppoze...@gmail.com> wrote: 
> Hi,
> 
> Apache Calcite supports heterogeneous optimization when nodes may have
> different conventions. The Enumerable rules propagate all traits from
> inputs. We have doubts whether this is correct or not.
> 
> Consider the following initial plan, which was created by Apache Calcite
> after sql-to-rel conversion and invocation of TranslatableTable.toRel. The
> table is in the CUSTOM convention. In this convention, there is an
> additional Distribution trait that tracks which attribute is used for
> sharding. It could be either SHARDED or ANY. The latter is the default
> distribution value which is used when the distribution is unknown. Suppose
> that the table is distributed by the attribute $0.
> LogicalProject [convention=NONE,   distribution=ANY]
>   CustomTable  [convention=CUSTOM, distribution=SHARDED($0)]
> 
> Now suppose that we run VolcanoPlanner with two rules: EnumerableProjectRule
> and converter rules that translate the CUSTOM node to ENUMERABLE node.
> First, the EnumerableProjectRule is executed. This rule propagates traits
> from the input, replacing only convention. Notice, how it propagated the
> distribution trait.
> EnumerableProject [convention=ENUMERABLE, distribution=SHARDED($0)]
>   CustomTable     [convention=CUSTOM,     distribution=SHARDED($0)]
> 
> Next, the converter will be invoked, yielding the following final plan:
> EnumerableProject    [convention=ENUMERABLE, distribution=SHARDED($0)]
>   CustomToEnumerable [convention=ENUMERABLE, distribution=???]
>     CustomTable      [convention=CUSTOM,     distribution=SHARDED($0)]
> 
> There are two problems here. First, the project operator potentially
> destroys any trait which depends on column order, such as distribution or
> collation. Therefore, EnumerableProject has an incorrect value of the
> distribution trait.
> Second, which distribution should I assign to the CustomToEnumerable node?
> As I know that parent convention cannot handle the distribution properly,
> my natural thought is to set it to ANY. However, at least in the top-down
> optimizer, this will lead to CannotPlanException, unless I declare that [ANY
> satisfies SHARDED($0)], which is not the case: ANY is unknown distribution,
> so all distribution satisfies ANY, but not vice versa.
> 
> My question is - shouldn't we ensure that only the collation trait is
> propagated from child nodes in Enumerable rules? For example, in the
> EnumerableProjectRule instead of doing:
> input.getTraitSet()
>   .replace(EnumerableConvention.INSTANCE)
>   .replace(<newCollation>)
> 
> we may do:
> RelOptCluster.traitSet().
>   .replace(EnumerableConvention.INSTANCE)
>   .replace(<newCollation>)
> 
> This would ensure that all other traits are set to the default value. The
> generalization of this idea is that every convention has a set of supported
> traits. Every unsupported trait should be set to the default value.
> 
> I would appreciate your feedback on the matter.
> 
> Regards,
> Vladimir.
> 

Reply via email to