Re: [DISCUSS] RelCompositeTrait

Julian Hyde Mon, 08 Apr 2019 16:39:05 -0700

It seemed reasonable when I introduced it, and still seems very reasonable, that 
a relational expression (even in the relational model) can have multiple 
physical properties. Consider these questions that the planner might ask:


Example 1:

“Are you sorted on hiredate?”
“Yes”
“Are you sorted on empno?”
“Yes”
“Are you sorted on deptno?”
“No”

Example 2:

“Can you fit into less than 100MB of memory?”
“Yes”
“Can you fit into less than 10MB of memory?”
“Yes”
“Can you fit into less than 1MB of memory?”
“No”

We manage traits like those in example 1 using RelCompositeTrait. We can’t 
handle traits like those in example 2, and so we have trained ourselves to not 
think of “can fit into memory X” as a trait at all.
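
To make the two examples concrete, here is a minimal sketch. It is not Calcite
code: the class TraitQuestions, its methods, and the 5MB size estimate are all
hypothetical, chosen only so that the answers match the dialogues above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names are Calcite classes.
final class TraitQuestions {
    // Example 1: the relation is sorted in several alternative ways at once.
    // Each inner list is one sort order, leading key first.
    static final List<List<String>> SORT_ORDERS = Arrays.asList(
        Arrays.asList("hiredate"),
        Arrays.asList("empno"));

    // Example 2: an assumed size estimate for the relation, in bytes.
    static final long ESTIMATED_BYTES = 5L * 1024 * 1024;

    // True if some known sort order starts with the requested keys.
    static boolean isSortedOn(List<String> keys) {
        for (List<String> order : SORT_ORDERS) {
            if (order.size() >= keys.size()
                && order.subList(0, keys.size()).equals(keys)) {
                return true;
            }
        }
        return false;
    }

    // True if the estimated size is strictly below the given limit.
    static boolean fitsInto(long limitBytes) {
        return ESTIMATED_BYTES < limitBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        System.out.println(isSortedOn(Arrays.asList("hiredate"))); // true
        System.out.println(isSortedOn(Arrays.asList("empno")));    // true
        System.out.println(isSortedOn(Arrays.asList("deptno")));   // false
        System.out.println(fitsInto(100 * mb));                    // true
        System.out.println(fitsInto(10 * mb));                     // true
        System.out.println(fitsInto(1 * mb));                      // false
    }
}
```

The sort orders form a small finite set that can be stored and listed. The
memory limits do not: every limit above the estimate is satisfied, so the
question can only be answered per limit.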

Perhaps our mistake is to have an API “tell me all of your traits” rather than 
an API “do you have trait X?”. Asking a RelNode to enumerate its traits can be 
painful: the extreme case is an empty Values with 100 columns; it satisfies any 
sort order, and there are 100! of these.
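
A rough sketch of the difference between the two API styles, for the empty
Values case. The SortedRel interface and the other names are hypothetical and
are not part of Calcite; the factorial only counts orderings of all 100 columns.

```java
import java.math.BigInteger;
import java.util.List;

// Hypothetical illustration only; none of these names are Calcite classes.
final class EmptyValuesSketch {
    // Predicate-style API: "do you have trait X?"
    interface SortedRel {
        boolean isSortedOn(List<String> keys);
    }

    // An empty relation is trivially sorted on any key list,
    // so it can answer every question in constant time.
    static final SortedRel EMPTY_VALUES = keys -> true;

    // Number of orderings of all n columns (n!), which is what an
    // enumeration-style API would have to produce at a minimum.
    static BigInteger permutations(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    public static void main(String[] args) {
        // Number of decimal digits in 100!
        System.out.println(permutations(100).toString().length());
    }
}
```

A real collation also carries a direction per key and may cover only some of
the columns, so the number of distinct sort orders is larger still.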

Julian



> On Apr 8, 2019, at 3:51 PM, Stamatis Zampetakis <zabe...@gmail.com> wrote:
> 
> Hi Haisheng,
> 
> Thanks for raising awareness around this topic. I also think we should try
> to find a solution.
> 
> Initially, the Volcano planner was designed to be able to cover multiple
> models (and not only the relational). For non-relational models composite
> traits may be indispensable. I don't know if there are people on this list
> that are using the planner for other models but if there are it would be
> nice to hear from them.
> 
> Focusing exclusively on the relational model, I think composite traits are
> useful. One use-case that comes to my mind is data replication. It
> makes perfect sense to partition (distribute) your table on two (or more)
> columns to be able to execute queries efficiently using special partition
> joins. A concrete use-case is RDF data where many distributed systems store
> the triples table partitioned by subject and object. I guess such use-cases
> could possibly be modelled in other ways but composite traits is what comes
> naturally to my mind.
> 
> Multi-sorted tables are not that rare: you get one, for example, if you
> import sorted data into a table with an auto-increment primary key.
> 
> I think all the trait-related issues can be solved if we prioritize them
> correctly. Apart from Vladimir and Hongze, who have already spent quite some
> time on these, the rest of us should also jump in and try to help.
> 
> Best,
> Stamatis
> 
> 
> 
> 
> On Sun, Apr 7, 2019 at 9:48 AM Haisheng Yuan <h.y...@alibaba-inc.com> wrote:
> 
>> Hi,
>> 
>> I found there are some RelCompositeTrait related issues:
>> https://issues.apache.org/jira/browse/CALCITE-2010
>> https://issues.apache.org/jira/browse/CALCITE-2593
>> https://issues.apache.org/jira/browse/CALCITE-2764
>> 
>> Multi-sorted tables are rare in practice, and multi-distributed tables
>> don't exist either. A Values node with a few tuples is not worth
>> optimizing, and one with many tuples is not worth optimizing either,
>> because the time it takes the optimizer to figure out the ordering may be
>> longer than just sorting it at runtime.
>> 
>> In issue https://issues.apache.org/jira/browse/CALCITE-1990,
>> Leo extended RelDistribution to inherit RelMultipleTrait, just like
>> RelCollation does, to solve his problem in the example. But I don't think
>> this is an appropriate way to represent the equivalence classes (in
>> PostgreSQL's terms).
>> 
>> So why did we introduce RelCompositeTrait and RelMultipleTrait in the
>> beginning? It seems to give us more pain than gain.
>> 
>> Thanks ~
>> Haisheng Yuan
>>
