asolimando commented on PR #4585:
URL: https://github.com/apache/calcite/pull/4585#issuecomment-3415735915

   > > I haven't followed closely the PR which introduced collation 
optimization, I think it's a very important improvement, but my expectation was 
that we could forget about the original collation somehow.
   > > If that can't be achieved, I'd like to encapsulate both collations 
(original and optimized) in `RelCollation` and embed there all this logic.
   > > Making the consumers of this metadata aware of the optimization feels 
like a leak of abstraction
   > 
   > @asolimando Thank you for your suggestion. I haven't yet determined the 
optimal implementation approach. The root cause is that both sort keys and 
Trait properties depend on the same underlying RelCollation. When sort column 
elimination is performed, the Trait gets modified. The Volcano planner 
maintains the initial Trait in the RelRoot, but mid-process Trait modifications 
cause the RelCollationproperty to no longer satisfy the root's requirements 
(for example, the reduced collation [0]cannot satisfy the original requirement 
[0,1]). As a result, the Volcano planner fails to select the more efficient 
`Sort` operator that has undergone sort key elimination. A potential solution 
I'm considering is to make the Volcano planner recognize that the reduced 
collation [0]actually satisfies the broader sorting property [0,1], enabling it 
to select the optimized Sortoperator with eliminated sort keys.
   
   I agree it's not a trivial problem, that's why I wanted to share an early 
feedback, even if I don't have a proper solution in mind either.
   
   Trying to make Volcano planner handle the complexity could be a reasonable 
solution, as the logic handling the collation "subsumption" would be 
centralized there, no more abstraction leak.
   
   If we can either fully delegate to the planner any such subsumption check it 
would be great, or at least expose convenient methods that other parts of code 
could call (making sure the planner is always accessible where this check is 
needed).
   
   If you feel that a broader discussion is needed, we can take it to jira 
and/or the mailing list, others might have useful suggestions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to