asolimando commented on PR #4585: URL: https://github.com/apache/calcite/pull/4585#issuecomment-3415735915
> > I haven't followed closely the PR which introduced collation optimization, I think it's a very important improvement, but my expectation was that we could forget about the original collation somehow. > > If that can't be achieved, I'd like to encapsulate both collations (original and optimized) in `RelCollation` and embed there all this logic. > > Making the consumers of this metadata aware of the optimization feels like a leak of abstraction > > @asolimando Thank you for your suggestion. I haven't yet determined the optimal implementation approach. The root cause is that both sort keys and Trait properties depend on the same underlying RelCollation. When sort column elimination is performed, the Trait gets modified. The Volcano planner maintains the initial Trait in the RelRoot, but mid-process Trait modifications cause the RelCollationproperty to no longer satisfy the root's requirements (for example, the reduced collation [0]cannot satisfy the original requirement [0,1]). As a result, the Volcano planner fails to select the more efficient `Sort` operator that has undergone sort key elimination. A potential solution I'm considering is to make the Volcano planner recognize that the reduced collation [0]actually satisfies the broader sorting property [0,1], enabling it to select the optimized Sortoperator with eliminated sort keys. I agree it's not a trivial problem, that's why I wanted to share an early feedback, even if I don't have a proper solution in mind either. Trying to make Volcano planner handle the complexity could be a reasonable solution, as the logic handling the collation "subsumption" would be centralized there, no more abstraction leak. If we can either fully delegate to the planner any such subsumption check it would be great, or at least expose convenient methods that other parts of code could call (making sure the planner is always accessible where this check is needed). If you feel that a broader discussion is needed, we can take it to jira and/or the mailing list, others might have useful suggestions. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
