GitHub user mustafasrepo closed a discussion: Does current SortExec consider input ordering.
Consider a use case where the required ordering is `(a ASC, b ASC)` and the existing ordering is `(a ASC)`. As an example, the input is as follows:

| a | b |
| -------- | ------- |
| 1 | 2 |
| 1 | 3 |
| 1 | 1 |
| 2 | 2 |
| 2 | 3 |
| 2 | 1 |

The expected output is as follows:

| a | b |
| -------- | ------- |
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |

If we were to use the information about the existing ordering, we could buffer rows until the value of `a` changes, as below:

| a | b |
| -------- | ------- |
| 1 | 2 |
| 1 | 3 |
| 1 | 1 |

When 2 is received for the value of `a`, we could sort the buffered subtable according to the remaining ordering `(b ASC)` and then emit the following result:

| a | b |
| -------- | ------- |
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |

I think this would enable us to use `SortExec` without breaking the pipeline for some use cases (we could also write a new operator for this behaviour). Also, some sort algorithms have friendlier paths when their input is almost sorted.

However, as far as I know, the current `SortExec` cannot produce results without consuming all of its input. Is this the case? If so, do you think this operator would be useful?

GitHub link: https://github.com/apache/datafusion/discussions/7330
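To make the proposed behaviour concrete, here is a minimal, language-agnostic sketch of the idea in Python. It is not DataFusion code and does not reflect how `SortExec` is implemented; the names `partial_sort`, `prefix_key`, and `suffix_key` are hypothetical and introduced only for illustration. It assumes the input is already sorted on the prefix column (`a`) and sorts each run of equal prefix values on the remaining column (`b`).

```python
from itertools import groupby
from operator import itemgetter


def partial_sort(rows, prefix_key, suffix_key):
    """Yield rows in full order, assuming `rows` is already ordered by `prefix_key`.

    Hypothetical helper for illustration only; not part of any library.
    """
    # groupby merges only *consecutive* rows with equal keys, which is
    # sufficient because the input is assumed to be ordered on the prefix.
    for _, group in groupby(rows, key=prefix_key):
        # Buffer a single run of equal prefix values, sort it on the
        # remaining column(s), and emit it before reading the next run.
        yield from sorted(group, key=suffix_key)


# The example input from the discussion: ordered on a (index 0), not on b (index 1).
rows = [(1, 2), (1, 3), (1, 1), (2, 2), (2, 3), (2, 1)]
result = list(partial_sort(iter(rows), itemgetter(0), itemgetter(1)))
print(result)
```

In this sketch, memory use is bounded by the largest run of equal `a` values rather than by the whole input, and the first sorted rows can be emitted as soon as the value of `a` changes. If the input is not actually ordered on the prefix, the output will not be correctly sorted.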
