GitHub user mustafasrepo closed a discussion: Does current SortExec consider 
input ordering.

Consider a use case where the required ordering is `(a ASC, b ASC)` and the
existing ordering is `(a ASC)`.
For example, suppose the input is the following:

| a  | b |
| -------- | ------- |
| 1  | 2    |
| 1 | 3     |
| 1    | 1    |
| 2  | 2    |
| 2 | 3     |
| 2    | 1    |

The expected output is the following:

| a   | b |
| -------- | ------- |
| 1  | 1    |
| 1 | 2     |
| 1    | 3    |
| 2  | 1   |
| 2 | 2     |
| 2    | 3    |

If we used the information about the existing ordering, we could buffer rows
until the value of `a` changes, as below:

| a  | b |
| -------- | ------- |
| 1  | 2    |
| 1 | 3     |
| 1    | 1    |

When 2 is received for the value of `a`, we could sort the buffered subtable
according to the remaining required ordering `(b ASC)` and emit the following result:

| a  | b |
| -------- | ------- |
| 1  | 1   |
| 1 | 2     |
| 1    | 3   |
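
The buffering scheme described above can be sketched in a few lines. This is only an illustration in plain Python, not DataFusion code: the function name `partial_sort` and the `(a, b)` tuple row format are assumptions made for the example, and a real operator would work on record batches rather than single rows.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, Tuple

Row = Tuple[int, int]  # hypothetical row layout: (a, b)


def partial_sort(rows: Iterable[Row]) -> Iterator[Row]:
    """Yield rows in (a ASC, b ASC) order, assuming the input is already (a ASC)."""
    # groupby merges only consecutive rows with equal `a`, so a group ends as
    # soon as a new value of `a` arrives. That is the buffering step.
    for _, group in groupby(rows, key=itemgetter(0)):
        # Sort the buffered subtable by `b` and emit it before reading on.
        yield from sorted(group, key=itemgetter(1))


rows = [(1, 2), (1, 3), (1, 1), (2, 2), (2, 3), (2, 1)]
result = list(partial_sort(rows))
```

Because the function is a generator, the rows for `a = 1` become available once the first row with `a = 2` has been read, without waiting for the rest of the input. Memory use is bounded by the largest run of equal `a` values rather than by the whole input.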

I think this would enable us to use `SortExec` without breaking the pipeline
for some use cases (we could also write a new operator for this behaviour).
Also, some sort algorithms have faster paths when their input is almost
sorted. However, as far as I know, the current `SortExec` cannot produce
results without consuming all of its input. Is this the case? If so, do you
think this operator would be useful?




GitHub link: https://github.com/apache/datafusion/discussions/7330
