aprimadi commented on PR #6158: URL: https://github.com/apache/arrow-datafusion/pull/6158#issuecomment-1529252222
External sort comes to mind, and also the k-way sort-preserving merge. The latter because there is a clever bit of the code that uses an array representation of a binary tree to implement a loser tree. For example, this code:

https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/src/physical_plan/sorts/merge.rs#L252
https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/src/physical_plan/sorts/merge.rs#L271

These really find the leaf node of the loser tree that a given `cursor index` belongs to, but that is not immediately apparent unless you realize that the binary tree is represented this way:

```
       0
       1
   2       3
 4   5   6   7
```

i.e. the root of the tree is the element at index 1 in the vector, the first child of the root is the element at index 2, and so on.

I guess I'm just hesitant to work on larger tasks due to:

1. Whether someone else is currently working on it.
2. Whether it's high priority. I don't want to block the issue for a prolonged period.
3. Whether the issue touches areas that I'm currently not very familiar with. Again, for the same reason as 2, I don't want to block the issue for a prolonged period.

Currently I prefer to work on issues that I can finish in one weekend, and yeah, #4495 is my default go-to issue if I don't find issues that are interesting or that I'm sure I can finish in one weekend 😆.
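
To make the array layout from the comment concrete, here is a minimal Rust sketch of the index arithmetic. It is not copied from `merge.rs`: the function names are hypothetical, and it assumes that with `n` cursors the tree nodes live at indices `1..n` of the vector, slot 0 sits outside the tree, and the cursors act as virtual leaves at positions `n..2n` just below the stored nodes.

```rust
/// Index of the tree node directly above the (virtual) leaf for `cursor_index`.
/// Assumption: cursor `i` sits at virtual position `num_cursors + i`,
/// so its parent in the heap-style layout is that position halved.
pub fn leaf_node_index(num_cursors: usize, cursor_index: usize) -> usize {
    (num_cursors + cursor_index) / 2
}

/// Parent of a node in the array layout: the children of node `k`
/// are `2k` and `2k + 1`, so the parent of `k` is `k / 2`.
pub fn parent_node_index(node_index: usize) -> usize {
    node_index / 2
}

/// Nodes visited when walking from a cursor's leaf node up to the root
/// (index 1). Index 0 is not part of the tree proper, so the walk stops
/// before reaching it.
pub fn path_to_root(num_cursors: usize, cursor_index: usize) -> Vec<usize> {
    let mut path = Vec::new();
    let mut node = leaf_node_index(num_cursors, cursor_index);
    while node != 0 {
        path.push(node);
        node = parent_node_index(node);
    }
    path
}
```

Under these assumptions, with 8 cursors (matching the picture with nodes 1 to 7), cursor 5 maps to node 6, and walking up visits nodes 6, 3, and 1. Two adjacent cursors such as 0 and 1 share node 4, which is why the mapping is not obvious until the layout is drawn out.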
