aprimadi commented on PR #6158:
URL: 
https://github.com/apache/arrow-datafusion/pull/6158#issuecomment-1529252222

   The external sort comes to mind, and also the k-way sort-preserving merge.
   
   The k-way sort-preserving merge, because there is a clever bit of the code that 
uses an array representation of a binary tree (to implement a loser tree).
   
   For example, this code:
   
https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/src/physical_plan/sorts/merge.rs#L252
   
https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/src/physical_plan/sorts/merge.rs#L271
   These find the leaf node of the loser tree that a given `cursor index` 
belongs to, but that is not immediately apparent unless you realize that the 
binary tree is represented this way:
   
   ```
      0
      1 
    2   3
   4 5 6 7
   ```
   i.e. the root of the tree is the element at index 1 in the vector, the 
first child of the root is the element at index 2, and so on.
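
A minimal sketch of the index arithmetic this layout implies, to make the idea concrete. The function names and the formula `(num_cursors + cursor_index) / 2` are assumptions for illustration and are not quoted from `merge.rs`. The sketch also assumes the cursors act as implicit leaves numbered from `num_cursors` upward, and that index 0 sits outside the tree proper (loser trees commonly keep the overall winner there).

```rust
/// Index of the tree node directly above the implicit leaf for `cursor_index`.
/// Assumption: cursors are implicit leaves at positions
/// `num_cursors..2 * num_cursors`, so the node above a leaf is at half
/// its position.
fn leaf_node_index(num_cursors: usize, cursor_index: usize) -> usize {
    (num_cursors + cursor_index) / 2
}

/// Parent of a node in the array layout (the root is at index 1).
fn parent_node_index(node_index: usize) -> usize {
    node_index / 2
}

/// Nodes visited when walking from a cursor's leaf node up to the root.
/// Index 0 is not part of the path; reaching it ends the walk.
fn path_to_root(num_cursors: usize, cursor_index: usize) -> Vec<usize> {
    let mut path = Vec::new();
    let mut node = leaf_node_index(num_cursors, cursor_index);
    while node != 0 {
        path.push(node);
        node = parent_node_index(node);
    }
    path
}
```

With 8 cursors, the bottom row of the diagram (nodes 4 to 7) is where the cursors attach: under these assumptions `path_to_root(8, 5)` walks nodes 6, 3, 1.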
   
   I guess I'm just hesitant to work on larger tasks because I'm not sure:
   1. Whether someone else is currently working on it.
   2. Whether it's high priority. I don't want to block the issue for a 
prolonged period.
   3. Whether the issue touches areas that I'm currently not very familiar 
with. Again, for the same reason as 2, I don't want to block the issue for a 
prolonged period.
   
   Currently I prefer to work on issues that I can finish in one weekend, and 
yeah, #4495 is my default go-to issue if I don't find issues that are 
interesting or that I'm sure I can finish in one weekend 😆.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]