R-JunmingChen commented on issue #35268:
URL: https://github.com/apache/arrow/issues/35268#issuecomment-1546690050

   > Great! You might take a look at #35320 as you're getting started. It helps 
give some overview of Acero in general.
   
   I am stuck on how to create an appropriate design for external sort in 
Acero. 
   Based on your doc, I think it would be good to first create an external sorting 
kernel and then use it in Acero. 
   But Acero needs the two phases of external sorting separately. To be precise:
   - In the `InputReceived()` of the exec node `order_by_sink` with spillover, we 
should read data into a buffer of size `buffer_size`. When the buffer is 
full, we should sort the buffer and write the sorted data to disk. 
   - In the `DoFinish()` of the exec node `order_by_sink` with spillover, we should 
perform an n-way external merge sort.
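
To make the two phases concrete, here is a minimal sketch in plain Python rather than Acero C++. Everything in it is an assumption for illustration: the class `ExternalOrderBySketch`, its methods `input_received()` and `do_finish()`, and the pickle-based run files are hypothetical stand-ins, and rows are plain comparable values instead of Arrow batches.

```python
import heapq
import os
import pickle
import tempfile


class ExternalOrderBySketch:
    """Toy stand-in for an order-by sink with spillover (hypothetical, not Acero code)."""

    def __init__(self, buffer_size, spill_dir):
        self.buffer_size = buffer_size  # max rows held in memory before spilling
        self.spill_dir = spill_dir      # directory that receives the sorted runs
        self.buffer = []
        self.run_paths = []

    def input_received(self, batch):
        """Phase 1: buffer rows; sort and spill a run whenever the buffer is full."""
        for row in batch:
            self.buffer.append(row)
            if len(self.buffer) >= self.buffer_size:
                self._spill()

    def _spill(self):
        # Sort the in-memory buffer and write it to disk as one sorted run.
        self.buffer.sort()
        fd, path = tempfile.mkstemp(dir=self.spill_dir, suffix=".run")
        with os.fdopen(fd, "wb") as f:
            for row in self.buffer:
                pickle.dump(row, f)
        self.run_paths.append(path)
        self.buffer = []

    @staticmethod
    def _read_run(path):
        # Stream one run back row by row, so a run is never fully loaded in memory.
        with open(path, "rb") as f:
            while True:
                try:
                    yield pickle.load(f)
                except EOFError:
                    return

    def do_finish(self):
        """Phase 2: n-way merge of all spilled runs plus any rows still buffered."""
        self.buffer.sort()
        runs = [self._read_run(p) for p in self.run_paths]
        runs.append(iter(self.buffer))
        yield from heapq.merge(*runs)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as spill_dir:
        node = ExternalOrderBySketch(buffer_size=3, spill_dir=spill_dir)
        for batch in ([5, 1, 4], [9, 2, 8, 7], [3, 6]):
            node.input_received(batch)
        print(list(node.do_finish()))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

In this sketch the node owns both phases and only borrows an in-memory sort for each run; the merge step streams from disk through a heap, so memory use is bounded by the buffer plus one row per run.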
   
   So I have two simple plans:
   1. Implement an externalOrderByImpl that implements the above methods on its own.  
This way, we do not implement anything as a compute kernel. We may still 
need to call the existing sorting function in compute from `InputReceived()`.
   2. Implement two kernels: one does the spillover, the other does the n-way 
external merge sort. externalOrderByImpl then just calls the two kernels in 
its `InputReceived()` and `DoFinish()`. This would be a little odd, because I 
think the two kernels don't conform to Arrow's design principles for kernels.
   
   I need some suggestions before I implement the code for external sorting.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
