felipecrv commented on issue #40301:
URL: https://github.com/apache/arrow/issues/40301#issuecomment-2043635169
Sharing some progress here in the open:
> the elegant solution is work-stealing. Now you know why work-stealing was
> invented.
@anjakefala will test a version of this that uses a `ParallelFor` [1] which
doesn't create one `Future` per task/column; instead, each worker keeps going
by popping the next task (column index) off a shared `std::atomic<int>` called
`next_task`. This single atomic counter plays the same role as the per-thread
task queues in a generic work-stealing implementation.
BTW, this is pseudo-code for generic Work Stealing:
```python
function WorkerThread:
    while true:
        if own_task_queue is not empty:
            task = own_task_queue.pop()
            execute(task)
        else:
            # Try to steal work from another thread
            victim = randomly_select_thread()
            stolen_task = victim.steal_task()
            if stolen_task is not None:
                execute(stolen_task)
            else:
                # No work available, terminate
                return

function steal_task:
    if victim_task_queue is not empty:
        return victim_task_queue.pop()
    else:
        return None
```
In `ParallelForWithBackoff` [1], each worker starts from a task/column it
should complete (think of that as its `own_task_queue`) and, when done, keeps
going as long as `next_task.fetch_add(1, std::memory_order_acq_rel)` returns a
task index smaller than `num_tasks`. Atomically fetching and incrementing that
counter is how a worker "steals" the task from all the other workers. This
keeps every thread busy and reduces thread-local allocations, because the same
thread can very quickly go through many columns.
[1] https://github.com/felipecrv/arrow/blob/b54cb408a8c73af0c4fc76cdceadca0c0b240cb0/cpp/src/arrow/util/parallel.h#L104-L229
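For illustration, here is a minimal standalone sketch of that claim-by-`fetch_add`
pattern using plain `std::thread` (not Arrow's thread pool and not the actual
`ParallelForWithBackoff` signature; `ParallelForSketch` and `task_fn` are made-up
names):
```cpp
#include <atomic>
#include <cstdio>
#include <functional>
#include <thread>
#include <vector>

// Runs task_fn(i) for i in [0, num_tasks) across num_workers threads.
void ParallelForSketch(int num_tasks, int num_workers,
                       const std::function<void(int)>& task_fn) {
  // The shared counter replaces the per-thread queues of a generic
  // work-stealing scheduler: claiming ("stealing") a task is one fetch_add.
  // Tasks 0..num_workers-1 are handed out as each worker's starting task.
  std::atomic<int> next_task{num_workers};

  std::vector<std::thread> workers;
  workers.reserve(num_workers);
  for (int w = 0; w < num_workers; ++w) {
    workers.emplace_back([&, w] {
      int task = w;  // each worker begins with its own initial task
      while (task < num_tasks) {
        task_fn(task);
        // Claim the next unprocessed task from the shared pool.
        task = next_task.fetch_add(1, std::memory_order_acq_rel);
      }
    });
  }
  for (auto& t : workers) t.join();
}

int main() {
  ParallelForSketch(/*num_tasks=*/16, /*num_workers=*/4, [](int column) {
    std::printf("processing column %d\n", column);
  });
}
```
Whether each worker seeds itself with its own index or goes straight to the
counter is mostly cosmetic; the point is that "stealing" degenerates into a
single acquire-release `fetch_add` on one shared integer.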