[GitHub] [arrow-rs] devinjdangelo commented on issue #1718: Support encoding a single parquet file using multiple threads

via GitHub Mon, 25 Sep 2023 17:28:50 -0700


devinjdangelo commented on issue #1718:
URL: https://github.com/apache/arrow-rs/issues/1718#issuecomment-1734643791


   > A naive solution might be to just spawn tokio tasks for each column of 
each batch, but this will have very poor thread locality, high per-batch 
overheads, and in general feels a little off.
   
   Regarding this point, https://github.com/apache/arrow-datafusion/pull/7655 
spawns only one task per column per row group (not per record batch). 
Each record batch is sent over a channel to the parallel tasks. Once 
max_row_group_size is reached, the parallel tasks are joined and new ones 
are spawned in their place.
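
   To make the shape of that approach concrete, here is a minimal sketch. It is 
not the code from the PR: it uses std threads and std::sync::mpsc channels in 
place of tokio tasks, and `Batch`, `ColumnChunk`, `RowGroupWorkers`, and 
`encode_row_groups` are hypothetical names invented for illustration. The 
"encoding" is just concatenation of values, and a row group is closed after 
the batch that reaches the limit rather than being split exactly at the 
boundary, which is a simplification.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a record batch: one Vec<i64> per column.
type Batch = Vec<Vec<i64>>;
/// Hypothetical stand-in for an encoded column chunk.
type ColumnChunk = Vec<i64>;

/// One worker per column, alive for the duration of a single row group.
struct RowGroupWorkers {
    senders: Vec<mpsc::Sender<Vec<i64>>>,
    handles: Vec<thread::JoinHandle<ColumnChunk>>,
}

impl RowGroupWorkers {
    /// Spawn one worker (a thread here, a task in the PR) per column.
    fn spawn(num_columns: usize) -> Self {
        let mut senders = Vec::with_capacity(num_columns);
        let mut handles = Vec::with_capacity(num_columns);
        for _ in 0..num_columns {
            let (tx, rx) = mpsc::channel::<Vec<i64>>();
            handles.push(thread::spawn(move || {
                // Placeholder "encoder": append each incoming slice of the
                // column until the channel is closed.
                let mut chunk = ColumnChunk::new();
                for values in rx {
                    chunk.extend(values);
                }
                chunk
            }));
            senders.push(tx);
        }
        Self { senders, handles }
    }

    /// Send each column of the batch to the worker that owns that column.
    fn send(&self, batch: Batch) {
        for (tx, column) in self.senders.iter().zip(batch) {
            tx.send(column).expect("column worker exited early");
        }
    }

    /// Close the channels and join the workers, yielding one chunk per column.
    fn finish(self) -> Vec<ColumnChunk> {
        let RowGroupWorkers { senders, handles } = self;
        drop(senders); // closing the channels lets each worker loop end
        handles
            .into_iter()
            .map(|h| h.join().expect("column worker panicked"))
            .collect()
    }
}

/// Feed batches to per-column workers; once `max_row_group_size` rows have
/// been sent, join the workers and spawn a fresh set for the next row group.
fn encode_row_groups(
    batches: Vec<Batch>,
    num_columns: usize,
    max_row_group_size: usize,
) -> Vec<Vec<ColumnChunk>> {
    let mut row_groups = Vec::new();
    let mut workers = RowGroupWorkers::spawn(num_columns);
    let mut rows_in_group = 0;
    for batch in batches {
        rows_in_group += batch.first().map_or(0, |c| c.len());
        workers.send(batch);
        if rows_in_group >= max_row_group_size {
            let finished =
                std::mem::replace(&mut workers, RowGroupWorkers::spawn(num_columns));
            row_groups.push(finished.finish());
            rows_in_group = 0;
        }
    }
    // Always join the last set of workers; keep its output only if non-empty.
    let last = workers.finish();
    if rows_in_group > 0 {
        row_groups.push(last);
    }
    row_groups
}
```

   The point of the sketch is that the number of spawned workers scales with 
columns times row groups, not columns times record batches, so the per-batch 
cost is one channel send per column.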


