devinjdangelo commented on issue #1718: URL: https://github.com/apache/arrow-rs/issues/1718#issuecomment-1734643791
> A naive solution might be to just spawn tokio tasks for each column of each batch, but this will have very poor thread locality, high per-batch overheads, and in general feels a little off.

Regarding this point, https://github.com/apache/arrow-datafusion/pull/7655 spawns only one task per column per row group (not per record batch). Each record batch is sent via a channel to the parallel tasks. Once `max_row_group_size` is reached, the parallel tasks are joined and new ones are spawned in their place.
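
To make the task layout concrete, here is a minimal sketch of the pattern described in this comment. It is not the code from the linked PR. It assumes plain `std::thread` workers and `std::sync::mpsc` channels in place of tokio tasks so that it needs no external crates, and it uses `Vec<i64>` columns as a stand-in for Arrow arrays. The names `Batch`, `ColumnChunk`, `RowGroupWorkers`, `spawn_workers`, `finish`, and `write_row_groups` are hypothetical. As a simplification, row groups are closed only on batch boundaries.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a record batch: one Vec<i64> per column.
type Batch = Vec<Vec<i64>>;
/// Hypothetical stand-in for an encoded column chunk of one row group.
type ColumnChunk = Vec<i64>;

/// One worker per column, alive for the duration of a single row group.
struct RowGroupWorkers {
    senders: Vec<mpsc::Sender<Vec<i64>>>,
    handles: Vec<thread::JoinHandle<ColumnChunk>>,
}

/// Spawn one worker per column (std threads here; the comment describes tokio tasks).
fn spawn_workers(num_columns: usize) -> RowGroupWorkers {
    let mut senders = Vec::with_capacity(num_columns);
    let mut handles = Vec::with_capacity(num_columns);
    for _ in 0..num_columns {
        let (tx, rx) = mpsc::channel::<Vec<i64>>();
        senders.push(tx);
        handles.push(thread::spawn(move || {
            // "Encode" every slice received for this column until the channel closes.
            let mut chunk = ColumnChunk::new();
            for values in rx {
                chunk.extend(values);
            }
            chunk
        }));
    }
    RowGroupWorkers { senders, handles }
}

/// Close the channels and join the workers, yielding one chunk per column.
fn finish(workers: RowGroupWorkers) -> Vec<ColumnChunk> {
    let RowGroupWorkers { senders, handles } = workers;
    drop(senders); // closing the channels lets each worker loop end
    handles
        .into_iter()
        .map(|h| h.join().expect("column worker panicked"))
        .collect()
}

/// Fan each batch out to the per-column workers; once max_row_group_size rows
/// have been sent, join the workers and spawn fresh ones for the next row group.
fn write_row_groups(
    batches: Vec<Batch>,
    num_columns: usize,
    max_row_group_size: usize,
) -> Vec<Vec<ColumnChunk>> {
    let mut row_groups = Vec::new();
    let mut workers = spawn_workers(num_columns);
    let mut rows_in_group = 0;
    for batch in batches {
        let batch_rows = batch.first().map_or(0, |c| c.len());
        for (tx, column) in workers.senders.iter().zip(batch) {
            tx.send(column).expect("column worker hung up");
        }
        rows_in_group += batch_rows;
        if rows_in_group >= max_row_group_size {
            let finished = std::mem::replace(&mut workers, spawn_workers(num_columns));
            row_groups.push(finish(finished));
            rows_in_group = 0;
        }
    }
    // Join the last set of workers; keep their output only if they received rows.
    let last = finish(workers);
    if rows_in_group > 0 {
        row_groups.push(last);
    }
    row_groups
}
```

The number of spawned workers scales with columns × row groups rather than columns × record batches, which is the point being made about per-batch overhead.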
