[GitHub] [arrow] westonpace opened a new pull request #9607: ARROW-7001: [C++] Develop threading APIs to accommodate nested parallelism

GitBox Mon, 01 Mar 2021 08:30:31 -0800


westonpace opened a new pull request #9607:
URL: https://github.com/apache/arrow/pull/9607



   This is still very much a WIP; expect more details to come.  This PR ports
the dataset/scanner logic to async.  It does not make any of the readers
themselves (e.g. the parquet reader or the ipc reader) async.  This will
technically solve ARROW-7001, but it is also a step towards making the
readers async.
   
   There should be a performance gain for datasets with fewer files than
cores, since the inner reads within each file can now run in parallel as
nested tasks.
   There should be a further performance gain, beyond that, for CSV datasets,
since we already have a proper async CSV reader.
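
   As a rough illustration of the nested case (fewer files than cores), the
sketch below fans out one task per file and, inside each file task, one task
per chunk.  It uses only the C++ standard library and is not the code in this
PR; `Chunk`, `File`, `SumFile` and `SumDataset` are hypothetical names made up
for the example.  `std::launch::async` starts a new thread per task, so the
blocking `get()` calls here cannot starve a fixed-size pool the way nested
blocking waits can.

```cpp
#include <future>
#include <numeric>
#include <vector>

// Hypothetical stand-ins: a "file" is a list of chunks of integers.
using Chunk = std::vector<int>;
using File = std::vector<Chunk>;

// Inner level of parallelism: one task per chunk within a single file.
long SumFile(const File& file) {
  std::vector<std::future<long>> chunk_sums;
  for (const Chunk& chunk : file) {
    chunk_sums.push_back(std::async(std::launch::async, [&chunk] {
      return std::accumulate(chunk.begin(), chunk.end(), 0L);
    }));
  }
  long total = 0;
  for (auto& sum : chunk_sums) total += sum.get();  // wait for every chunk
  return total;
}

// Outer level of parallelism: one task per file, each of which fans out again.
long SumDataset(const std::vector<File>& files) {
  std::vector<std::future<long>> file_sums;
  for (const File& file : files) {
    file_sums.push_back(
        std::async(std::launch::async, [&file] { return SumFile(file); }));
  }
  long total = 0;
  for (auto& sum : file_sums) total += sum.get();  // wait for every file
  return total;
}
```

   With two files and a dozen cores, the outer loop alone would keep only two
threads busy; the inner fan-out is what lets the remaining cores contribute.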


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
