westonpace opened a new pull request #10795: URL: https://github.com/apache/arrow/pull/10795
Added a basic mapping generator that does not queue incoming jobs, which lets it forward async-reentrant pressure to the source. Also fixed some issues in the CSV reader that were preventing it from running truly in parallel.

Performance is now significantly better but still does not match the threaded reader. For the NY taxi dataset, the streaming read time went from ~7 seconds to ~1.6 seconds. However, the file reader is still at ~0.8 seconds. I'll do more investigation later.

Leaving this in draft because I want to extract a thread-spawning generator I created into an independently tested component.
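To illustrate what "does not queue incoming jobs" could mean, here is a minimal sketch in Python's asyncio. It is an analogy only, not the C++ code in this PR. It models a generator as a callable that returns an awaitable, with `None` marking the end of the stream. The names `make_counting_source`, `make_unqueued_map`, and `demo` are hypothetical and exist only for this example. Each pull on the mapped generator immediately pulls from the source, so several outstanding pulls reach the source at the same time instead of waiting in a queue inside the mapper.

```python
import asyncio


def make_counting_source(n_items, delay=0.01):
    """Hypothetical source: yields 0..n_items-1, then None, and records
    how many pulls were outstanding at the same time."""
    state = {"next": 0, "in_flight": 0, "max_in_flight": 0}

    async def source():
        state["in_flight"] += 1
        state["max_in_flight"] = max(state["max_in_flight"], state["in_flight"])
        idx = state["next"]
        state["next"] += 1
        await asyncio.sleep(delay)  # stand-in for real I/O latency
        state["in_flight"] -= 1
        return idx if idx < n_items else None

    return source, state


def make_unqueued_map(source, fn):
    """Mapping generator with no internal queue: every pull is forwarded
    straight to the source, and fn is applied to whatever comes back."""

    async def pull():
        item = await source()
        if item is None:  # end of stream passes through unchanged
            return None
        return fn(item)

    return pull


async def demo():
    source, state = make_counting_source(8)
    mapped = make_unqueued_map(source, lambda x: x * 10)
    # Issue four pulls before any of them completes (reentrant use).
    results = await asyncio.gather(*(mapped() for _ in range(4)))
    return results, state["max_in_flight"]
```

Running `asyncio.run(demo())` in this sketch should show all four pulls in flight at the source at once. A queuing mapper would instead hold the extra pulls back and the source would see only one at a time.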
