alamb commented on PR #20820: URL: https://github.com/apache/datafusion/pull/20820#issuecomment-4070539161
Status update:

1. I have broken the parquet opener logic into an explicit state machine that separates IO and CPU work, which I think is a major improvement and sets us up for phase 2.

I next plan to:

1. Update the opener to treat RowGroups as the initial morsel unit
2. Implement the work stealing (or global queue)

Then I will verify that this approach actually gets the benefits seen by @Dandandan in his prototype.

If so, I'll then plan on starting to break the PR into smaller parts for review and adding better test coverage for the behaviors.
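For readers following along, here is a rough sketch of the shape described above: a per-morsel state machine whose IO and CPU steps are separate states, with one morsel per row group pulled from a shared global queue. This is not the code in the PR and not DataFusion's API. Every name here (`Morsel`, `State`, `run_to_completion`, `row_group_morsels`, `run_with_global_queue`) is hypothetical, the "IO" and "decode" steps are stand-ins, and it uses a global queue rather than work stealing purely to stay short.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical unit of work: one row group of one file.
#[derive(Debug, Clone, PartialEq)]
struct Morsel {
    file: String,
    row_group: usize,
}

/// Hypothetical per-morsel state machine. IO-bound and CPU-bound steps are
/// separate variants, so a scheduler could route them differently.
enum State {
    NeedsIo(Morsel),
    NeedsDecode { morsel: Morsel, bytes: Vec<u8> },
    Done { morsel: Morsel, rows: usize },
}

impl State {
    /// Advance by exactly one step.
    fn step(self) -> State {
        match self {
            State::NeedsIo(morsel) => {
                // Stand-in for fetching the row group's bytes (assumption:
                // row group `i` holds `(i + 1) * 4` bytes).
                let bytes = vec![0u8; (morsel.row_group + 1) * 4];
                State::NeedsDecode { morsel, bytes }
            }
            State::NeedsDecode { morsel, bytes } => {
                // Stand-in for decoding (assumption: one row per 4 bytes).
                State::Done { morsel, rows: bytes.len() / 4 }
            }
            done @ State::Done { .. } => done,
        }
    }
}

/// Drive one morsel through the state machine and return its row count.
fn run_to_completion(morsel: Morsel) -> usize {
    let mut state = State::NeedsIo(morsel);
    loop {
        state = match state {
            State::Done { rows, .. } => return rows,
            other => other.step(),
        };
    }
}

/// Split a file into one morsel per row group.
fn row_group_morsels(file: &str, num_row_groups: usize) -> Vec<Morsel> {
    (0..num_row_groups)
        .map(|row_group| Morsel { file: file.to_string(), row_group })
        .collect()
}

/// Workers pull morsels from a single shared queue until it is empty.
/// Returns the total number of rows produced across all workers.
fn run_with_global_queue(morsels: Vec<Morsel>, workers: usize) -> usize {
    let queue = Arc::new(Mutex::new(VecDeque::from(morsels)));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let queue = Arc::clone(&queue);
            thread::spawn(move || {
                let mut rows = 0;
                loop {
                    // The lock is held only while popping, not while working.
                    let next = queue.lock().unwrap().pop_front();
                    match next {
                        Some(morsel) => rows += run_to_completion(morsel),
                        None => break rows,
                    }
                }
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

With the stand-in sizes above, calling `run_with_global_queue` on the morsels of a 3-row-group file plus a 2-row-group file yields 9 rows regardless of the worker count. A work-stealing variant would give each worker its own queue and fall back to other workers' queues when idle.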
