alamb commented on PR #20820:
URL: https://github.com/apache/datafusion/pull/20820#issuecomment-4070539161

   Status update:
   1. I have broken the parquet opener logic into an explicit state machine 
that separates IO and CPU work. I think this is a major improvement, and it 
sets us up for phase 2.
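
   As a rough illustration of what "an explicit state machine that separates IO and CPU work" can look like, here is a minimal sketch. It is not the code in this PR: `OpenerState`, `WorkKind`, the state names, and the "keep even-numbered row groups" pruning rule are all hypothetical stand-ins, and no real IO or decoding happens.

```rust
/// Hypothetical classification of the work a state needs next.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WorkKind {
    Io,
    Cpu,
}

/// Hypothetical opener states; each state is either pure IO or pure CPU.
#[derive(Debug, PartialEq, Eq)]
enum OpenerState {
    /// IO: fetch the file footer / metadata.
    LoadMetadata,
    /// CPU: decide which row groups survive pruning.
    PruneRowGroups { total: usize },
    /// IO: fetch bytes for the first row group in `remaining`.
    FetchRowGroup { remaining: Vec<usize> },
    /// CPU: decode the first row group in `remaining`.
    DecodeRowGroup { remaining: Vec<usize> },
    /// Nothing left to do.
    Done,
}

impl OpenerState {
    /// Tells a scheduler whether the next step is IO-bound or CPU-bound.
    fn work_kind(&self) -> Option<WorkKind> {
        match self {
            OpenerState::LoadMetadata | OpenerState::FetchRowGroup { .. } => Some(WorkKind::Io),
            OpenerState::PruneRowGroups { .. } | OpenerState::DecodeRowGroup { .. } => {
                Some(WorkKind::Cpu)
            }
            OpenerState::Done => None,
        }
    }

    /// Advances one step. Real IO and decoding are replaced by placeholders.
    fn step(self) -> OpenerState {
        match self {
            // Assumption: pretend the metadata reported 4 row groups.
            OpenerState::LoadMetadata => OpenerState::PruneRowGroups { total: 4 },
            OpenerState::PruneRowGroups { total } => {
                // Placeholder for statistics-based pruning: keep even indices.
                let remaining: Vec<usize> = (0..total).filter(|i| i % 2 == 0).collect();
                if remaining.is_empty() {
                    OpenerState::Done
                } else {
                    OpenerState::FetchRowGroup { remaining }
                }
            }
            // The fetched bytes would be handed to the decode state here.
            OpenerState::FetchRowGroup { remaining } => OpenerState::DecodeRowGroup { remaining },
            OpenerState::DecodeRowGroup { mut remaining } => {
                // Drop the row group that was just decoded.
                if !remaining.is_empty() {
                    remaining.remove(0);
                }
                if remaining.is_empty() {
                    OpenerState::Done
                } else {
                    OpenerState::FetchRowGroup { remaining }
                }
            }
            OpenerState::Done => OpenerState::Done,
        }
    }
}

/// Drives the machine to completion and records the kind of each step.
fn trace() -> Vec<WorkKind> {
    let mut state = OpenerState::LoadMetadata;
    let mut kinds = Vec::new();
    while let Some(kind) = state.work_kind() {
        kinds.push(kind);
        state = state.step();
    }
    kinds
}
```

   Because every state reports its `WorkKind`, a driver could in principle route IO steps and CPU steps to different executors, which is the separation the status update describes.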
   
   Next, I plan to:
   1. Update the opener to treat RowGroups as the initial morsel unit
   2. Implement work stealing (or a global queue)
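
   To make the "global queue" option concrete, the sketch below treats each (file, row group) pair as one morsel and lets worker threads pull from a single shared queue. It is an illustration under assumptions, not the planned implementation: `Morsel` and `run_with_global_queue` are hypothetical names, plain OS threads stand in for the real execution runtime, and "processing" a morsel just records it.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical morsel: one row group of one file.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Morsel {
    file: usize,
    row_group: usize,
}

/// Builds one morsel per row group, then lets `workers` threads drain a
/// single shared queue. Returns every processed morsel, sorted.
fn run_with_global_queue(row_groups_per_file: &[usize], workers: usize) -> Vec<Morsel> {
    let queue: VecDeque<Morsel> = row_groups_per_file
        .iter()
        .enumerate()
        .flat_map(|(file, &n)| (0..n).map(move |row_group| Morsel { file, row_group }))
        .collect();
    let queue = Arc::new(Mutex::new(queue));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let queue = Arc::clone(&queue);
            thread::spawn(move || {
                let mut done = Vec::new();
                loop {
                    // The lock guard is a temporary, so the lock is held
                    // only for the pop, not while the morsel is processed.
                    let next = queue.lock().unwrap().pop_front();
                    match next {
                        Some(morsel) => done.push(morsel), // placeholder for real work
                        None => break,
                    }
                }
                done
            })
        })
        .collect();

    let mut all: Vec<Morsel> = handles
        .into_iter()
        .flat_map(|handle| handle.join().unwrap())
        .collect();
    all.sort();
    all
}
```

   With a shared queue, a worker that finishes early simply takes the next row group, so a file with many row groups no longer pins its work to one worker. Work stealing aims for the same balance with per-worker queues and less contention on a single lock.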
   
   Then I will verify that this approach actually delivers the benefits 
@Dandandan saw in his prototype.
   
   If so, I'll start breaking the PR into smaller parts for review and 
adding better test coverage for these behaviors.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
