alamb commented on PR #20820: URL: https://github.com/apache/datafusion/pull/20820#issuecomment-4127301047
Brain dump of current status:
1. Clickbench seems to be going faster (great)
2. tpch is going slower (not great)

I debugged the tpch slowdown a bit and it seems like one problem is that this branch makes the work uneven. I will keep working on this.

At the core this PR has a few things (and maybe it is trying to do too much):
1. FileStream scheduler that is trying to cap outstanding IOs across all streams
2. FileStream scheduler that is stealing work from other sibling streams
3. Rewrite of the Parquet opener as an explicit state machine that transitions on an IO
4. A new morselizer API

I am thinking I can potentially move a bunch of the Parquet opener code into its own PR to make sure it doesn't cause issues in isolation (and that would get a big chunk of this PR out for review).

Then in parallel I will keep debugging WTF is going on with the tpch queries.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
