mattcuento commented on issue #1359: URL: https://github.com/apache/datafusion-ballista/issues/1359#issuecomment-3729122521
Hey @milenkovic, also happy to help here! A lot of the proposal makes sense to me, though I should say my experience here is limited to Flink's adaptive scheduling, which is more strictly focused on parallelism.

+1 for the second approach: testability, rollout, and falling back to a static planner all sound simpler this way. Agreed that running physical optimizer passes between stages fits with stage-based execution.

Two questions:

- I'm curious about optimizations like two-phase aggregations, which span stages (local agg in one, global in the next). I've seen limited examples of adaptive two-phase aggregations where the local agg can be disabled in the presence of a large keyspace. Have you given any thought to how the design might allow or disallow future extensions for rules that span more than one stage and aim to adapt a running stage? It sounds like such rules wouldn't be possible with a pass after each stage, which is fine, but I wanted to make that thought explicit.
- For observability's sake, do we plan to log or keep in memory each round of optimizations and the original execution graph?
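To make the two-phase aggregation question concrete, here is a rough sketch of a local aggregation that disables itself when the keyspace looks large. It is plain Python, not Ballista or DataFusion code; `adaptive_local_sum`, `global_sum`, `probe_rows`, and `max_distinct_ratio` are hypothetical names and thresholds chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Row = Tuple[str, int]  # (group key, value)


def adaptive_local_sum(
    rows: Iterable[Row],
    probe_rows: int = 4,              # hypothetical: rows to observe before deciding
    max_distinct_ratio: float = 0.5,  # hypothetical: above this, pre-aggregation is not paying off
) -> Iterator[Row]:
    """Local (partial) SUM that switches to pass-through when keys rarely repeat."""
    partial: Dict[str, int] = defaultdict(int)
    seen = 0
    bypass = False
    for key, value in rows:
        if bypass:
            # Local agg is disabled: forward rows untouched to the global stage.
            yield key, value
            continue
        partial[key] += value
        seen += 1
        if seen == probe_rows and len(partial) / seen > max_distinct_ratio:
            # Large keyspace detected: flush what was accumulated, then stop aggregating.
            bypass = True
            yield from partial.items()
            partial.clear()
    # Emit whatever is still buffered (empty if the bypass was taken).
    yield from partial.items()


def global_sum(partials: Iterable[Row]) -> Dict[str, int]:
    """Global (final) SUM; correct whether or not the local stage pre-aggregated."""
    total: Dict[str, int] = defaultdict(int)
    for key, value in partials:
        total[key] += value
    return dict(total)
```

The decision here is made inside the running local stage from rows it has already seen, which is exactly the kind of adaptation a pass that only runs after a stage completes would not cover. The global stage needs no change either way, since summing raw rows and summing partial sums give the same result.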
