ajegou commented on PR #22852: URL: https://github.com/apache/datafusion/pull/22852#issuecomment-4670382915
While preparing this PR we noticed a second, related failure mode that this change doesn't address: on multi-partition workloads, when a sibling partition has already tightened the shared dynamic filter but a given partition's local heap is still empty, attempt_early_completion can't conclude completion (it reads the local heap's max row, which is None). That cross-partition starvation case is the rest of the regression on multi-partition workloads. Addressing it cleanly requires lifting the threshold check from the local heap to the shared TopKDynamicFilters, which is a larger design change. @geoffreyclaude has a follow-up PR ready that does this and will be opened against apache shortly. We're keeping the two changes separate to keep each PR narrowly scoped — this one is correctness-safe on its own and the follow-up rests cleanly on top. Pinging @kosiew just in case this has an impact on your review / approval -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
