Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

via GitHub Thu, 05 Jun 2025 04:10:05 -0700


ozankabak commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2943759107


   > @ozankabak I still don't think you need all this API work since there's a 
zero API change way to deal with cancellation already. Tests all pass with no 
API changes in the 'all Stream implementations must be well behaved Tokio 
citizens' approach. I understand the performance concern, but maybe it's a bit 
premature to design APIs before knowing what the actual performance impact is? 
In terms of code changes I don't think the complexity argument holds since the 
required code changes were fairly trivial.
   
   > I've found a dedicated machine to run the benchmarks on in the meantime. 
It's 10-year-old hardware (Xeon E5620) so the compile times take forever, but 
it should be good enough for relative comparisons. Will post results when I get 
them.
   
   @pepijnve, I respect your opinion, but we will need to agree to disagree. 
After spending a lot of time over the last few years (and writing a lot of 
upstream *and* downstream DF code) on leveraging the async runtime and its 
challenges/advantages, issues related to pipeline-breaking, the performance 
implications of these things, the APIs `ExecutionPlan` objects should provide, 
the responsibilities of the planner vs. operators, and more, my intuition tells 
me that:
   1. The `ExecutionPlan` API does not yet expose some information about input 
pipelining behavior and the propagation of `Poll::Pending`, and it should, not 
just for this use case but for others too.
   2. There is a way to solve this problem universally with minimal overhead, 
and it is not that hard to figure out.
   3. This approach will also reduce the burden on user-defined operators and 
make cancellation work even without their strict cooperation.
   4. There are a lot of downstream users who write user-defined operators, 
and any "win" (in the above sense) for such use cases is important for our 
project goals.
   5. I suspect (with somewhat lower confidence than on the other points) that 
we will always be able to construct some cases, however contrived, where an 
"everyone always yields" solution suffers from performance problems.
   6. My intuition could be wrong. If this thesis (points 1-3) does turn out 
to be wrong, we can take what we learned and fall back to another solution, for 
which your proposal would be a good candidate.
   
   I feel like we are getting close to the point where our discussions stop 
being fruitful. I think I have made a good effort to make my arguments and 
reasoning clear. We will see where this effort goes (I'm very hopeful that we 
will succeed), and if you want to help, we will always appreciate it. I think 
you can probably relate to my wanting to focus my thinking on getting points 
1-3 done (if possible) in the short term, instead of repeatedly spending time 
justifying what we are doing. If this route doesn't work, I will gladly help 
find another solution (and maybe that will be your approach).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

