Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

via GitHub Thu, 05 Jun 2025 06:55:01 -0700


pepijnve commented on PR #16196:
URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2944505070


   > I feel like we are getting close to a point where we start having 
not-so-fruitful discussions. I think I have made a good effort to make my 
arguments and reasoning clear.
   
   @ozankabak My apologies. I didn't mean to derail your efforts here, and I'll 
refrain from adding any more noise to the thread (beyond this, sorry). I 
appreciate the fact that you guys have much more experience working in this 
codebase. I'm really trying to make a good-faith contribution here by comparing 
the pros and cons of both approaches through measurements (API impact, 
performance impact, etc.), but I'll back off.
   
   FWIW, I've added some more test cases in the meantime that you guys can use 
or ignore as you see fit. I also have some benchmark results from a first run 
at https://gist.github.com/pepijnve/21fbd480ae3e60f780446ace974d3ef5. They're 
a very mixed bag. I'm going to run the suite a couple more times to see whether 
the results are consistent before I dig deeper.
   
   During the runs I'm seeing so much runtime variability on both branches that 
I doubt how meaningful these results are. Would it be useful to have the 
benchmark perform more runs and adapt the tool a bit to report the standard 
deviation in addition to the mean, rather than just the average?
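As a rough illustration of the proposed reporting, here is a minimal sketch, assuming each query's per-iteration runtimes are collected into a slice of `f64` milliseconds. The `Summary` struct, the `summarize` helper, and the sample timings are all hypothetical and are not part of DataFusion's existing benchmark or comparison tooling.

```rust
/// Summary statistics for one benchmark query across several runs.
/// Hypothetical helper; not part of DataFusion's benchmark tooling.
#[derive(Debug)]
struct Summary {
    mean: f64,
    std_dev: f64,
    runs: usize,
}

/// Computes the mean and the sample standard deviation (n - 1 denominator)
/// of per-run timings. Returns None for fewer than two runs, because the
/// sample standard deviation is undefined when n < 2.
fn summarize(timings_ms: &[f64]) -> Option<Summary> {
    let n = timings_ms.len();
    if n < 2 {
        return None;
    }
    let mean = timings_ms.iter().sum::<f64>() / n as f64;
    // Sum of squared deviations from the mean, divided by n - 1.
    let variance = timings_ms
        .iter()
        .map(|t| (t - mean).powi(2))
        .sum::<f64>()
        / (n - 1) as f64;
    Some(Summary {
        mean,
        std_dev: variance.sqrt(),
        runs: n,
    })
}

fn main() {
    // Made-up timings (ms) for a single query on each branch; not real results.
    let main_branch = [102.0, 98.0, 110.0, 95.0, 105.0];
    let pr_branch = [99.0, 120.0, 101.0, 97.0, 118.0];
    for (name, runs) in [("main", &main_branch[..]), ("pr", &pr_branch[..])] {
        let s = summarize(runs).expect("need at least two runs");
        println!(
            "{name}: mean {:.1} ms, std dev {:.1} ms over {} runs",
            s.mean, s.std_dev, s.runs
        );
    }
}
```

Setting the gap between two branches' means against their standard deviations gives a rough sense of whether a difference exceeds run-to-run noise. A proper significance test would be more rigorous.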

