Re: [I] Blog post for the DataFusion `51.0.0` release [datafusion]

via GitHub Sun, 09 Nov 2025 09:15:32 -0800


alamb commented on issue #18548:
URL: https://github.com/apache/datafusion/issues/18548#issuecomment-3508537754


   > Not a very exciting topic, but if you want some info for the blog post on 
the `case` work, here are some (unscientifically collected) numbers I got by 
running some queries on an SF10 lineitems parquet file.
   
   I am excited! 
   
   
   > I can make some proper versions of the diagrams from 
https://github.com/apache/datafusion/issues/18075#issuecomment-3422326710 if 
you want some more elaboration on what changed exactly in the implementation. 
In the end this is just memory shuffling reduction, so maybe "it's a bit faster 
now" is sufficient.
   
   Thank you for the offer -- unless you already have one or would like to 
spend time writing up a more fully featured blog post about how to optimize 
CASE evaluation, I think some unscientific numbers are sufficient.
   
   Speaking of a blog, I am not aware of any practical writeup of implementing 
optimized `CASE` (conditional) evaluation in a vectorized query engine (such 
as DataFusion). I would be willing to help write such a post, but if so I 
would want to make it a high-quality one:
   1. Background of CASE / conditional
   2. Basic evaluation strategy
   3. Why it is hard to make CASE fast
   4. All the crazy stuff that we did to actually make it fast in DataFusion
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

