alamb commented on issue #18548: URL: https://github.com/apache/datafusion/issues/18548#issuecomment-3508537754
> Not a very exciting topic, but if you want some info for the blog post on the `case` work, here are some (unscientifically collected) numbers I got by running some queries on an SF10 lineitems parquet file.

I am excited!

> I can make some proper versions of the diagrams from https://github.com/apache/datafusion/issues/18075#issuecomment-3422326710 if you want some more elaboration on what changed exactly in the implementation. In the end this is just memory shuffling reduction, so maybe "it's a bit faster now" is sufficient.

Thank you for the offer. Unless you already have one, or would like to spend time writing a fuller blog post about how to optimize `CASE` evaluation, I think some unscientific numbers are sufficient.

Speaking of a blog, I am not aware of any practical writeup on implementing optimized `CASE` (conditional) evaluation in a vectorized query engine such as DataFusion. I would be willing to help write such a post, but if so I would want to make it a high-quality one:

1. Background on `CASE` / conditional expressions
2. Basic evaluation strategy
3. Why it is hard to make `CASE` fast
4. All the crazy stuff that we did to actually make it fast in DataFusion

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
