Re: [PR] Add blog post regarding CASE work [datafusion-site]

via GitHub Mon, 22 Dec 2025 04:37:13 -0800


alamb commented on code in PR #122:
URL: https://github.com/apache/datafusion-site/pull/122#discussion_r2639606416



##########
content/blog/2025-11-11-datafusion_case.md:
##########
@@ -0,0 +1,377 @@
+---
+layout: post
+title: Optimizing CASE Expression Evaluation
+date: 2025-11-11
+author: Pepijn Van Eeckhoudt
+categories: [features]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+[TOC]
+
+<style>
+figure {
+  margin: 20px 0;
+}
+
+figure img {
+  display: block;
+  max-width: 80%;
+  margin: auto;
+}
+
+figcaption {
+  font-style: italic;
+  color: #555;
+  font-size: 0.9em;
+  max-width: 80%;
+  margin: auto;
+  text-align: center;
+}
+</style>
+
+# Optimizing CASE Expression Evaluation in DataFusion
+
+SQL's `CASE` expression is one of the few constructs the language provides to 
perform conditional logic.
+Its deceptively simple syntax hides significant implementation complexity.
+Over the past few weeks, we've landed a series of improvements to DataFusion's 
`CASE` expression evaluator that reduce both CPU time and memory allocations.
+This post walks through the original implementation, its performance 
bottlenecks, and how we addressed them step by step.
+Finally we'll also take a look at some future improvements to `CASE` that are 
in the works.
+
+## Background: CASE Expression Evaluation
+
+SQL supports two forms of CASE expressions:
+
+1. **Simple**: `CASE expr WHEN value1 THEN result1 WHEN value2 THEN result2 
... END`
+2. **Searched**: `CASE WHEN condition1 THEN result1 WHEN condition2 THEN 
result2 ... END`
+
+The simple form evaluates an expression once for each input row and then tests 
that value against the constants (or expressions) in each `WHEN` clause using 
equality comparisons.

Review Comment:
   I would personally phrase this like "tests that value against expressions 
(typically constants)" as the SQL doesn't say anything specific about constants



##########
content/blog/2025-11-11-datafusion_case.md:
##########
@@ -0,0 +1,377 @@
+---
+layout: post
+title: Optimizing CASE Expression Evaluation
+date: 2025-11-11
+author: Pepijn Van Eeckhoudt
+categories: [features]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+[TOC]
+
+<style>
+figure {
+  margin: 20px 0;
+}
+
+figure img {
+  display: block;
+  max-width: 80%;
+  margin: auto;
+}
+
+figcaption {
+  font-style: italic;
+  color: #555;
+  font-size: 0.9em;
+  max-width: 80%;
+  margin: auto;
+  text-align: center;
+}
+</style>
+
+# Optimizing CASE Expression Evaluation in DataFusion

Review Comment:
   I think it would be good to start this post off with some sort of 
quantification / visual of "how much faster is CASE after these optimizations"
   
   Maybe either a chart like on 
https://datafusion.apache.org/blog/output/2025/09/29/datafusion-50.0.0/
   
   <img width="858" height="506" alt="Image" 
src="https://github.com/user-attachments/assets/26968a55-32d3-4926-a4cc-6956d24d9f03";
 />
   
   Or a table
   
   It would also be cool to have an "ablation" version (aka measure the 
performance after each additional optimization was added -- like a chart that 
shows progressive improvement).
   
   Maybe we could use the "average runtime of the case benchmark"? I can look 
into generating this if you like



##########
content/blog/2025-11-11-datafusion_case.md:
##########
@@ -0,0 +1,377 @@
+---
+layout: post
+title: Optimizing CASE Expression Evaluation
+date: 2025-11-11
+author: Pepijn Van Eeckhoudt
+categories: [features]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+[TOC]
+
+<style>
+figure {
+  margin: 20px 0;
+}
+
+figure img {
+  display: block;
+  max-width: 80%;
+  margin: auto;
+}
+
+figcaption {
+  font-style: italic;
+  color: #555;
+  font-size: 0.9em;
+  max-width: 80%;
+  margin: auto;
+  text-align: center;
+}
+</style>
+
+# Optimizing CASE Expression Evaluation in DataFusion
+
+SQL's `CASE` expression is one of the few constructs the language provides to 
perform conditional logic.

Review Comment:
   I think it might also make sense to mention that DataFusion (now) also 
rewrites all other conditional expressions to CASE (like `COALESCE`, `IF`, etc) 
so that these optimizations are now widely used



##########
content/blog/2025-11-11-datafusion_case.md:
##########
@@ -0,0 +1,377 @@
+---
+layout: post
+title: Optimizing CASE Expression Evaluation
+date: 2025-11-11
+author: Pepijn Van Eeckhoudt
+categories: [features]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+[TOC]
+
+<style>
+figure {
+  margin: 20px 0;
+}
+
+figure img {
+  display: block;
+  max-width: 80%;
+  margin: auto;
+}
+
+figcaption {
+  font-style: italic;
+  color: #555;
+  font-size: 0.9em;
+  max-width: 80%;
+  margin: auto;
+  text-align: center;
+}
+</style>
+
+# Optimizing CASE Expression Evaluation in DataFusion
+
+SQL's `CASE` expression is one of the few constructs the language provides to 
perform conditional logic.
+Its deceptively simple syntax hides significant implementation complexity.
+Over the past few weeks, we've landed a series of improvements to DataFusion's 
`CASE` expression evaluator that reduce both CPU time and memory allocations.
+This post walks through the original implementation, its performance 
bottlenecks, and how we addressed them step by step.
+Finally we'll also take a look at some future improvements to `CASE` that are 
in the works.
+
+## Background: CASE Expression Evaluation
+
+SQL supports two forms of CASE expressions:
+
+1. **Simple**: `CASE expr WHEN value1 THEN result1 WHEN value2 THEN result2 
... END`
+2. **Searched**: `CASE WHEN condition1 THEN result1 WHEN condition2 THEN 
result2 ... END`
+
+The simple form evaluates an expression once for each input row and then tests 
that value against the constants (or expressions) in each `WHEN` clause using 
equality comparisons.
+Think of it as a limited Rust `match` expression.
+
+Here's a simple example:
+
+```sql
+CASE status
+    WHEN 'pending' THEN 1
+    WHEN 'active' THEN 2
+    WHEN 'complete' THEN 3
+    ELSE 0
+END
+```
+
+In this `CASE` expression, `status` is evaluated once per row, and then its 
value is tested for equality with the values `'pending'`, `'active'`, and 
`'complete'` in that order.
+The `THEN` expression value for the first matching `WHEN` expression is 
returned per row.
+
+The searched `CASE` form is a more flexible variant.
+It evaluates completely independent boolean expressions for each branch.
+
+This allows you to test different columns with different operators per branch 
as can be seen in the following example:
+
+```sql
+CASE
+    WHEN age > 65 THEN 'senior'
+    WHEN childCount != 0 THEN 'parent'
+    WHEN age < 21 THEN 'minor'
+    ELSE 'adult'
+END
+```
+
+In both forms, branches are evaluated sequentially with short-circuit 
semantics: for each row, once a `WHEN` condition matches, the corresponding 
`THEN` expression is evaluated. Any further branches are not evaluated for that 
row.
+This lazy evaluation model is critical for correctness.
+It let's you safely write `CASE` expressions like
+
+```sql
+CASE
+    WHEN denominator == 0 THEN NULL
+    ELSE nominator / denominator
+END
+```
+
+that are guaranteed to not trigger divide-by-zero errors.

Review Comment:
   FWIW this is a really nice clear and succinct CASE introduction ❤️ 



##########
content/blog/2025-11-11-datafusion_case.md:
##########
@@ -0,0 +1,377 @@
+---
+layout: post
+title: Optimizing CASE Expression Evaluation
+date: 2025-11-11
+author: Pepijn Van Eeckhoudt
+categories: [features]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+[TOC]
+
+<style>
+figure {
+  margin: 20px 0;
+}
+
+figure img {
+  display: block;
+  max-width: 80%;
+  margin: auto;
+}
+
+figcaption {
+  font-style: italic;
+  color: #555;
+  font-size: 0.9em;
+  max-width: 80%;
+  margin: auto;
+  text-align: center;
+}
+</style>
+
+# Optimizing CASE Expression Evaluation in DataFusion
+
+SQL's `CASE` expression is one of the few constructs the language provides to 
perform conditional logic.
+Its deceptively simple syntax hides significant implementation complexity.
+Over the past few weeks, we've landed a series of improvements to DataFusion's 
`CASE` expression evaluator that reduce both CPU time and memory allocations.
+This post walks through the original implementation, its performance 
bottlenecks, and how we addressed them step by step.
+Finally we'll also take a look at some future improvements to `CASE` that are 
in the works.
+
+## Background: CASE Expression Evaluation
+
+SQL supports two forms of CASE expressions:
+
+1. **Simple**: `CASE expr WHEN value1 THEN result1 WHEN value2 THEN result2 
... END`
+2. **Searched**: `CASE WHEN condition1 THEN result1 WHEN condition2 THEN 
result2 ... END`
+
+The simple form evaluates an expression once for each input row and then tests 
that value against the constants (or expressions) in each `WHEN` clause using 
equality comparisons.
+Think of it as a limited Rust `match` expression.
+
+Here's a simple example:
+
+```sql
+CASE status
+    WHEN 'pending' THEN 1
+    WHEN 'active' THEN 2
+    WHEN 'complete' THEN 3
+    ELSE 0
+END
+```
+
+In this `CASE` expression, `status` is evaluated once per row, and then its 
value is tested for equality with the values `'pending'`, `'active'`, and 
`'complete'` in that order.
+The `THEN` expression value for the first matching `WHEN` expression is 
returned per row.
+
+The searched `CASE` form is a more flexible variant.
+It evaluates completely independent boolean expressions for each branch.
+
+This allows you to test different columns with different operators per branch 
as can be seen in the following example:
+
+```sql
+CASE
+    WHEN age > 65 THEN 'senior'
+    WHEN childCount != 0 THEN 'parent'
+    WHEN age < 21 THEN 'minor'
+    ELSE 'adult'
+END
+```
+
+In both forms, branches are evaluated sequentially with short-circuit 
semantics: for each row, once a `WHEN` condition matches, the corresponding 
`THEN` expression is evaluated. Any further branches are not evaluated for that 
row.
+This lazy evaluation model is critical for correctness.
+It let's you safely write `CASE` expressions like
+
+```sql
+CASE
+    WHEN denominator == 0 THEN NULL
+    ELSE nominator / denominator
+END
+```
+
+that are guaranteed to not trigger divide-by-zero errors.
+
+## `CASE` Evaluation in DataFusion 50.0.0
+
+For the rest of this post we'll be looking at 'searched case' evaluation.
+'Simple case' uses a distinct, but very similar implementation.
+The same set of improvements has been applied to both.
+
+The baseline implementation in DataFusion 50.0.0 evaluated `CASE` using a 
straightforward approach:
+
+1. Start with an output array `out` with the same length as the input batch, 
filled with nulls. Additionally, create a bit vector `remainder` with the same 
length and each value set to `true`.
+2. For each `WHEN`/`THEN` branch:
+  - Evaluate the `WHEN` condition for remaining unmatched rows using 
`PhysicalExpr::evaluate_selection`, passing in the input batch and the 
`remainder` mask
+  - If any rows matched, evaluate the `THEN` expression for those rows using 
`PhysicalExpr::evaluate_selection`
+  - Merge the results into the `out` using the `zip` kernel
+  - Update the `remainder` mask to exclude matched rows
+3. If there's an `ELSE` clause, evaluate it for any remaining unmatched rows 
and merge using `zip`
+
+Here's a simplified version of the original loop:
+
+```rust
+let mut current_value = new_null_array(&return_type, batch.num_rows());
+let mut remainder = BooleanArray::from(vec![true; batch.num_rows()]);
+
+for (when_expr, then_expr) in &self.when_then_expr {
+    let when_value = when_expr.evaluate_selection(batch, &remainder)?
+        .into_array(batch.num_rows())?;
+    let when_value = and(&when_value, &remainder)?;
+
+    if when_value.true_count() == 0 {
+        continue;
+    }
+
+    let then_value = then_expr.evaluate_selection(batch, &when_value)?;
+    current_value = zip(&when_value, &then_value, &current_value)?;
+    remainder = and_not(&remainder, &when_value)?;
+}
+```
+
+While correct, this implementation had several performance issues mostly 
related to the usage of `evaluate_selection`.

Review Comment:
   ```suggestion
   While correct, this implementation has significant room for optimization, 
mostly related to the usage of `evaluate_selection`.
   ```



##########
content/blog/2025-11-11-datafusion_case.md:
##########
@@ -0,0 +1,377 @@
+---
+layout: post
+title: Optimizing CASE Expression Evaluation
+date: 2025-11-11
+author: Pepijn Van Eeckhoudt
+categories: [features]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+[TOC]
+
+<style>
+figure {
+  margin: 20px 0;
+}
+
+figure img {
+  display: block;
+  max-width: 80%;
+  margin: auto;
+}
+
+figcaption {
+  font-style: italic;
+  color: #555;
+  font-size: 0.9em;
+  max-width: 80%;
+  margin: auto;
+  text-align: center;
+}
+</style>
+
+# Optimizing CASE Expression Evaluation in DataFusion
+
+SQL's `CASE` expression is one of the few constructs the language provides to 
perform conditional logic.
+Its deceptively simple syntax hides significant implementation complexity.
+Over the past few weeks, we've landed a series of improvements to DataFusion's 
`CASE` expression evaluator that reduce both CPU time and memory allocations.
+This post walks through the original implementation, its performance 
bottlenecks, and how we addressed them step by step.
+Finally we'll also take a look at some future improvements to `CASE` that are 
in the works.
+
+## Background: CASE Expression Evaluation
+
+SQL supports two forms of CASE expressions:
+
+1. **Simple**: `CASE expr WHEN value1 THEN result1 WHEN value2 THEN result2 
... END`
+2. **Searched**: `CASE WHEN condition1 THEN result1 WHEN condition2 THEN 
result2 ... END`
+
+The simple form evaluates an expression once for each input row and then tests 
that value against the constants (or expressions) in each `WHEN` clause using 
equality comparisons.
+Think of it as a limited Rust `match` expression.
+
+Here's a simple example:
+
+```sql
+CASE status
+    WHEN 'pending' THEN 1
+    WHEN 'active' THEN 2
+    WHEN 'complete' THEN 3
+    ELSE 0
+END
+```
+
+In this `CASE` expression, `status` is evaluated once per row, and then its 
value is tested for equality with the values `'pending'`, `'active'`, and 
`'complete'` in that order.
+The `THEN` expression value for the first matching `WHEN` expression is 
returned per row.
+
+The searched `CASE` form is a more flexible variant.
+It evaluates completely independent boolean expressions for each branch.
+
+This allows you to test different columns with different operators per branch 
as can be seen in the following example:
+
+```sql
+CASE
+    WHEN age > 65 THEN 'senior'
+    WHEN childCount != 0 THEN 'parent'
+    WHEN age < 21 THEN 'minor'
+    ELSE 'adult'
+END
+```
+
+In both forms, branches are evaluated sequentially with short-circuit 
semantics: for each row, once a `WHEN` condition matches, the corresponding 
`THEN` expression is evaluated. Any further branches are not evaluated for that 
row.
+This lazy evaluation model is critical for correctness.
+It let's you safely write `CASE` expressions like
+
+```sql
+CASE
+    WHEN denominator == 0 THEN NULL
+    ELSE nominator / denominator
+END
+```
+
+that are guaranteed to not trigger divide-by-zero errors.
+
+## `CASE` Evaluation in DataFusion 50.0.0
+
+For the rest of this post we'll be looking at 'searched case' evaluation.
+'Simple case' uses a distinct, but very similar implementation.
+The same set of improvements has been applied to both.
+
+The baseline implementation in DataFusion 50.0.0 evaluated `CASE` using a 
straightforward approach:
+
+1. Start with an output array `out` with the same length as the input batch, 
filled with nulls. Additionally, create a bit vector `remainder` with the same 
length and each value set to `true`.
+2. For each `WHEN`/`THEN` branch:
+  - Evaluate the `WHEN` condition for remaining unmatched rows using 
`PhysicalExpr::evaluate_selection`, passing in the input batch and the 
`remainder` mask
+  - If any rows matched, evaluate the `THEN` expression for those rows using 
`PhysicalExpr::evaluate_selection`
+  - Merge the results into the `out` using the `zip` kernel
+  - Update the `remainder` mask to exclude matched rows
+3. If there's an `ELSE` clause, evaluate it for any remaining unmatched rows 
and merge using `zip`
+
+Here's a simplified version of the original loop:
+
+```rust
+let mut current_value = new_null_array(&return_type, batch.num_rows());
+let mut remainder = BooleanArray::from(vec![true; batch.num_rows()]);
+
+for (when_expr, then_expr) in &self.when_then_expr {
+    let when_value = when_expr.evaluate_selection(batch, &remainder)?
+        .into_array(batch.num_rows())?;
+    let when_value = and(&when_value, &remainder)?;
+
+    if when_value.true_count() == 0 {
+        continue;
+    }
+
+    let then_value = then_expr.evaluate_selection(batch, &when_value)?;
+    current_value = zip(&when_value, &then_value, &current_value)?;
+    remainder = and_not(&remainder, &when_value)?;
+}
+```
+
+While correct, this implementation had several performance issues mostly 
related to the usage of `evaluate_selection`.
+To understand why, we need to dig a little deeper into the implementation of 
that function.
+Here's a simplified version of it that captures the relevant parts:
+
+```rust
+pub trait PhysicalExpr {
+    fn evaluate_selection(
+        &self,
+        batch: &RecordBatch,
+        selection: &BooleanArray,
+    ) -> Result<ColumnarValue> {
+        let filtered_batch = filter_record_batch(batch, selection)?;
+        let filtered_result = self.evaluate(&filtered_batch)?;
+        scatter(selection, filtered_result)
+    }
+}
+```
+
+The `evaluate_selection` method first filters the input batch to only include 
rows that match the `selection` mask.
+It then calls the regular `evaluate` method using the filtered batch as input.
+Finally, to return a result array with the same number of rows as `batch`, the 
`scatter` function is called.
+This function produces a new array padded with `null` values for any rows that 
didn't match the `selection` mask.
+
+So how does the simple evaluation strategy and use of `evaluate_selection` 
cause performance overhead?
+
+### Problem 1: No Early Exit
+
+The case evaluation loop always iterated through all branches, even when every 
row had already been matched.
+In queries where early branches match many rows, this meant unnecessary work 
was done for remaining rows.
+
+### Problem 2: Repeated Filtering, Scattering, and Merging
+
+Each iteration performed operations that are very well-optimized but still not 
cost free to execute:
+- **Filtering**: `PhysicalExpr::evaluate_selection` filters the entire 
`RecordBatch` (all columns) for each branch. For the `WHEN` expression, this 
was done even if the selection mask was entirely empty.
+- **Scattering**: `PhysicalExpr::evaluate_selection` scatters the filtered 
result back to the original `RecordBatch` length.
+- **Merging**: The `zip` kernel is called once per branch to merge partial 
results into the output array
+
+Each of these steps allocates new arrays and shuffles a lot of data around. 
+
+### Problem 3: Filtering Unused Columns
+
+The `PhysicalExpr::evaluate_selection` method filters the entire record batch, 
including columns that the current branch's `WHEN` and `THEN` expressions don't 
reference.
+For wide tables (many columns) with narrow expressions (few column 
references), this is wasteful.
+
+Suppose we have a table with 26 columns named `a` through `z`.
+For a simple CASE expression like:
+
+```sql
+CASE
+  WHEN a > 1000 THEN 'large'
+  WHEN a >= 0 THEN 'positive'
+  ELSE 'negative'
+END
+```
+
+the implementation would filter all columns 26 columns even though only a 
single column is needed for the entire `CASE` expression evaluation.
+Again this involves a non-negligible amount of allocation and data copying.
+
+## Performance Improvements
+
+### Optimization 1: Short-Circuit Early Exit (commit 7c215ed)
+
+The first optimization added early exit logic to the evaluation loop:
+
+```rust
+let mut remainder_count = batch.num_rows();
+
+for (when_expr, then_expr) in &self.when_then_expr {
+    if remainder_count == 0 {
+        break;  // All rows matched, exit early
+    }
+
+    // ... evaluate branch ...
+
+    let when_match_count = when_value.true_count();
+    remainder_count -= when_match_count;
+}
+```
+
+Additionally, we avoid evaluating the `ELSE` clause when no rows remain:
+
+```rust
+if let Some(else_expr) = &self.else_expr {
+    remainder = or(&base_nulls, &remainder)?;
+    if remainder.true_count() > 0 {
+        // ... evaluate else ...
+    }
+}
+```
+
+**Impact**: For queries where early branches match all rows, this eliminates 
unnecessary branch evaluations and `ELSE` clause processing.
+
+### Optimization 2: Optimized Result Merging (commit e9431fc)
+
+The second optimization fundamentally restructured how partial results are 
merged.
+Instead of using `zip()` after each branch to merge results into an output 
array, we now:
+
+1. Maintain the subset of rows still needing evaluation across loop iterations
+2. Filter the batch progressively as rows are matched
+3. Build an index structure that tracks which branch produced each row's result
+4. Perform a single merge operation at the end
+
+The key insight is that we can defer all merging until the end of the 
evaluation loop by tracking result provenance.
+When a branch matches a number of rows, instead of immediately merging with 
`zip()`, we:
+1. Store the partial result array
+2. Mark the cells corresponding to each row in an indices array as needing to 
take one value from the partial result array
+
+In the example below, three `WHEN/THEN` branches produced results.
+The first branch produced the result `A` for 2, the second produced `B` for 
row 1, and the third produced `C` and `D` for rows 4 and 5.
+The final result array is obtained by running the arrays through the merge 
operation.
+
+```aiignore
+┌───────────┐  ┌─────────┐                             ┌─────────┐
+│┌─────────┐│  │   None  │                             │   NULL  │
+││    A    ││  ├─────────┤                             ├─────────┤
+│└─────────┘│  │    1    │                             │    B    │
+│┌─────────┐│  ├─────────┤                             ├─────────┤
+││    B    ││  │    0    │   merge_n(values, indices)  │    A    │
+│└─────────┘│  ├─────────┤  ─────────────────────────▶ ├─────────┤
+│┌─────────┐│  │   None  │                             │   NULL  │
+││    C    ││  ├─────────┤                             ├─────────┤
+│├─────────┤│  │    2    │                             │    C    │
+││    D    ││  ├─────────┤                             ├─────────┤
+│└─────────┘│  │    2    │                             │    D    │
+└───────────┘  └─────────┘                             └─────────┘
+   arrays        indices                                  result
+```
+
+The main benefits of this merge operation are that the `scatter` step is 
eleminated enitrely, and instead of requiring
+a `zip` per branch only a single `merge_n` is done.
+
+Besides more efficient merging, we also maintain a progressively filtered 
`remainder_batch`.
+
+```rust
+let mut remainder_batch = Cow::Borrowed(batch);
+let mut remainder_rows = UInt32Array::from_iter_values(0..batch.num_rows());
+
+for branch in branches {
+    // Evaluate on progressively smaller batch
+    let when_value = evaluate_on(&remainder_batch)?;
+
+    // Filter remainder_batch for next iteration
+    let next_filter = create_filter(&not(&when_value)?);
+    remainder_batch = Cow::Owned(filter_record_batch(&remainder_batch, 
&next_filter)?);
+    remainder_rows = filter_array(&remainder_rows, &next_filter)?;
+}
+```
+
+As rows are matched, the `remainder_batch` shrinks, making subsequent filter 
operations a bit faster.
+
+**Impact**: This eliminates N-1 merge operations and makes filtering 
progressively cheaper as branches match rows.
+
+This new operation was initially developed specifically for DataFusion's 
`CASE` evaluation, but in the meantime has been generalized and moved into the 
`arrow-rs` crate as 
[`arrow_select::merge::merge_n`](https://docs.rs/arrow-select/57.1.0/arrow_select/merge/fn.merge_n.html).
+
+### Optimization 3: Column Projection (commit c3e49fb)
+
+The third optimization addresses the "filtering unused columns" problem 
through projection.
+Before evaluating a CASE expression, we:
+
+1. Analyze all WHEN/THEN/ELSE expressions to find referenced columns
+2. Build a projection vector containing only those column indices
+3. Derive new versions of the expressions with updated column references
+
+For example, if the original CASE references columns at indices `[1, 5, 8]` in 
a 20-column batch:

Review Comment:
   I can imagine a nice diagram for this section too where it shows rewriting 
the expression to only refer to the three columns, and then rewriting the input 
to a new three column form. Not necessary I am just thinking



##########
content/blog/2025-11-11-datafusion_case.md:
##########
@@ -0,0 +1,377 @@
+---
+layout: post
+title: Optimizing CASE Expression Evaluation
+date: 2025-11-11
+author: Pepijn Van Eeckhoudt
+categories: [features]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+[TOC]
+
+<style>
+figure {
+  margin: 20px 0;
+}
+
+figure img {
+  display: block;
+  max-width: 80%;
+  margin: auto;
+}
+
+figcaption {
+  font-style: italic;
+  color: #555;
+  font-size: 0.9em;
+  max-width: 80%;
+  margin: auto;
+  text-align: center;
+}
+</style>
+
+# Optimizing CASE Expression Evaluation in DataFusion
+
+SQL's `CASE` expression is one of the few constructs the language provides to 
perform conditional logic.
+Its deceptively simple syntax hides significant implementation complexity.
+Over the past few weeks, we've landed a series of improvements to DataFusion's 
`CASE` expression evaluator that reduce both CPU time and memory allocations.
+This post walks through the original implementation, its performance 
bottlenecks, and how we addressed them step by step.
+Finally we'll also take a look at some future improvements to `CASE` that are 
in the works.
+
+## Background: CASE Expression Evaluation
+
+SQL supports two forms of CASE expressions:
+
+1. **Simple**: `CASE expr WHEN value1 THEN result1 WHEN value2 THEN result2 
... END`
+2. **Searched**: `CASE WHEN condition1 THEN result1 WHEN condition2 THEN 
result2 ... END`
+
+The simple form evaluates an expression once for each input row and then tests 
that value against the constants (or expressions) in each `WHEN` clause using 
equality comparisons.
+Think of it as a limited Rust `match` expression.
+
+Here's a simple example:
+
+```sql
+CASE status
+    WHEN 'pending' THEN 1
+    WHEN 'active' THEN 2
+    WHEN 'complete' THEN 3
+    ELSE 0
+END
+```
+
+In this `CASE` expression, `status` is evaluated once per row, and then its 
value is tested for equality with the values `'pending'`, `'active'`, and 
`'complete'` in that order.
+The `THEN` expression value for the first matching `WHEN` expression is 
returned per row.
+
+The searched `CASE` form is a more flexible variant.
+It evaluates completely independent boolean expressions for each branch.
+
+This allows you to test different columns with different operators per branch 
as can be seen in the following example:
+
+```sql
+CASE
+    WHEN age > 65 THEN 'senior'
+    WHEN childCount != 0 THEN 'parent'
+    WHEN age < 21 THEN 'minor'
+    ELSE 'adult'
+END
+```
+
+In both forms, branches are evaluated sequentially with short-circuit 
semantics: for each row, once a `WHEN` condition matches, the corresponding 
`THEN` expression is evaluated. Any further branches are not evaluated for that 
row.
+This lazy evaluation model is critical for correctness.
+It let's you safely write `CASE` expressions like
+
+```sql
+CASE
+    WHEN denominator == 0 THEN NULL
+    ELSE nominator / denominator
+END
+```
+
+that are guaranteed to not trigger divide-by-zero errors.
+
+## `CASE` Evaluation in DataFusion 50.0.0
+
+For the rest of this post we'll be looking at 'searched case' evaluation.
+'Simple case' uses a distinct, but very similar implementation.
+The same set of improvements has been applied to both.
+
+The baseline implementation in DataFusion 50.0.0 evaluated `CASE` using a 
straightforward approach:
+
+1. Start with an output array `out` with the same length as the input batch, 
filled with nulls. Additionally, create a bit vector `remainder` with the same 
length and each value set to `true`.
+2. For each `WHEN`/`THEN` branch:
+  - Evaluate the `WHEN` condition for remaining unmatched rows using 
`PhysicalExpr::evaluate_selection`, passing in the input batch and the 
`remainder` mask
+  - If any rows matched, evaluate the `THEN` expression for those rows using 
`PhysicalExpr::evaluate_selection`
+  - Merge the results into the `out` using the `zip` kernel
+  - Update the `remainder` mask to exclude matched rows
+3. If there's an `ELSE` clause, evaluate it for any remaining unmatched rows 
and merge using `zip`
+
+Here's a simplified version of the original loop:
+
+```rust
+let mut current_value = new_null_array(&return_type, batch.num_rows());
+let mut remainder = BooleanArray::from(vec![true; batch.num_rows()]);
+
+for (when_expr, then_expr) in &self.when_then_expr {
+    let when_value = when_expr.evaluate_selection(batch, &remainder)?
+        .into_array(batch.num_rows())?;
+    let when_value = and(&when_value, &remainder)?;
+
+    if when_value.true_count() == 0 {
+        continue;
+    }
+
+    let then_value = then_expr.evaluate_selection(batch, &when_value)?;
+    current_value = zip(&when_value, &then_value, &current_value)?;
+    remainder = and_not(&remainder, &when_value)?;
+}
+```
+
+While correct, this implementation had several performance issues mostly 
related to the usage of `evaluate_selection`.
+To understand why, we need to dig a little deeper into the implementation of 
that function.
+Here's a simplified version of it that captures the relevant parts:
+
+```rust
+pub trait PhysicalExpr {
+    fn evaluate_selection(
+        &self,
+        batch: &RecordBatch,
+        selection: &BooleanArray,
+    ) -> Result<ColumnarValue> {
+        let filtered_batch = filter_record_batch(batch, selection)?;
+        let filtered_result = self.evaluate(&filtered_batch)?;
+        scatter(selection, filtered_result)
+    }
+}
+```
+
+The `evaluate_selection` method first filters the input batch to only include 
rows that match the `selection` mask.
+It then calls the regular `evaluate` method using the filtered batch as input.
+Finally, to return a result array with the same number of rows as `batch`, the 
`scatter` function is called.
+This function produces a new array padded with `null` values for any rows that 
didn't match the `selection` mask.
+
+So how does the simple evaluation strategy and use of `evaluate_selection` 
cause performance overhead?
+
+### Problem 1: No Early Exit
+
+The case evaluation loop always iterated through all branches, even when every 
row had already been matched.
+In queries where early branches match many rows, this meant unnecessary work 
was done for remaining rows.
+
+### Problem 2: Repeated Filtering, Scattering, and Merging
+
+Each iteration performed operations that are very well-optimized but still not 
cost free to execute:
+- **Filtering**: `PhysicalExpr::evaluate_selection` filters the entire 
`RecordBatch` (all columns) for each branch. For the `WHEN` expression, this 
was done even if the selection mask was entirely empty.
+- **Scattering**: `PhysicalExpr::evaluate_selection` scatters the filtered 
result back to the original `RecordBatch` length.
+- **Merging**: The `zip` kernel is called once per branch to merge partial 
results into the output array
+
+Each of these steps allocates new arrays and shuffles a lot of data around. 
+
+### Problem 3: Filtering Unused Columns
+
+The `PhysicalExpr::evaluate_selection` method filters the entire record batch, 
including columns that the current branch's `WHEN` and `THEN` expressions don't 
reference.
+For wide tables (many columns) with narrow expressions (few column 
references), this is wasteful.
+
+Suppose we have a table with 26 columns named `a` through `z`.
+For a simple CASE expression like:
+
+```sql
+CASE
+  WHEN a > 1000 THEN 'large'
+  WHEN a >= 0 THEN 'positive'
+  ELSE 'negative'
+END
+```
+
+the implementation would filter all columns 26 columns even though only a 
single column is needed for the entire `CASE` expression evaluation.
+Again this involves a non-negligible amount of allocation and data copying.
+
+## Performance Improvements
+
+### Optimization 1: Short-Circuit Early Exit (commit 7c215ed)
+
+The first optimization added early exit logic to the evaluation loop:
+
+```rust
+let mut remainder_count = batch.num_rows();
+
+for (when_expr, then_expr) in &self.when_then_expr {
+    if remainder_count == 0 {
+        break;  // All rows matched, exit early
+    }
+
+    // ... evaluate branch ...
+
+    let when_match_count = when_value.true_count();
+    remainder_count -= when_match_count;
+}
+```
+
+Additionally, we avoid evaluating the `ELSE` clause when no rows remain:
+
+```rust
+if let Some(else_expr) = &self.else_expr {
+    remainder = or(&base_nulls, &remainder)?;
+    if remainder.true_count() > 0 {
+        // ... evaluate else ...
+    }
+}
+```
+
+**Impact**: For queries where early branches match all rows, this eliminates 
unnecessary branch evaluations and `ELSE` clause processing.
+
+### Optimization 2: Optimized Result Merging (commit e9431fc)
+
+The second optimization fundamentally restructured how partial results are 
merged.
+Instead of using `zip()` after each branch to merge results into an output 
array, we now:
+
+1. Maintain the subset of rows still needing evaluation across loop iterations
+2. Filter the batch progressively as rows are matched
+3. Build an index structure that tracks which branch produced each row's result
+4. Perform a single merge operation at the end
+
+The key insight is that we can defer all merging until the end of the 
evaluation loop by tracking result provenance.
+When a branch matches a number of rows, instead of immediately merging with 
`zip()`, we:
+1. Store the partial result array
+2. Mark the cells corresponding to each row in an indices array as needing to 
take one value from the partial result array
+
+In the example below, three `WHEN/THEN` branches produced results.
+The first branch produced the result `A` for 2, the second produced `B` for 
row 1, and the third produced `C` and `D` for rows 4 and 5.
+The final result array is obtained by running the arrays through the merge 
operation.
+
+```aiignore
+┌───────────┐  ┌─────────┐                             ┌─────────┐
+│┌─────────┐│  │   None  │                             │   NULL  │
+││    A    ││  ├─────────┤                             ├─────────┤
+│└─────────┘│  │    1    │                             │    B    │
+│┌─────────┐│  ├─────────┤                             ├─────────┤
+││    B    ││  │    0    │   merge_n(values, indices)  │    A    │
+│└─────────┘│  ├─────────┤  ─────────────────────────▶ ├─────────┤
+│┌─────────┐│  │   None  │                             │   NULL  │
+││    C    ││  ├─────────┤                             ├─────────┤
+│├─────────┤│  │    2    │                             │    C    │
+││    D    ││  ├─────────┤                             ├─────────┤
+│└─────────┘│  │    2    │                             │    D    │
+└───────────┘  └─────────┘                             └─────────┘
+   arrays        indices                                  result
+```
+
+The main benefits of this merge operation are that the `scatter` step is 
eleminated enitrely, and instead of requiring
+a `zip` per branch only a single `merge_n` is done.
+
+Besides more efficient merging, we also maintain a progressively filtered 
`remainder_batch`.
+
+```rust
+let mut remainder_batch = Cow::Borrowed(batch);
+let mut remainder_rows = UInt32Array::from_iter_values(0..batch.num_rows());
+
+for branch in branches {
+    // Evaluate on progressively smaller batch
+    let when_value = evaluate_on(&remainder_batch)?;
+
+    // Filter remainder_batch for next iteration
+    let next_filter = create_filter(&not(&when_value)?);
+    remainder_batch = Cow::Owned(filter_record_batch(&remainder_batch, 
&next_filter)?);
+    remainder_rows = filter_array(&remainder_rows, &next_filter)?;
+}
+```
+
+As rows are matched, the `remainder_batch` shrinks, making subsequent filter 
operations a bit faster.
+
+**Impact**: This eliminates N-1 merge operations and makes filtering 
progressively cheaper as branches match rows.
+
+This new operation was initially developed specifically for DataFusion's 
`CASE` evaluation, but in the meantime has been generalized and moved into the 
`arrow-rs` crate as 
[`arrow_select::merge::merge_n`](https://docs.rs/arrow-select/57.1.0/arrow_select/merge/fn.merge_n.html).
+
+### Optimization 3: Column Projection (commit c3e49fb)
+
+The third optimization addresses the "filtering unused columns" problem 
through projection.
+Before evaluating a CASE expression, we:
+
+1. Analyze all WHEN/THEN/ELSE expressions to find referenced columns
+2. Build a projection vector containing only those column indices
+3. Derive new versions of the expressions with updated column references
+
+For example, if the original CASE references columns at indices `[1, 5, 8]` in 
a 20-column batch:
+- Project the batch to a 3-column batch with those columns
+- Rewrite expressions from `col@1, col@5, col@8` to `col@0, col@1, col@2`
+- Evaluate using the projected batch
+
+This is encapsulated in a `ProjectedCaseBody` structure:
+
+```rust
+struct ProjectedCaseBody {
+    projection: Vec<usize>,     // [1, 5, 8]
+    body: CaseBody,             // Expressions with rewritten column indices
+}
+```
+
+The projection logic is only applied when it would be beneficial (i.e., when 
the number of used columns is lower than the total number of columns in the 
batch).
+
+**Impact**: For wide tables with narrow CASE expressions, this dramatically 
reduces filtering overhead by removing copying of unused columns.
+
+### Optimization 4: Eliminating Scatter in Two-Branch Case (commit 32d2618)
+
+The final optimization targets a common pattern: `CASE WHEN condition THEN 
expr1 ELSE expr2 END`.
+
+Previously, this used the specialized `ExpressionOrExpression` fast path, but 
still used `evaluate_selection()` which produces scattered results:
+
+```rust
+// Old approach: produces scattered array with batch.num_rows() elements
+let then_value = then_expr.evaluate_selection(batch, &when_value)?
+    .into_array(batch.num_rows())?;
+```
+
+The new approach filters the batch first, then evaluates:
+
+```rust
+// New approach: filter to only matching rows
+let when_filter = create_filter(&when_value);
+let then_batch = filter_record_batch(batch, &when_filter)?;
+let then_value = then_expr.evaluate(&then_batch)?;  // Produces compact array
+
+let else_filter = create_filter(&not(&when_value)?);
+let else_batch = filter_record_batch(batch, &else_filter)?;
+let else_value = else_expr.evaluate(&else_batch)?;
+```
+
+This produces two compact arrays (one for THEN values, one for ELSE values) 
which are then merged with a custom merge function that doesn't require 
pre-scattered inputs:
+
+```rust
+fn merge(mask: &BooleanArray, truthy: ColumnarValue, falsy: ColumnarValue) -> 
ArrayRef {
+    // Interleave truthy and falsy values directly using mask
+    // without requiring pre-alignment
+}
+```
+
+**Impact**: This eliminates unnecessary scatter operations and memory 
allocations for one of the most common CASE expression patterns.
+
+Just like `merge_n` this operation has been moved into `arrow-rs` as 
[`arrow_select::merge::merge`](https://docs.rs/arrow-select/57.1.0/arrow_select/merge/fn.merge.html).
+
+## Summary
+
+Through four targeted optimizations, we've transformed CASE expression 
evaluation from a simple but inefficient implementation to a highly optimized 
one that:
+
+1. **Exits early** when all rows are matched
+2. **Defers merging** until the end with a single interleave operation
+3. **Projects columns** to avoid filtering unused data
+4. **Eliminates scatter** operations in common patterns
+
+These improvements compound: a CASE expression on a wide table with multiple 
branches and early matches benefits from all four optimizations simultaneously.

Review Comment:
   I think it would be stronger here to refer to the actual performance numbers 
(see suggestion on introduction) rather than generalizations. 



##########
content/blog/2025-11-11-datafusion_case.md:
##########
@@ -0,0 +1,377 @@
+---
+layout: post
+title: Optimizing CASE Expression Evaluation
+date: 2025-11-11
+author: Pepijn Van Eeckhoudt
+categories: [features]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+[TOC]
+
+<style>
+figure {
+  margin: 20px 0;
+}
+
+figure img {
+  display: block;
+  max-width: 80%;
+  margin: auto;
+}
+
+figcaption {
+  font-style: italic;
+  color: #555;
+  font-size: 0.9em;
+  max-width: 80%;
+  margin: auto;
+  text-align: center;
+}
+</style>
+
+# Optimizing CASE Expression Evaluation in DataFusion
+
+SQL's `CASE` expression is one of the few constructs the language provides to 
perform conditional logic.
+Its deceptively simple syntax hides significant implementation complexity.
+Over the past few weeks, we've landed a series of improvements to DataFusion's 
`CASE` expression evaluator that reduce both CPU time and memory allocations.
+This post walks through the original implementation, its performance 
bottlenecks, and how we addressed them step by step.
+Finally we'll also take a look at some future improvements to `CASE` that are 
in the works.
+
+## Background: CASE Expression Evaluation
+
+SQL supports two forms of CASE expressions:
+
+1. **Simple**: `CASE expr WHEN value1 THEN result1 WHEN value2 THEN result2 
... END`
+2. **Searched**: `CASE WHEN condition1 THEN result1 WHEN condition2 THEN 
result2 ... END`
+
+The simple form evaluates an expression once for each input row and then tests 
that value against the constants (or expressions) in each `WHEN` clause using 
equality comparisons.
+Think of it as a limited Rust `match` expression.
+
+Here's a simple example:
+
+```sql
+CASE status
+    WHEN 'pending' THEN 1
+    WHEN 'active' THEN 2
+    WHEN 'complete' THEN 3
+    ELSE 0
+END
+```
+
+In this `CASE` expression, `status` is evaluated once per row, and then its 
value is tested for equality with the values `'pending'`, `'active'`, and 
`'complete'` in that order.
+The `THEN` expression value for the first matching `WHEN` expression is 
returned per row.
+
+The searched `CASE` form is a more flexible variant.
+It evaluates completely independent boolean expressions for each branch.
+
+This allows you to test different columns with different operators per branch 
as can be seen in the following example:
+
+```sql
+CASE
+    WHEN age > 65 THEN 'senior'
+    WHEN childCount != 0 THEN 'parent'
+    WHEN age < 21 THEN 'minor'
+    ELSE 'adult'
+END
+```
+
+In both forms, branches are evaluated sequentially with short-circuit 
semantics: for each row, once a `WHEN` condition matches, the corresponding 
`THEN` expression is evaluated. Any further branches are not evaluated for that 
row.
+This lazy evaluation model is critical for correctness.
+It let's you safely write `CASE` expressions like
+
+```sql
+CASE
+    WHEN denominator == 0 THEN NULL
+    ELSE nominator / denominator
+END
+```
+
+that are guaranteed to not trigger divide-by-zero errors.
+
+## `CASE` Evaluation in DataFusion 50.0.0
+
+For the rest of this post we'll be looking at 'searched case' evaluation.
+'Simple case' uses a distinct, but very similar implementation.
+The same set of improvements has been applied to both.
+
+The baseline implementation in DataFusion 50.0.0 evaluated `CASE` using a 
straightforward approach:
+
+1. Start with an output array `out` with the same length as the input batch, 
filled with nulls. Additionally, create a bit vector `remainder` with the same 
length and each value set to `true`.
+2. For each `WHEN`/`THEN` branch:
+  - Evaluate the `WHEN` condition for remaining unmatched rows using 
`PhysicalExpr::evaluate_selection`, passing in the input batch and the 
`remainder` mask
+  - If any rows matched, evaluate the `THEN` expression for those rows using 
`PhysicalExpr::evaluate_selection`
+  - Merge the results into the `out` using the `zip` kernel
+  - Update the `remainder` mask to exclude matched rows
+3. If there's an `ELSE` clause, evaluate it for any remaining unmatched rows 
and merge using `zip`
+
+Here's a simplified version of the original loop:
+
+```rust
+let mut current_value = new_null_array(&return_type, batch.num_rows());
+let mut remainder = BooleanArray::from(vec![true; batch.num_rows()]);
+
+for (when_expr, then_expr) in &self.when_then_expr {
+    let when_value = when_expr.evaluate_selection(batch, &remainder)?
+        .into_array(batch.num_rows())?;
+    let when_value = and(&when_value, &remainder)?;
+
+    if when_value.true_count() == 0 {
+        continue;
+    }
+
+    let then_value = then_expr.evaluate_selection(batch, &when_value)?;
+    current_value = zip(&when_value, &then_value, &current_value)?;
+    remainder = and_not(&remainder, &when_value)?;
+}
+```
+
+While correct, this implementation had several performance issues mostly 
related to the usage of `evaluate_selection`.
+To understand why, we need to dig a little deeper into the implementation of 
that function.
+Here's a simplified version of it that captures the relevant parts:
+
+```rust
+pub trait PhysicalExpr {
+    fn evaluate_selection(
+        &self,
+        batch: &RecordBatch,
+        selection: &BooleanArray,
+    ) -> Result<ColumnarValue> {
+        let filtered_batch = filter_record_batch(batch, selection)?;
+        let filtered_result = self.evaluate(&filtered_batch)?;
+        scatter(selection, filtered_result)
+    }
+}
+```
+
+The `evaluate_selection` method first filters the input batch to only include 
rows that match the `selection` mask.
+It then calls the regular `evaluate` method using the filtered batch as input.
+Finally, to return a result array with the same number of rows as `batch`, the 
`scatter` function is called.
+This function produces a new array padded with `null` values for any rows that 
didn't match the `selection` mask.
+
+So how does the simple evaluation strategy and use of `evaluate_selection` 
cause performance overhead?
+
+### Problem 1: No Early Exit
+
+The case evaluation loop always iterated through all branches, even when every 
row had already been matched.
+In queries where early branches match many rows, this meant unnecessary work 
was done for remaining rows.
+
+### Problem 2: Repeated Filtering, Scattering, and Merging
+
+Each iteration performed operations that are very well-optimized but still not 
cost free to execute:
+- **Filtering**: `PhysicalExpr::evaluate_selection` filters the entire 
`RecordBatch` (all columns) for each branch. For the `WHEN` expression, this 
was done even if the selection mask was entirely empty.
+- **Scattering**: `PhysicalExpr::evaluate_selection` scatters the filtered 
result back to the original `RecordBatch` length.
+- **Merging**: The `zip` kernel is called once per branch to merge partial 
results into the output array
+
+Each of these steps allocates new arrays and shuffles a lot of data around. 
+
+### Problem 3: Filtering Unused Columns
+
+The `PhysicalExpr::evaluate_selection` method filters the entire record batch, 
including columns that the current branch's `WHEN` and `THEN` expressions don't 
reference.
+For wide tables (many columns) with narrow expressions (few column 
references), this is wasteful.
+
+Suppose we have a table with 26 columns named `a` through `z`.
+For a simple CASE expression like:
+
+```sql
+CASE
+  WHEN a > 1000 THEN 'large'
+  WHEN a >= 0 THEN 'positive'
+  ELSE 'negative'
+END
+```
+
+the implementation would filter all columns 26 columns even though only a 
single column is needed for the entire `CASE` expression evaluation.
+Again this involves a non-negligible amount of allocation and data copying.
+
+## Performance Improvements
+
+### Optimization 1: Short-Circuit Early Exit (commit 7c215ed)

Review Comment:
   The same suggestion applies to the next few sections too



##########
content/blog/2025-11-11-datafusion_case.md:
##########
@@ -0,0 +1,377 @@
+---
+layout: post
+title: Optimizing CASE Expression Evaluation
+date: 2025-11-11
+author: Pepijn Van Eeckhoudt
+categories: [features]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+[TOC]
+
+<style>
+figure {
+  margin: 20px 0;
+}
+
+figure img {
+  display: block;
+  max-width: 80%;
+  margin: auto;
+}
+
+figcaption {
+  font-style: italic;
+  color: #555;
+  font-size: 0.9em;
+  max-width: 80%;
+  margin: auto;
+  text-align: center;
+}
+</style>
+
+# Optimizing CASE Expression Evaluation in DataFusion
+
+SQL's `CASE` expression is one of the few constructs the language provides to 
perform conditional logic.
+Its deceptively simple syntax hides significant implementation complexity.
+Over the past few weeks, we've landed a series of improvements to DataFusion's 
`CASE` expression evaluator that reduce both CPU time and memory allocations.
+This post walks through the original implementation, its performance 
bottlenecks, and how we addressed them step by step.
+Finally we'll also take a look at some future improvements to `CASE` that are 
in the works.
+
+## Background: CASE Expression Evaluation
+
+SQL supports two forms of CASE expressions:
+
+1. **Simple**: `CASE expr WHEN value1 THEN result1 WHEN value2 THEN result2 
... END`
+2. **Searched**: `CASE WHEN condition1 THEN result1 WHEN condition2 THEN 
result2 ... END`
+
+The simple form evaluates an expression once for each input row and then tests 
that value against the constants (or expressions) in each `WHEN` clause using 
equality comparisons.
+Think of it as a limited Rust `match` expression.
+
+Here's a simple example:
+
+```sql
+CASE status
+    WHEN 'pending' THEN 1
+    WHEN 'active' THEN 2
+    WHEN 'complete' THEN 3
+    ELSE 0
+END
+```
+
+In this `CASE` expression, `status` is evaluated once per row, and then its 
value is tested for equality with the values `'pending'`, `'active'`, and 
`'complete'` in that order.
+The `THEN` expression value for the first matching `WHEN` expression is 
returned per row.
+
+The searched `CASE` form is a more flexible variant.
+It evaluates completely independent boolean expressions for each branch.
+
+This allows you to test different columns with different operators per branch 
as can be seen in the following example:
+
+```sql
+CASE
+    WHEN age > 65 THEN 'senior'
+    WHEN childCount != 0 THEN 'parent'
+    WHEN age < 21 THEN 'minor'
+    ELSE 'adult'
+END
+```
+
+In both forms, branches are evaluated sequentially with short-circuit 
semantics: for each row, once a `WHEN` condition matches, the corresponding 
`THEN` expression is evaluated. Any further branches are not evaluated for that 
row.
+This lazy evaluation model is critical for correctness.
+It let's you safely write `CASE` expressions like
+
+```sql
+CASE
+    WHEN denominator == 0 THEN NULL
+    ELSE nominator / denominator
+END
+```
+
+that are guaranteed to not trigger divide-by-zero errors.
+
+## `CASE` Evaluation in DataFusion 50.0.0
+
+For the rest of this post we'll be looking at 'searched case' evaluation.
+'Simple case' uses a distinct, but very similar implementation.
+The same set of improvements has been applied to both.
+
+The baseline implementation in DataFusion 50.0.0 evaluated `CASE` using a 
straightforward approach:
+
+1. Start with an output array `out` with the same length as the input batch, 
filled with nulls. Additionally, create a bit vector `remainder` with the same 
length and each value set to `true`.
+2. For each `WHEN`/`THEN` branch:
+  - Evaluate the `WHEN` condition for remaining unmatched rows using 
`PhysicalExpr::evaluate_selection`, passing in the input batch and the 
`remainder` mask
+  - If any rows matched, evaluate the `THEN` expression for those rows using 
`PhysicalExpr::evaluate_selection`
+  - Merge the results into the `out` using the `zip` kernel
+  - Update the `remainder` mask to exclude matched rows
+3. If there's an `ELSE` clause, evaluate it for any remaining unmatched rows 
and merge using `zip`
+
+Here's a simplified version of the original loop:
+
+```rust
+let mut current_value = new_null_array(&return_type, batch.num_rows());
+let mut remainder = BooleanArray::from(vec![true; batch.num_rows()]);
+
+for (when_expr, then_expr) in &self.when_then_expr {
+    let when_value = when_expr.evaluate_selection(batch, &remainder)?
+        .into_array(batch.num_rows())?;
+    let when_value = and(&when_value, &remainder)?;
+
+    if when_value.true_count() == 0 {
+        continue;
+    }
+
+    let then_value = then_expr.evaluate_selection(batch, &when_value)?;
+    current_value = zip(&when_value, &then_value, &current_value)?;
+    remainder = and_not(&remainder, &when_value)?;
+}
+```
+
+While correct, this implementation had several performance issues mostly 
related to the usage of `evaluate_selection`.
+To understand why, we need to dig a little deeper into the implementation of 
that function.
+Here's a simplified version of it that captures the relevant parts:
+
+```rust
+pub trait PhysicalExpr {
+    fn evaluate_selection(
+        &self,
+        batch: &RecordBatch,
+        selection: &BooleanArray,
+    ) -> Result<ColumnarValue> {
+        let filtered_batch = filter_record_batch(batch, selection)?;
+        let filtered_result = self.evaluate(&filtered_batch)?;
+        scatter(selection, filtered_result)
+    }
+}
+```
+
+The `evaluate_selection` method first filters the input batch to only include 
rows that match the `selection` mask.
+It then calls the regular `evaluate` method using the filtered batch as input.
+Finally, to return a result array with the same number of rows as `batch`, the 
`scatter` function is called.
+This function produces a new array padded with `null` values for any rows that 
didn't match the `selection` mask.
+
+So how does the simple evaluation strategy and use of `evaluate_selection` 
cause performance overhead?
+
+### Problem 1: No Early Exit
+
+The case evaluation loop always iterated through all branches, even when every 
row had already been matched.
+In queries where early branches match many rows, this meant unnecessary work 
was done for remaining rows.
+
+### Problem 2: Repeated Filtering, Scattering, and Merging
+
+Each iteration performed operations that are very well-optimized but still not 
cost free to execute:
+- **Filtering**: `PhysicalExpr::evaluate_selection` filters the entire 
`RecordBatch` (all columns) for each branch. For the `WHEN` expression, this 
was done even if the selection mask was entirely empty.
+- **Scattering**: `PhysicalExpr::evaluate_selection` scatters the filtered 
result back to the original `RecordBatch` length.
+- **Merging**: The `zip` kernel is called once per branch to merge partial 
results into the output array
+
+Each of these steps allocates new arrays and shuffles a lot of data around. 
+
+### Problem 3: Filtering Unused Columns
+
+The `PhysicalExpr::evaluate_selection` method filters the entire record batch, 
including columns that the current branch's `WHEN` and `THEN` expressions don't 
reference.
+For wide tables (many columns) with narrow expressions (few column 
references), this is wasteful.
+
+Suppose we have a table with 26 columns named `a` through `z`.
+For a simple CASE expression like:
+
+```sql
+CASE
+  WHEN a > 1000 THEN 'large'
+  WHEN a >= 0 THEN 'positive'
+  ELSE 'negative'
+END
+```
+
+the implementation would filter all columns 26 columns even though only a 
single column is needed for the entire `CASE` expression evaluation.
+Again this involves a non-negligible amount of allocation and data copying.
+
+## Performance Improvements
+
+### Optimization 1: Short-Circuit Early Exit (commit 7c215ed)

Review Comment:
   Having the commit in the title was a little strange (as it wasn't a link)
   
   If you want to leave the commit, I recommend removing it from the title and 
adding a sentence w/ a link 
   
   So instead of
   
   > The first optimization added early exit logic to the evaluation loop:
   
   Maybe something like
   
   > The first optimization (commit [7c215ed]) added early exit logic to the 
evaluation loop:
   
   
   [7c215ed]: 
https://github.com/apache/datafusion/commit/7c215ed8dc850bcc2632fa5183ccdc58dd6491e7



##########
content/blog/2025-11-11-datafusion_case.md:
##########
@@ -0,0 +1,377 @@
+---
+layout: post
+title: Optimizing CASE Expression Evaluation

Review Comment:
   I recommend also adding the term SQL to make the title more specific
   
   ```suggestion
   title: Optimizing SQL `CASE` Expression Evaluation
   ```
   
   



##########
content/blog/2025-11-11-datafusion_case.md:
##########
@@ -0,0 +1,377 @@
+---
+layout: post
+title: Optimizing CASE Expression Evaluation
+date: 2025-11-11
+author: Pepijn Van Eeckhoudt
+categories: [features]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+[TOC]
+
+<style>
+figure {
+  margin: 20px 0;
+}
+
+figure img {
+  display: block;
+  max-width: 80%;
+  margin: auto;
+}
+
+figcaption {
+  font-style: italic;
+  color: #555;
+  font-size: 0.9em;
+  max-width: 80%;
+  margin: auto;
+  text-align: center;
+}
+</style>
+
+# Optimizing CASE Expression Evaluation in DataFusion
+
+SQL's `CASE` expression is one of the few constructs the language provides to 
perform conditional logic.
+Its deceptively simple syntax hides significant implementation complexity.
+Over the past few weeks, we've landed a series of improvements to DataFusion's 
`CASE` expression evaluator that reduce both CPU time and memory allocations.
+This post walks through the original implementation, its performance 
bottlenecks, and how we addressed them step by step.
+Finally we'll also take a look at some future improvements to `CASE` that are 
in the works.

Review Comment:
   I didn't see any section about future improvements 🤔 



##########
content/blog/2025-11-11-datafusion_case.md:
##########
@@ -0,0 +1,377 @@
+---
+layout: post
+title: Optimizing CASE Expression Evaluation
+date: 2025-11-11
+author: Pepijn Van Eeckhoudt
+categories: [features]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+[TOC]
+
+<style>
+figure {
+  margin: 20px 0;
+}
+
+figure img {
+  display: block;
+  max-width: 80%;
+  margin: auto;
+}
+
+figcaption {
+  font-style: italic;
+  color: #555;
+  font-size: 0.9em;
+  max-width: 80%;
+  margin: auto;
+  text-align: center;
+}
+</style>
+
+# Optimizing CASE Expression Evaluation in DataFusion
+
+SQL's `CASE` expression is one of the few constructs the language provides to 
perform conditional logic.
+Its deceptively simple syntax hides significant implementation complexity.
+Over the past few weeks, we've landed a series of improvements to DataFusion's 
`CASE` expression evaluator that reduce both CPU time and memory allocations.

Review Comment:
   I suggest referring to DataFusion with its full name on first reference (and 
a link so people can go back there when needed):
   
   ```suggestion
   Over the past few releases, we've landed a series of improvements to [Apache 
DataFusion]'s `CASE` expression evaluator that reduce both CPU time and memory 
allocations.
   ```



##########
content/blog/2025-11-11-datafusion_case.md:
##########
@@ -0,0 +1,377 @@
+---
+layout: post
+title: Optimizing CASE Expression Evaluation
+date: 2025-11-11
+author: Pepijn Van Eeckhoudt
+categories: [features]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+[TOC]
+
+<style>
+figure {
+  margin: 20px 0;
+}
+
+figure img {
+  display: block;
+  max-width: 80%;
+  margin: auto;
+}
+
+figcaption {
+  font-style: italic;
+  color: #555;
+  font-size: 0.9em;
+  max-width: 80%;
+  margin: auto;
+  text-align: center;
+}
+</style>
+
+# Optimizing CASE Expression Evaluation in DataFusion
+
+SQL's `CASE` expression is one of the few constructs the language provides to 
perform conditional logic.
+Its deceptively simple syntax hides significant implementation complexity.
+Over the past few weeks, we've landed a series of improvements to DataFusion's 
`CASE` expression evaluator that reduce both CPU time and memory allocations.
+This post walks through the original implementation, its performance 
bottlenecks, and how we addressed them step by step.
+Finally we'll also take a look at some future improvements to `CASE` that are 
in the works.
+
+## Background: CASE Expression Evaluation
+
+SQL supports two forms of CASE expressions:
+
+1. **Simple**: `CASE expr WHEN value1 THEN result1 WHEN value2 THEN result2 
... END`
+2. **Searched**: `CASE WHEN condition1 THEN result1 WHEN condition2 THEN 
result2 ... END`
+
+The simple form evaluates an expression once for each input row and then tests 
that value against the constants (or expressions) in each `WHEN` clause using 
equality comparisons.
+Think of it as a limited Rust `match` expression.
+
+Here's a simple example:
+
+```sql
+CASE status
+    WHEN 'pending' THEN 1
+    WHEN 'active' THEN 2
+    WHEN 'complete' THEN 3
+    ELSE 0
+END
+```
+
+In this `CASE` expression, `status` is evaluated once per row, and then its 
value is tested for equality with the values `'pending'`, `'active'`, and 
`'complete'` in that order.
+The `THEN` expression value for the first matching `WHEN` expression is 
returned per row.
+
+The searched `CASE` form is a more flexible variant.
+It evaluates completely independent boolean expressions for each branch.
+
+This allows you to test different columns with different operators per branch 
as can be seen in the following example:
+
+```sql
+CASE
+    WHEN age > 65 THEN 'senior'
+    WHEN childCount != 0 THEN 'parent'
+    WHEN age < 21 THEN 'minor'
+    ELSE 'adult'
+END
+```
+
+In both forms, branches are evaluated sequentially with short-circuit 
semantics: for each row, once a `WHEN` condition matches, the corresponding 
`THEN` expression is evaluated. Any further branches are not evaluated for that 
row.
+This lazy evaluation model is critical for correctness.
+It let's you safely write `CASE` expressions like
+
+```sql
+CASE
+    WHEN denominator == 0 THEN NULL
+    ELSE nominator / denominator
+END
+```
+
+that are guaranteed to not trigger divide-by-zero errors.
+
+## `CASE` Evaluation in DataFusion 50.0.0
+
+For the rest of this post we'll be looking at 'searched case' evaluation.
+'Simple case' uses a distinct, but very similar implementation.
+The same set of improvements has been applied to both.
+
+The baseline implementation in DataFusion 50.0.0 evaluated `CASE` using a 
straightforward approach:
+
+1. Start with an output array `out` with the same length as the input batch, 
filled with nulls. Additionally, create a bit vector `remainder` with the same 
length and each value set to `true`.
+2. For each `WHEN`/`THEN` branch:
+  - Evaluate the `WHEN` condition for remaining unmatched rows using 
`PhysicalExpr::evaluate_selection`, passing in the input batch and the 
`remainder` mask
+  - If any rows matched, evaluate the `THEN` expression for those rows using 
`PhysicalExpr::evaluate_selection`
+  - Merge the results into the `out` using the `zip` kernel
+  - Update the `remainder` mask to exclude matched rows
+3. If there's an `ELSE` clause, evaluate it for any remaining unmatched rows 
and merge using `zip`
+
+Here's a simplified version of the original loop:
+
+```rust
+let mut current_value = new_null_array(&return_type, batch.num_rows());
+let mut remainder = BooleanArray::from(vec![true; batch.num_rows()]);
+
+for (when_expr, then_expr) in &self.when_then_expr {
+    let when_value = when_expr.evaluate_selection(batch, &remainder)?
+        .into_array(batch.num_rows())?;
+    let when_value = and(&when_value, &remainder)?;
+
+    if when_value.true_count() == 0 {
+        continue;
+    }
+
+    let then_value = then_expr.evaluate_selection(batch, &when_value)?;
+    current_value = zip(&when_value, &then_value, &current_value)?;
+    remainder = and_not(&remainder, &when_value)?;
+}
+```
+
+While correct, this implementation had several performance issues mostly 
related to the usage of `evaluate_selection`.
+To understand why, we need to dig a little deeper into the implementation of 
that function.
+Here's a simplified version of it that captures the relevant parts:
+
+```rust
+pub trait PhysicalExpr {
+    fn evaluate_selection(
+        &self,
+        batch: &RecordBatch,
+        selection: &BooleanArray,
+    ) -> Result<ColumnarValue> {
+        let filtered_batch = filter_record_batch(batch, selection)?;
+        let filtered_result = self.evaluate(&filtered_batch)?;
+        scatter(selection, filtered_result)
+    }
+}
+```
+
+The `evaluate_selection` method first filters the input batch to only include 
rows that match the `selection` mask.
+It then calls the regular `evaluate` method using the filtered batch as input.
+Finally, to return a result array with the same number of rows as `batch`, the 
`scatter` function is called.
+This function produces a new array padded with `null` values for any rows that 
didn't match the `selection` mask.
+
+So how does the simple evaluation strategy and use of `evaluate_selection` 
cause performance overhead?
+
+### Problem 1: No Early Exit

Review Comment:
   putting on my "emphasize the positive" hats, I suggest using the term 
"Opportunity" (so "Opportunity 1: No Early Exit" rather than problem :)
   
   Likewise for the steps below



##########
content/blog/2025-11-11-datafusion_case.md:
##########
@@ -0,0 +1,377 @@
+---
+layout: post
+title: Optimizing CASE Expression Evaluation
+date: 2025-11-11
+author: Pepijn Van Eeckhoudt
+categories: [features]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+[TOC]
+
+<style>
+figure {
+  margin: 20px 0;
+}
+
+figure img {
+  display: block;
+  max-width: 80%;
+  margin: auto;
+}
+
+figcaption {
+  font-style: italic;
+  color: #555;
+  font-size: 0.9em;
+  max-width: 80%;
+  margin: auto;
+  text-align: center;
+}
+</style>
+
+# Optimizing CASE Expression Evaluation in DataFusion
+
+SQL's `CASE` expression is one of the few constructs the language provides to 
perform conditional logic.
+Its deceptively simple syntax hides significant implementation complexity.
+Over the past few weeks, we've landed a series of improvements to DataFusion's 
`CASE` expression evaluator that reduce both CPU time and memory allocations.
+This post walks through the original implementation, its performance 
bottlenecks, and how we addressed them step by step.
+Finally we'll also take a look at some future improvements to `CASE` that are 
in the works.
+
+## Background: CASE Expression Evaluation
+
+SQL supports two forms of CASE expressions:
+
+1. **Simple**: `CASE expr WHEN value1 THEN result1 WHEN value2 THEN result2 
... END`
+2. **Searched**: `CASE WHEN condition1 THEN result1 WHEN condition2 THEN 
result2 ... END`
+
+The simple form evaluates an expression once for each input row and then tests 
that value against the constants (or expressions) in each `WHEN` clause using 
equality comparisons.
+Think of it as a limited Rust `match` expression.
+
+Here's a simple example:
+
+```sql
+CASE status
+    WHEN 'pending' THEN 1
+    WHEN 'active' THEN 2
+    WHEN 'complete' THEN 3
+    ELSE 0
+END
+```
+
+In this `CASE` expression, `status` is evaluated once per row, and then its 
value is tested for equality with the values `'pending'`, `'active'`, and 
`'complete'` in that order.
+The `THEN` expression value for the first matching `WHEN` expression is 
returned per row.
+
+The searched `CASE` form is a more flexible variant.
+It evaluates completely independent boolean expressions for each branch.
+
+This allows you to test different columns with different operators per branch 
as can be seen in the following example:
+
+```sql
+CASE
+    WHEN age > 65 THEN 'senior'
+    WHEN childCount != 0 THEN 'parent'
+    WHEN age < 21 THEN 'minor'
+    ELSE 'adult'
+END
+```
+
+In both forms, branches are evaluated sequentially with short-circuit 
semantics: for each row, once a `WHEN` condition matches, the corresponding 
`THEN` expression is evaluated. Any further branches are not evaluated for that 
row.
+This lazy evaluation model is critical for correctness.
+It let's you safely write `CASE` expressions like
+
+```sql
+CASE
+    WHEN denominator == 0 THEN NULL
+    ELSE nominator / denominator
+END
+```
+
+that are guaranteed to not trigger divide-by-zero errors.
+
+## `CASE` Evaluation in DataFusion 50.0.0
+
+For the rest of this post we'll be looking at 'searched case' evaluation.
+'Simple case' uses a distinct, but very similar implementation.
+The same set of improvements has been applied to both.
+
+The baseline implementation in DataFusion 50.0.0 evaluated `CASE` using a 
straightforward approach:
+
+1. Start with an output array `out` with the same length as the input batch, 
filled with nulls. Additionally, create a bit vector `remainder` with the same 
length and each value set to `true`.
+2. For each `WHEN`/`THEN` branch:
+  - Evaluate the `WHEN` condition for remaining unmatched rows using 
`PhysicalExpr::evaluate_selection`, passing in the input batch and the 
`remainder` mask
+  - If any rows matched, evaluate the `THEN` expression for those rows using 
`PhysicalExpr::evaluate_selection`
+  - Merge the results into the `out` using the `zip` kernel
+  - Update the `remainder` mask to exclude matched rows
+3. If there's an `ELSE` clause, evaluate it for any remaining unmatched rows 
and merge using `zip`
+
+Here's a simplified version of the original loop:
+
+```rust
+let mut current_value = new_null_array(&return_type, batch.num_rows());
+let mut remainder = BooleanArray::from(vec![true; batch.num_rows()]);
+
+for (when_expr, then_expr) in &self.when_then_expr {
+    let when_value = when_expr.evaluate_selection(batch, &remainder)?
+        .into_array(batch.num_rows())?;
+    let when_value = and(&when_value, &remainder)?;
+
+    if when_value.true_count() == 0 {
+        continue;
+    }
+
+    let then_value = then_expr.evaluate_selection(batch, &when_value)?;
+    current_value = zip(&when_value, &then_value, &current_value)?;
+    remainder = and_not(&remainder, &when_value)?;
+}
+```
+
+While correct, this implementation had several performance issues mostly 
related to the usage of `evaluate_selection`.
+To understand why, we need to dig a little deeper into the implementation of 
that function.
+Here's a simplified version of it that captures the relevant parts:
+
+```rust
+pub trait PhysicalExpr {
+    fn evaluate_selection(
+        &self,
+        batch: &RecordBatch,
+        selection: &BooleanArray,
+    ) -> Result<ColumnarValue> {
+        let filtered_batch = filter_record_batch(batch, selection)?;
+        let filtered_result = self.evaluate(&filtered_batch)?;
+        scatter(selection, filtered_result)
+    }
+}
+```
+
+The `evaluate_selection` method first filters the input batch to only include 
rows that match the `selection` mask.
+It then calls the regular `evaluate` method using the filtered batch as input.
+Finally, to return a result array with the same number of rows as `batch`, the 
`scatter` function is called.
+This function produces a new array padded with `null` values for any rows that 
didn't match the `selection` mask.
+
+So how does the simple evaluation strategy and use of `evaluate_selection` 
cause performance overhead?
+
+### Problem 1: No Early Exit
+
+The case evaluation loop always iterated through all branches, even when every 
row had already been matched.
+In queries where early branches match many rows, this meant unnecessary work 
was done for remaining rows.
+
+### Problem 2: Repeated Filtering, Scattering, and Merging
+
+Each iteration performed operations that are very well-optimized but still not 
cost free to execute:
+- **Filtering**: `PhysicalExpr::evaluate_selection` filters the entire 
`RecordBatch` (all columns) for each branch. For the `WHEN` expression, this 
was done even if the selection mask was entirely empty.

Review Comment:
   I think we need whitespace between these items so that the pelican blog 
software renders them on different lines:
   
   <img width="1130" height="312" alt="Image" 
src="https://github.com/user-attachments/assets/758df255-c5d9-4fa6-aaac-b9009ca56357";
 />



##########
content/blog/2025-11-11-datafusion_case.md:
##########
@@ -0,0 +1,377 @@
+---
+layout: post
+title: Optimizing CASE Expression Evaluation
+date: 2025-11-11
+author: Pepijn Van Eeckhoudt
+categories: [features]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+[TOC]
+
+<style>
+figure {
+  margin: 20px 0;
+}
+
+figure img {
+  display: block;
+  max-width: 80%;
+  margin: auto;
+}
+
+figcaption {
+  font-style: italic;
+  color: #555;
+  font-size: 0.9em;
+  max-width: 80%;
+  margin: auto;
+  text-align: center;
+}
+</style>

Review Comment:
   The headings seemed strange to me -- I suggest demote them all one level: 
   
   <img width="435" height="914" alt="Image" 
src="https://github.com/user-attachments/assets/285d3e1d-995a-48ca-b098-376781b1b8a0";
 />



##########
content/blog/2025-11-11-datafusion_case.md:
##########
@@ -0,0 +1,377 @@
+---
+layout: post
+title: Optimizing CASE Expression Evaluation
+date: 2025-11-11
+author: Pepijn Van Eeckhoudt
+categories: [features]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+[TOC]
+
+<style>
+figure {
+  margin: 20px 0;
+}
+
+figure img {
+  display: block;
+  max-width: 80%;
+  margin: auto;
+}
+
+figcaption {
+  font-style: italic;
+  color: #555;
+  font-size: 0.9em;
+  max-width: 80%;
+  margin: auto;
+  text-align: center;
+}
+</style>
+
+# Optimizing CASE Expression Evaluation in DataFusion
+
+SQL's `CASE` expression is one of the few constructs the language provides to 
perform conditional logic.
+Its deceptively simple syntax hides significant implementation complexity.
+Over the past few weeks, we've landed a series of improvements to DataFusion's 
`CASE` expression evaluator that reduce both CPU time and memory allocations.
+This post walks through the original implementation, its performance 
bottlenecks, and how we addressed them step by step.
+Finally we'll also take a look at some future improvements to `CASE` that are 
in the works.
+
+## Background: CASE Expression Evaluation
+
+SQL supports two forms of CASE expressions:
+
+1. **Simple**: `CASE expr WHEN value1 THEN result1 WHEN value2 THEN result2 
... END`
+2. **Searched**: `CASE WHEN condition1 THEN result1 WHEN condition2 THEN 
result2 ... END`
+
+The simple form evaluates an expression once for each input row and then tests 
that value against the constants (or expressions) in each `WHEN` clause using 
equality comparisons.
+Think of it as a limited Rust `match` expression.
+
+Here's a simple example:
+
+```sql
+CASE status
+    WHEN 'pending' THEN 1
+    WHEN 'active' THEN 2
+    WHEN 'complete' THEN 3
+    ELSE 0
+END
+```
+
+In this `CASE` expression, `status` is evaluated once per row, and then its 
value is tested for equality with the values `'pending'`, `'active'`, and 
`'complete'` in that order.
+The `THEN` expression value for the first matching `WHEN` expression is 
returned per row.
+
+The searched `CASE` form is a more flexible variant.
+It evaluates completely independent boolean expressions for each branch.
+
+This allows you to test different columns with different operators per branch 
as can be seen in the following example:
+
+```sql
+CASE
+    WHEN age > 65 THEN 'senior'
+    WHEN childCount != 0 THEN 'parent'
+    WHEN age < 21 THEN 'minor'
+    ELSE 'adult'
+END
+```
+
+In both forms, branches are evaluated sequentially with short-circuit 
semantics: for each row, once a `WHEN` condition matches, the corresponding 
`THEN` expression is evaluated. Any further branches are not evaluated for that 
row.
+This lazy evaluation model is critical for correctness.
+It let's you safely write `CASE` expressions like
+
+```sql
+CASE
+    WHEN denominator == 0 THEN NULL
+    ELSE nominator / denominator
+END
+```
+
+that are guaranteed to not trigger divide-by-zero errors.
+
+## `CASE` Evaluation in DataFusion 50.0.0
+
+For the rest of this post we'll be looking at 'searched case' evaluation.
+'Simple case' uses a distinct, but very similar implementation.
+The same set of improvements has been applied to both.
+
+The baseline implementation in DataFusion 50.0.0 evaluated `CASE` using a 
straightforward approach:
+
+1. Start with an output array `out` with the same length as the input batch, 
filled with nulls. Additionally, create a bit vector `remainder` with the same 
length and each value set to `true`.
+2. For each `WHEN`/`THEN` branch:
+  - Evaluate the `WHEN` condition for remaining unmatched rows using 
`PhysicalExpr::evaluate_selection`, passing in the input batch and the 
`remainder` mask

Review Comment:
   I recommend adding links to these functions, 
   
   * PhysicalExpr::evaluate_selection: 
https://docs.rs/datafusion/latest/datafusion/physical_expr/trait.PhysicalExpr.html#method.evaluate_selection
   * `zip`: https://docs.rs/arrow/latest/arrow/compute/kernels/zip/fn.zip.html



##########
content/blog/2025-11-11-datafusion_case.md:
##########
@@ -0,0 +1,377 @@
+---
+layout: post
+title: Optimizing CASE Expression Evaluation
+date: 2025-11-11
+author: Pepijn Van Eeckhoudt
+categories: [features]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+[TOC]
+
+<style>
+figure {
+  margin: 20px 0;
+}
+
+figure img {
+  display: block;
+  max-width: 80%;
+  margin: auto;
+}
+
+figcaption {
+  font-style: italic;
+  color: #555;
+  font-size: 0.9em;
+  max-width: 80%;
+  margin: auto;
+  text-align: center;
+}
+</style>
+
+# Optimizing CASE Expression Evaluation in DataFusion
+
+SQL's `CASE` expression is one of the few constructs the language provides to 
perform conditional logic.
+Its deceptively simple syntax hides significant implementation complexity.
+Over the past few weeks, we've landed a series of improvements to DataFusion's 
`CASE` expression evaluator that reduce both CPU time and memory allocations.
+This post walks through the original implementation, its performance 
bottlenecks, and how we addressed them step by step.
+Finally we'll also take a look at some future improvements to `CASE` that are 
in the works.
+
+## Background: CASE Expression Evaluation
+
+SQL supports two forms of CASE expressions:
+
+1. **Simple**: `CASE expr WHEN value1 THEN result1 WHEN value2 THEN result2 
... END`
+2. **Searched**: `CASE WHEN condition1 THEN result1 WHEN condition2 THEN 
result2 ... END`
+
+The simple form evaluates an expression once for each input row and then tests 
that value against the constants (or expressions) in each `WHEN` clause using 
equality comparisons.
+Think of it as a limited Rust `match` expression.
+
+Here's a simple example:
+
+```sql
+CASE status
+    WHEN 'pending' THEN 1
+    WHEN 'active' THEN 2
+    WHEN 'complete' THEN 3
+    ELSE 0
+END
+```
+
+In this `CASE` expression, `status` is evaluated once per row, and then its 
value is tested for equality with the values `'pending'`, `'active'`, and 
`'complete'` in that order.
+The `THEN` expression value for the first matching `WHEN` expression is 
returned per row.
+
+The searched `CASE` form is a more flexible variant.
+It evaluates completely independent boolean expressions for each branch.
+
+This allows you to test different columns with different operators per branch 
as can be seen in the following example:
+
+```sql
+CASE
+    WHEN age > 65 THEN 'senior'
+    WHEN childCount != 0 THEN 'parent'
+    WHEN age < 21 THEN 'minor'
+    ELSE 'adult'
+END
+```
+
+In both forms, branches are evaluated sequentially with short-circuit 
semantics: for each row, once a `WHEN` condition matches, the corresponding 
`THEN` expression is evaluated. Any further branches are not evaluated for that 
row.
+This lazy evaluation model is critical for correctness.
+It let's you safely write `CASE` expressions like
+
+```sql
+CASE
+    WHEN denominator == 0 THEN NULL
+    ELSE nominator / denominator
+END
+```
+
+that are guaranteed to not trigger divide-by-zero errors.
+
+## `CASE` Evaluation in DataFusion 50.0.0
+
+For the rest of this post we'll be looking at 'searched case' evaluation.
+'Simple case' uses a distinct, but very similar implementation.
+The same set of improvements has been applied to both.

Review Comment:
   I think this section would be easier to follow if it had a diagram showing 
the steps -- I am not sure if you have a visual in your mind but we might be 
able to come up with something that visually shows the improvements the blogs 
describes
   
   
   
   



##########
content/blog/2025-11-11-datafusion_case.md:
##########
@@ -0,0 +1,377 @@
+---
+layout: post
+title: Optimizing CASE Expression Evaluation
+date: 2025-11-11
+author: Pepijn Van Eeckhoudt
+categories: [features]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+[TOC]
+
+<style>
+figure {
+  margin: 20px 0;
+}
+
+figure img {
+  display: block;
+  max-width: 80%;
+  margin: auto;
+}
+
+figcaption {
+  font-style: italic;
+  color: #555;
+  font-size: 0.9em;
+  max-width: 80%;
+  margin: auto;
+  text-align: center;
+}
+</style>
+
+# Optimizing CASE Expression Evaluation in DataFusion
+
+SQL's `CASE` expression is one of the few constructs the language provides to 
perform conditional logic.
+Its deceptively simple syntax hides significant implementation complexity.
+Over the past few weeks, we've landed a series of improvements to DataFusion's 
`CASE` expression evaluator that reduce both CPU time and memory allocations.
+This post walks through the original implementation, its performance 
bottlenecks, and how we addressed them step by step.
+Finally we'll also take a look at some future improvements to `CASE` that are 
in the works.
+
+## Background: CASE Expression Evaluation
+
+SQL supports two forms of CASE expressions:
+
+1. **Simple**: `CASE expr WHEN value1 THEN result1 WHEN value2 THEN result2 
... END`
+2. **Searched**: `CASE WHEN condition1 THEN result1 WHEN condition2 THEN 
result2 ... END`
+
+The simple form evaluates an expression once for each input row and then tests 
that value against the constants (or expressions) in each `WHEN` clause using 
equality comparisons.
+Think of it as a limited Rust `match` expression.
+
+Here's a simple example:
+
+```sql
+CASE status
+    WHEN 'pending' THEN 1
+    WHEN 'active' THEN 2
+    WHEN 'complete' THEN 3
+    ELSE 0
+END
+```
+
+In this `CASE` expression, `status` is evaluated once per row, and then its 
value is tested for equality with the values `'pending'`, `'active'`, and 
`'complete'` in that order.
+The `THEN` expression value for the first matching `WHEN` expression is 
returned per row.
+
+The searched `CASE` form is a more flexible variant.
+It evaluates completely independent boolean expressions for each branch.
+
+This allows you to test different columns with different operators per branch 
as can be seen in the following example:
+
+```sql
+CASE
+    WHEN age > 65 THEN 'senior'
+    WHEN childCount != 0 THEN 'parent'
+    WHEN age < 21 THEN 'minor'
+    ELSE 'adult'
+END
+```
+
+In both forms, branches are evaluated sequentially with short-circuit 
semantics: for each row, once a `WHEN` condition matches, the corresponding 
`THEN` expression is evaluated. Any further branches are not evaluated for that 
row.
+This lazy evaluation model is critical for correctness.
+It let's you safely write `CASE` expressions like
+
+```sql
+CASE
+    WHEN denominator == 0 THEN NULL
+    ELSE nominator / denominator
+END
+```
+
+that are guaranteed to not trigger divide-by-zero errors.
+
+## `CASE` Evaluation in DataFusion 50.0.0
+
+For the rest of this post we'll be looking at 'searched case' evaluation.
+'Simple case' uses a distinct, but very similar implementation.
+The same set of improvements has been applied to both.
+
+The baseline implementation in DataFusion 50.0.0 evaluated `CASE` using a 
straightforward approach:
+
+1. Start with an output array `out` with the same length as the input batch, 
filled with nulls. Additionally, create a bit vector `remainder` with the same 
length and each value set to `true`.
+2. For each `WHEN`/`THEN` branch:
+  - Evaluate the `WHEN` condition for remaining unmatched rows using 
`PhysicalExpr::evaluate_selection`, passing in the input batch and the 
`remainder` mask
+  - If any rows matched, evaluate the `THEN` expression for those rows using 
`PhysicalExpr::evaluate_selection`
+  - Merge the results into the `out` using the `zip` kernel
+  - Update the `remainder` mask to exclude matched rows
+3. If there's an `ELSE` clause, evaluate it for any remaining unmatched rows 
and merge using `zip`
+
+Here's a simplified version of the original loop:
+
+```rust
+let mut current_value = new_null_array(&return_type, batch.num_rows());
+let mut remainder = BooleanArray::from(vec![true; batch.num_rows()]);
+
+for (when_expr, then_expr) in &self.when_then_expr {
+    let when_value = when_expr.evaluate_selection(batch, &remainder)?
+        .into_array(batch.num_rows())?;
+    let when_value = and(&when_value, &remainder)?;
+
+    if when_value.true_count() == 0 {
+        continue;
+    }
+
+    let then_value = then_expr.evaluate_selection(batch, &when_value)?;
+    current_value = zip(&when_value, &then_value, &current_value)?;
+    remainder = and_not(&remainder, &when_value)?;
+}
+```
+
+While correct, this implementation had several performance issues mostly 
related to the usage of `evaluate_selection`.
+To understand why, we need to dig a little deeper into the implementation of 
that function.
+Here's a simplified version of it that captures the relevant parts:
+
+```rust
+pub trait PhysicalExpr {
+    fn evaluate_selection(
+        &self,
+        batch: &RecordBatch,
+        selection: &BooleanArray,
+    ) -> Result<ColumnarValue> {
+        let filtered_batch = filter_record_batch(batch, selection)?;
+        let filtered_result = self.evaluate(&filtered_batch)?;
+        scatter(selection, filtered_result)
+    }
+}
+```
+
+The `evaluate_selection` method first filters the input batch to only include 
rows that match the `selection` mask.
+It then calls the regular `evaluate` method using the filtered batch as input.
+Finally, to return a result array with the same number of rows as `batch`, the 
`scatter` function is called.

Review Comment:
   I recommend also adding this context to the rust code itself, as comments on 
the appropriate line
   
   A link / description of what `scatter` does would also be helpful



##########
content/blog/2025-11-11-datafusion_case.md:
##########
@@ -0,0 +1,377 @@
+---
+layout: post
+title: Optimizing CASE Expression Evaluation
+date: 2025-11-11
+author: Pepijn Van Eeckhoudt
+categories: [features]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+[TOC]
+
+<style>
+figure {
+  margin: 20px 0;
+}
+
+figure img {
+  display: block;
+  max-width: 80%;
+  margin: auto;
+}
+
+figcaption {
+  font-style: italic;
+  color: #555;
+  font-size: 0.9em;
+  max-width: 80%;
+  margin: auto;
+  text-align: center;
+}
+</style>
+
+# Optimizing CASE Expression Evaluation in DataFusion
+
+SQL's `CASE` expression is one of the few constructs the language provides to 
perform conditional logic.
+Its deceptively simple syntax hides significant implementation complexity.
+Over the past few weeks, we've landed a series of improvements to DataFusion's 
`CASE` expression evaluator that reduce both CPU time and memory allocations.
+This post walks through the original implementation, its performance 
bottlenecks, and how we addressed them step by step.
+Finally we'll also take a look at some future improvements to `CASE` that are 
in the works.
+
+## Background: CASE Expression Evaluation
+
+SQL supports two forms of CASE expressions:
+
+1. **Simple**: `CASE expr WHEN value1 THEN result1 WHEN value2 THEN result2 
... END`
+2. **Searched**: `CASE WHEN condition1 THEN result1 WHEN condition2 THEN 
result2 ... END`
+
+The simple form evaluates an expression once for each input row and then tests 
that value against the constants (or expressions) in each `WHEN` clause using 
equality comparisons.
+Think of it as a limited Rust `match` expression.
+
+Here's a simple example:
+
+```sql
+CASE status
+    WHEN 'pending' THEN 1
+    WHEN 'active' THEN 2
+    WHEN 'complete' THEN 3
+    ELSE 0
+END
+```
+
+In this `CASE` expression, `status` is evaluated once per row, and then its 
value is tested for equality with the values `'pending'`, `'active'`, and 
`'complete'` in that order.
+The `THEN` expression value for the first matching `WHEN` expression is 
returned per row.
+
+The searched `CASE` form is a more flexible variant.
+It evaluates completely independent boolean expressions for each branch.
+
+This allows you to test different columns with different operators per branch 
as can be seen in the following example:
+
+```sql
+CASE
+    WHEN age > 65 THEN 'senior'
+    WHEN childCount != 0 THEN 'parent'
+    WHEN age < 21 THEN 'minor'
+    ELSE 'adult'
+END
+```
+
+In both forms, branches are evaluated sequentially with short-circuit 
semantics: for each row, once a `WHEN` condition matches, the corresponding 
`THEN` expression is evaluated. Any further branches are not evaluated for that 
row.
+This lazy evaluation model is critical for correctness.
+It let's you safely write `CASE` expressions like
+
+```sql
+CASE
+    WHEN denominator == 0 THEN NULL
+    ELSE nominator / denominator
+END
+```
+
+that are guaranteed to not trigger divide-by-zero errors.
+
+## `CASE` Evaluation in DataFusion 50.0.0
+
+For the rest of this post we'll be looking at 'searched case' evaluation.
+'Simple case' uses a distinct, but very similar implementation.
+The same set of improvements has been applied to both.
+
+The baseline implementation in DataFusion 50.0.0 evaluated `CASE` using a 
straightforward approach:
+
+1. Start with an output array `out` with the same length as the input batch, 
filled with nulls. Additionally, create a bit vector `remainder` with the same 
length and each value set to `true`.
+2. For each `WHEN`/`THEN` branch:
+  - Evaluate the `WHEN` condition for remaining unmatched rows using 
`PhysicalExpr::evaluate_selection`, passing in the input batch and the 
`remainder` mask
+  - If any rows matched, evaluate the `THEN` expression for those rows using 
`PhysicalExpr::evaluate_selection`
+  - Merge the results into the `out` using the `zip` kernel
+  - Update the `remainder` mask to exclude matched rows
+3. If there's an `ELSE` clause, evaluate it for any remaining unmatched rows 
and merge using `zip`
+
+Here's a simplified version of the original loop:
+
+```rust
+let mut current_value = new_null_array(&return_type, batch.num_rows());
+let mut remainder = BooleanArray::from(vec![true; batch.num_rows()]);
+
+for (when_expr, then_expr) in &self.when_then_expr {
+    let when_value = when_expr.evaluate_selection(batch, &remainder)?
+        .into_array(batch.num_rows())?;
+    let when_value = and(&when_value, &remainder)?;
+
+    if when_value.true_count() == 0 {
+        continue;
+    }
+
+    let then_value = then_expr.evaluate_selection(batch, &when_value)?;
+    current_value = zip(&when_value, &then_value, &current_value)?;
+    remainder = and_not(&remainder, &when_value)?;
+}
+```
+
+While correct, this implementation had several performance issues mostly 
related to the usage of `evaluate_selection`.
+To understand why, we need to dig a little deeper into the implementation of 
that function.
+Here's a simplified version of it that captures the relevant parts:
+
+```rust
+pub trait PhysicalExpr {
+    fn evaluate_selection(
+        &self,
+        batch: &RecordBatch,
+        selection: &BooleanArray,
+    ) -> Result<ColumnarValue> {
+        let filtered_batch = filter_record_batch(batch, selection)?;
+        let filtered_result = self.evaluate(&filtered_batch)?;
+        scatter(selection, filtered_result)
+    }
+}
+```
+
+The `evaluate_selection` method first filters the input batch to only include 
rows that match the `selection` mask.
+It then calls the regular `evaluate` method using the filtered batch as input.
+Finally, to return a result array with the same number of rows as `batch`, the 
`scatter` function is called.
+This function produces a new array padded with `null` values for any rows that 
didn't match the `selection` mask.
+
+So how does the simple evaluation strategy and use of `evaluate_selection` 
cause performance overhead?
+
+### Problem 1: No Early Exit
+
+The case evaluation loop always iterated through all branches, even when every 
row had already been matched.
+In queries where early branches match many rows, this meant unnecessary work 
was done for remaining rows.
+
+### Problem 2: Repeated Filtering, Scattering, and Merging
+
+Each iteration performed operations that are very well-optimized but still not 
cost free to execute:
+- **Filtering**: `PhysicalExpr::evaluate_selection` filters the entire 
`RecordBatch` (all columns) for each branch. For the `WHEN` expression, this 
was done even if the selection mask was entirely empty.
+- **Scattering**: `PhysicalExpr::evaluate_selection` scatters the filtered 
result back to the original `RecordBatch` length.
+- **Merging**: The `zip` kernel is called once per branch to merge partial 
results into the output array
+
+Each of these steps allocates new arrays and shuffles a lot of data around. 
+
+### Problem 3: Filtering Unused Columns
+
+The `PhysicalExpr::evaluate_selection` method filters the entire record batch, 
including columns that the current branch's `WHEN` and `THEN` expressions don't 
reference.
+For wide tables (many columns) with narrow expressions (few column 
references), this is wasteful.
+
+Suppose we have a table with 26 columns named `a` through `z`.
+For a simple CASE expression like:
+
+```sql
+CASE
+  WHEN a > 1000 THEN 'large'
+  WHEN a >= 0 THEN 'positive'
+  ELSE 'negative'
+END
+```
+
+the implementation would filter all columns 26 columns even though only a 
single column is needed for the entire `CASE` expression evaluation.
+Again this involves a non-negligible amount of allocation and data copying.
+
+## Performance Improvements
+
+### Optimization 1: Short-Circuit Early Exit (commit 7c215ed)
+
+The first optimization added early exit logic to the evaluation loop:
+
+```rust
+let mut remainder_count = batch.num_rows();
+
+for (when_expr, then_expr) in &self.when_then_expr {
+    if remainder_count == 0 {
+        break;  // All rows matched, exit early
+    }
+
+    // ... evaluate branch ...
+
+    let when_match_count = when_value.true_count();
+    remainder_count -= when_match_count;
+}
+```
+
+Additionally, we avoid evaluating the `ELSE` clause when no rows remain:
+
+```rust
+if let Some(else_expr) = &self.else_expr {
+    remainder = or(&base_nulls, &remainder)?;
+    if remainder.true_count() > 0 {
+        // ... evaluate else ...
+    }
+}
+```
+
+**Impact**: For queries where early branches match all rows, this eliminates 
unnecessary branch evaluations and `ELSE` clause processing.
+
+### Optimization 2: Optimized Result Merging (commit e9431fc)
+
+The second optimization fundamentally restructured how partial results are 
merged.
+Instead of using `zip()` after each branch to merge results into an output 
array, we now:
+
+1. Maintain the subset of rows still needing evaluation across loop iterations
+2. Filter the batch progressively as rows are matched
+3. Build an index structure that tracks which branch produced each row's result
+4. Perform a single merge operation at the end
+
+The key insight is that we can defer all merging until the end of the 
evaluation loop by tracking result provenance.
+When a branch matches a number of rows, instead of immediately merging with 
`zip()`, we:
+1. Store the partial result array
+2. Mark the cells corresponding to each row in an indices array as needing to 
take one value from the partial result array
+
+In the example below, three `WHEN/THEN` branches produced results.
+The first branch produced the result `A` for 2, the second produced `B` for 
row 1, and the third produced `C` and `D` for rows 4 and 5.
+The final result array is obtained by running the arrays through the merge 
operation.
+
+```aiignore
+┌───────────┐  ┌─────────┐                             ┌─────────┐
+│┌─────────┐│  │   None  │                             │   NULL  │
+││    A    ││  ├─────────┤                             ├─────────┤
+│└─────────┘│  │    1    │                             │    B    │
+│┌─────────┐│  ├─────────┤                             ├─────────┤
+││    B    ││  │    0    │   merge_n(values, indices)  │    A    │
+│└─────────┘│  ├─────────┤  ─────────────────────────▶ ├─────────┤
+│┌─────────┐│  │   None  │                             │   NULL  │
+││    C    ││  ├─────────┤                             ├─────────┤
+│├─────────┤│  │    2    │                             │    C    │
+││    D    ││  ├─────────┤                             ├─────────┤
+│└─────────┘│  │    2    │                             │    D    │
+└───────────┘  └─────────┘                             └─────────┘
+   arrays        indices                                  result
+```
+
+The main benefits of this merge operation are that the `scatter` step is 
eleminated enitrely, and instead of requiring
+a `zip` per branch only a single `merge_n` is done.
+
+Besides more efficient merging, we also maintain a progressively filtered 
`remainder_batch`.
+
+```rust
+let mut remainder_batch = Cow::Borrowed(batch);
+let mut remainder_rows = UInt32Array::from_iter_values(0..batch.num_rows());
+
+for branch in branches {
+    // Evaluate on progressively smaller batch
+    let when_value = evaluate_on(&remainder_batch)?;
+
+    // Filter remainder_batch for next iteration
+    let next_filter = create_filter(&not(&when_value)?);
+    remainder_batch = Cow::Owned(filter_record_batch(&remainder_batch, 
&next_filter)?);
+    remainder_rows = filter_array(&remainder_rows, &next_filter)?;
+}
+```
+
+As rows are matched, the `remainder_batch` shrinks, making subsequent filter 
operations a bit faster.
+
+**Impact**: This eliminates N-1 merge operations and makes filtering 
progressively cheaper as branches match rows.
+
+This new operation was initially developed specifically for DataFusion's 
`CASE` evaluation, but in the meantime has been generalized and moved into the 
`arrow-rs` crate as 
[`arrow_select::merge::merge_n`](https://docs.rs/arrow-select/57.1.0/arrow_select/merge/fn.merge_n.html).
+
+### Optimization 3: Column Projection (commit c3e49fb)
+
+The third optimization addresses the "filtering unused columns" problem 
through projection.
+Before evaluating a CASE expression, we:
+
+1. Analyze all WHEN/THEN/ELSE expressions to find referenced columns
+2. Build a projection vector containing only those column indices
+3. Derive new versions of the expressions with updated column references
+
+For example, if the original CASE references columns at indices `[1, 5, 8]` in 
a 20-column batch:
+- Project the batch to a 3-column batch with those columns
+- Rewrite expressions from `col@1, col@5, col@8` to `col@0, col@1, col@2`
+- Evaluate using the projected batch
+
+This is encapsulated in a `ProjectedCaseBody` structure:
+
+```rust
+struct ProjectedCaseBody {
+    projection: Vec<usize>,     // [1, 5, 8]
+    body: CaseBody,             // Expressions with rewritten column indices
+}
+```
+
+The projection logic is only applied when it would be beneficial (i.e., when 
the number of used columns is lower than the total number of columns in the 
batch).
+
+**Impact**: For wide tables with narrow CASE expressions, this dramatically 
reduces filtering overhead by removing copying of unused columns.
+
+### Optimization 4: Eliminating Scatter in Two-Branch Case (commit 32d2618)
+
+The final optimization targets a common pattern: `CASE WHEN condition THEN 
expr1 ELSE expr2 END`.
+
+Previously, this used the specialized `ExpressionOrExpression` fast path, but 
still used `evaluate_selection()` which produces scattered results:
+
+```rust
+// Old approach: produces scattered array with batch.num_rows() elements
+let then_value = then_expr.evaluate_selection(batch, &when_value)?
+    .into_array(batch.num_rows())?;
+```
+
+The new approach filters the batch first, then evaluates:
+
+```rust
+// New approach: filter to only matching rows
+let when_filter = create_filter(&when_value);
+let then_batch = filter_record_batch(batch, &when_filter)?;
+let then_value = then_expr.evaluate(&then_batch)?;  // Produces compact array
+
+let else_filter = create_filter(&not(&when_value)?);
+let else_batch = filter_record_batch(batch, &else_filter)?;
+let else_value = else_expr.evaluate(&else_batch)?;
+```
+
+This produces two compact arrays (one for THEN values, one for ELSE values) 
which are then merged with a custom merge function that doesn't require 
pre-scattered inputs:
+
+```rust
+fn merge(mask: &BooleanArray, truthy: ColumnarValue, falsy: ColumnarValue) -> 
ArrayRef {
+    // Interleave truthy and falsy values directly using mask
+    // without requiring pre-alignment
+}
+```
+
+**Impact**: This eliminates unnecessary scatter operations and memory 
allocations for one of the most common CASE expression patterns.
+
+Just like `merge_n` this operation has been moved into `arrow-rs` as 
[`arrow_select::merge::merge`](https://docs.rs/arrow-select/57.1.0/arrow_select/merge/fn.merge.html).
+
+## Summary
+
+Through four targeted optimizations, we've transformed CASE expression 
evaluation from a simple but inefficient implementation to a highly optimized 
one that:
+
+1. **Exits early** when all rows are matched
+2. **Defers merging** until the end with a single interleave operation
+3. **Projects columns** to avoid filtering unused data
+4. **Eliminates scatter** operations in common patterns
+
+These improvements compound: a CASE expression on a wide table with multiple 
branches and early matches benefits from all four optimizations simultaneously.
+The result is significantly reduced CPU time and memory allocation in one of 
SQL's most frequently used constructs.
+
+All of these changes maintain full compatibility with DataFusion's existing 
semantics and are covered by the existing test suite, demonstrating that 
performance optimization doesn't have to compromise correctness or require API 
changes.

Review Comment:
   I think this is a fairly obvious conclusion so I think we could remove it 
without much loss of content
   
   Maybe we could also throw a shortened version of the standard footer / how 
to get involved sections?
   
   
https://datafusion.apache.org/blog/output/2025/11/25/datafusion-51.0.0/#about-datafusion
   
https://datafusion.apache.org/blog/output/2025/11/25/datafusion-51.0.0/#how-to-get-involved



##########
content/blog/2025-11-11-datafusion_case.md:
##########
@@ -0,0 +1,377 @@
+---
+layout: post
+title: Optimizing CASE Expression Evaluation
+date: 2025-11-11
+author: Pepijn Van Eeckhoudt
+categories: [features]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+[TOC]
+
+<style>
+figure {
+  margin: 20px 0;
+}
+
+figure img {
+  display: block;
+  max-width: 80%;
+  margin: auto;
+}
+
+figcaption {
+  font-style: italic;
+  color: #555;
+  font-size: 0.9em;
+  max-width: 80%;
+  margin: auto;
+  text-align: center;
+}
+</style>
+
+# Optimizing CASE Expression Evaluation in DataFusion
+
+SQL's `CASE` expression is one of the few constructs the language provides to 
perform conditional logic.
+Its deceptively simple syntax hides significant implementation complexity.
+Over the past few weeks, we've landed a series of improvements to DataFusion's 
`CASE` expression evaluator that reduce both CPU time and memory allocations.
+This post walks through the original implementation, its performance 
bottlenecks, and how we addressed them step by step.
+Finally we'll also take a look at some future improvements to `CASE` that are 
in the works.
+
+## Background: CASE Expression Evaluation
+
+SQL supports two forms of CASE expressions:
+
+1. **Simple**: `CASE expr WHEN value1 THEN result1 WHEN value2 THEN result2 
... END`
+2. **Searched**: `CASE WHEN condition1 THEN result1 WHEN condition2 THEN 
result2 ... END`
+
+The simple form evaluates an expression once for each input row and then tests 
that value against the constants (or expressions) in each `WHEN` clause using 
equality comparisons.
+Think of it as a limited Rust `match` expression.
+
+Here's a simple example:
+
+```sql
+CASE status
+    WHEN 'pending' THEN 1
+    WHEN 'active' THEN 2
+    WHEN 'complete' THEN 3
+    ELSE 0
+END
+```
+
+In this `CASE` expression, `status` is evaluated once per row, and then its 
value is tested for equality with the values `'pending'`, `'active'`, and 
`'complete'` in that order.
+The `THEN` expression value for the first matching `WHEN` expression is 
returned per row.
+
+The searched `CASE` form is a more flexible variant.
+It evaluates completely independent boolean expressions for each branch.
+
+This allows you to test different columns with different operators per branch 
as can be seen in the following example:
+
+```sql
+CASE
+    WHEN age > 65 THEN 'senior'
+    WHEN childCount != 0 THEN 'parent'
+    WHEN age < 21 THEN 'minor'
+    ELSE 'adult'
+END
+```
+
+In both forms, branches are evaluated sequentially with short-circuit 
semantics: for each row, once a `WHEN` condition matches, the corresponding 
`THEN` expression is evaluated. Any further branches are not evaluated for that 
row.
+This lazy evaluation model is critical for correctness.
+It let's you safely write `CASE` expressions like
+
+```sql
+CASE
+    WHEN denominator == 0 THEN NULL
+    ELSE nominator / denominator
+END
+```
+
+that are guaranteed to not trigger divide-by-zero errors.
+
+## `CASE` Evaluation in DataFusion 50.0.0
+
+For the rest of this post we'll be looking at 'searched case' evaluation.
+'Simple case' uses a distinct, but very similar implementation.
+The same set of improvements has been applied to both.
+
+The baseline implementation in DataFusion 50.0.0 evaluated `CASE` using a 
straightforward approach:
+
+1. Start with an output array `out` with the same length as the input batch, 
filled with nulls. Additionally, create a bit vector `remainder` with the same 
length and each value set to `true`.
+2. For each `WHEN`/`THEN` branch:
+  - Evaluate the `WHEN` condition for remaining unmatched rows using 
`PhysicalExpr::evaluate_selection`, passing in the input batch and the 
`remainder` mask
+  - If any rows matched, evaluate the `THEN` expression for those rows using 
`PhysicalExpr::evaluate_selection`
+  - Merge the results into the `out` using the `zip` kernel
+  - Update the `remainder` mask to exclude matched rows
+3. If there's an `ELSE` clause, evaluate it for any remaining unmatched rows 
and merge using `zip`
+
+Here's a simplified version of the original loop:
+
+```rust
+let mut current_value = new_null_array(&return_type, batch.num_rows());
+let mut remainder = BooleanArray::from(vec![true; batch.num_rows()]);
+
+for (when_expr, then_expr) in &self.when_then_expr {
+    let when_value = when_expr.evaluate_selection(batch, &remainder)?
+        .into_array(batch.num_rows())?;
+    let when_value = and(&when_value, &remainder)?;
+
+    if when_value.true_count() == 0 {
+        continue;
+    }
+
+    let then_value = then_expr.evaluate_selection(batch, &when_value)?;
+    current_value = zip(&when_value, &then_value, &current_value)?;
+    remainder = and_not(&remainder, &when_value)?;
+}
+```
+
+While correct, this implementation had several performance issues mostly 
related to the usage of `evaluate_selection`.
+To understand why, we need to dig a little deeper into the implementation of 
that function.
+Here's a simplified version of it that captures the relevant parts:
+
+```rust
+pub trait PhysicalExpr {
+    fn evaluate_selection(
+        &self,
+        batch: &RecordBatch,
+        selection: &BooleanArray,
+    ) -> Result<ColumnarValue> {
+        let filtered_batch = filter_record_batch(batch, selection)?;
+        let filtered_result = self.evaluate(&filtered_batch)?;
+        scatter(selection, filtered_result)
+    }
+}
+```
+
+The `evaluate_selection` method first filters the input batch to only include 
rows that match the `selection` mask.
+It then calls the regular `evaluate` method using the filtered batch as input.
+Finally, to return a result array with the same number of rows as `batch`, the 
`scatter` function is called.
+This function produces a new array padded with `null` values for any rows that 
didn't match the `selection` mask.
+
+So how does the simple evaluation strategy and use of `evaluate_selection` 
cause performance overhead?
+
+### Problem 1: No Early Exit
+
+The case evaluation loop always iterated through all branches, even when every 
row had already been matched.
+In queries where early branches match many rows, this meant unnecessary work 
was done for remaining rows.
+
+### Problem 2: Repeated Filtering, Scattering, and Merging
+
+Each iteration performed operations that are very well-optimized but still not 
cost free to execute:
+- **Filtering**: `PhysicalExpr::evaluate_selection` filters the entire 
`RecordBatch` (all columns) for each branch. For the `WHEN` expression, this 
was done even if the selection mask was entirely empty.
+- **Scattering**: `PhysicalExpr::evaluate_selection` scatters the filtered 
result back to the original `RecordBatch` length.
+- **Merging**: The `zip` kernel is called once per branch to merge partial 
results into the output array
+
+Each of these steps allocates new arrays and shuffles a lot of data around. 
+
+### Problem 3: Filtering Unused Columns
+
+The `PhysicalExpr::evaluate_selection` method filters the entire record batch, 
including columns that the current branch's `WHEN` and `THEN` expressions don't 
reference.
+For wide tables (many columns) with narrow expressions (few column 
references), this is wasteful.
+
+Suppose we have a table with 26 columns named `a` through `z`.
+For a simple CASE expression like:
+
+```sql
+CASE
+  WHEN a > 1000 THEN 'large'
+  WHEN a >= 0 THEN 'positive'
+  ELSE 'negative'
+END
+```
+
+the implementation would filter all columns 26 columns even though only a 
single column is needed for the entire `CASE` expression evaluation.
+Again this involves a non-negligible amount of allocation and data copying.
+
+## Performance Improvements
+
+### Optimization 1: Short-Circuit Early Exit (commit 7c215ed)
+
+The first optimization added early exit logic to the evaluation loop:
+
+```rust
+let mut remainder_count = batch.num_rows();
+
+for (when_expr, then_expr) in &self.when_then_expr {
+    if remainder_count == 0 {
+        break;  // All rows matched, exit early
+    }
+
+    // ... evaluate branch ...
+
+    let when_match_count = when_value.true_count();
+    remainder_count -= when_match_count;
+}
+```
+
+Additionally, we avoid evaluating the `ELSE` clause when no rows remain:
+
+```rust
+if let Some(else_expr) = &self.else_expr {
+    remainder = or(&base_nulls, &remainder)?;
+    if remainder.true_count() > 0 {
+        // ... evaluate else ...
+    }
+}
+```
+
+**Impact**: For queries where early branches match all rows, this eliminates 
unnecessary branch evaluations and `ELSE` clause processing.
+
+### Optimization 2: Optimized Result Merging (commit e9431fc)
+
+The second optimization fundamentally restructured how partial results are 
merged.
+Instead of using `zip()` after each branch to merge results into an output 
array, we now:
+
+1. Maintain the subset of rows still needing evaluation across loop iterations
+2. Filter the batch progressively as rows are matched
+3. Build an index structure that tracks which branch produced each row's result
+4. Perform a single merge operation at the end
+
+The key insight is that we can defer all merging until the end of the 
evaluation loop by tracking result provenance.
+When a branch matches a number of rows, instead of immediately merging with 
`zip()`, we:
+1. Store the partial result array
+2. Mark the cells corresponding to each row in an indices array as needing to 
take one value from the partial result array
+
+In the example below, three `WHEN/THEN` branches produced results.
+The first branch produced the result `A` for 2, the second produced `B` for 
row 1, and the third produced `C` and `D` for rows 4 and 5.
+The final result array is obtained by running the arrays through the merge 
operation.
+
+```aiignore

Review Comment:
   what does 'aiignore' mean ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Add blog post regarding CASE work [datafusion-site]

Reply via email to