pepijnve commented on code in PR #18183:
URL: https://github.com/apache/datafusion/pull/18183#discussion_r2490838599


##########
datafusion/physical-expr/src/expressions/case/mod.rs:
##########
@@ -15,9 +15,9 @@
 // specific language governing permissions and limitations
 // under the License.
 
-use super::{Column, Literal};
-use crate::expressions::case::ResultState::{Complete, Empty, Partial};
-use crate::expressions::try_cast;
+mod literal_lookup_table;

Review Comment:
   I asked a bit and the preference seems to be to avoid mod.rs files with lots 
of code in it. Could we move this file back to case.rs?



##########
datafusion/physical-expr/src/expressions/case/mod.rs:
##########
@@ -1031,6 +1091,9 @@ impl PhysicalExpr for CaseExpr {
             }
             EvalMethod::ScalarOrScalar => self.scalar_or_scalar(batch),
             EvalMethod::ExpressionOrExpression => self.expr_or_expr(batch),
+            EvalMethod::WithExprScalarLookupTable(ref e) => {

Review Comment:
   nit: maybe `t` for the table rather than `e`



##########
datafusion/physical-expr/src/expressions/case/literal_lookup_table/mod.rs:
##########
@@ -0,0 +1,440 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+mod boolean_lookup_table;

Review Comment:
   Similarly to case I think it might be a good idea to move the bulk of this 
code out of mod.rs



##########
datafusion/physical-expr/src/expressions/case/mod.rs:
##########
@@ -69,8 +71,27 @@ enum EvalMethod {
     ///
     /// CASE WHEN condition THEN expression ELSE expression END
     ExpressionOrExpression,
+
+    /// This is a specialization for [`EvalMethod::WithExpression`] when the 
value and results are literals
+    ///
+    /// See [`LiteralLookupTable`] for more details
+    WithExprScalarLookupTable(LiteralLookupTable),
+}
+
+// Implement empty hash as the data is derived from PhysicalExprs which are 
already hashed
+impl Hash for LiteralLookupTable {

Review Comment:
   This may be my lack of familiarity with Rust idioms, but it took me a while 
to understand why these empty implementations were actually here. The intention 
is to exclude `LiteralLookupTable` from the `#[derive]` generated 
implementations of these traits for `EvalMethod`.
   Perhaps a comment stating that would be useful for future readers?



##########
datafusion/physical-expr/src/expressions/case/mod.rs:
##########
@@ -1954,4 +2018,805 @@ mod tests {
 
         Ok(())
     }
+
+    // Test Lookup evaluation
+
+    enum AssertLookupEvaluation {

Review Comment:
   Would it make sense to turn most of these tests into SLTs? This is a 
personal preference thing, but in general I try to avoid writing tests that 
ossify implementation details. Further refinements to the lookup table logic 
(e.g., when it is or is not used) would require test changes. It's probably 
more important to assert query correctness rather than if the correct code path 
is or is not used.



##########
datafusion/physical-expr/src/expressions/case/literal_lookup_table/mod.rs:
##########
@@ -0,0 +1,440 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+mod boolean_lookup_table;
+mod bytes_like_lookup_table;
+mod primitive_lookup_table;
+
+use 
crate::expressions::case::literal_lookup_table::boolean_lookup_table::BooleanIndexMap;
+use crate::expressions::case::literal_lookup_table::bytes_like_lookup_table::{
+    BytesDictionaryHelper, BytesLikeIndexMap, BytesViewDictionaryHelper,
+    FixedBinaryHelper, FixedBytesDictionaryHelper, GenericBytesHelper,
+    GenericBytesViewHelper,
+};
+use 
crate::expressions::case::literal_lookup_table::primitive_lookup_table::PrimitiveArrayMapHolder;
+use crate::expressions::case::WhenThen;
+use crate::expressions::Literal;
+use arrow::array::{downcast_integer, downcast_primitive, ArrayRef, Int32Array};
+use arrow::datatypes::{
+    ArrowDictionaryKeyType, BinaryViewType, DataType, GenericBinaryType,
+    GenericStringType, StringViewType,
+};
+use datafusion_common::DataFusionError;
+use datafusion_common::{arrow_datafusion_err, plan_datafusion_err, 
ScalarValue};
+use datafusion_physical_expr_common::physical_expr::PhysicalExpr;
+use indexmap::IndexMap;
+use std::fmt::Debug;
+use std::sync::Arc;
+
+/// Optimization for CASE expressions with literal WHEN and THEN clauses
+///
+/// for this form:
+/// ```sql
+/// CASE <expr_a>
+///     WHEN <literal_a> THEN <literal_e>
+///     WHEN <literal_b> THEN <literal_f>
+///     WHEN <literal_c> THEN <literal_g>
+///     WHEN <literal_d> THEN <literal_h>
+///     ELSE <optional-fallback_literal>
+/// END
+/// ```
+///
+/// # Improvement idea
+/// TODO - we should think of unwrapping the `IN` expressions into multiple 
equality comparisons
+/// so it will use this optimization as well, e.g.
+/// ```sql
+/// -- Before
+/// CASE
+///     WHEN (<expr_a> = <literal_a>) THEN <literal_e>
+///     WHEN (<expr_a> in (<literal_b>, <literal_c>) THEN <literal_f>
+///     WHEN (<expr_a> = <literal_d>) THEN <literal_g>
+/// ELSE <optional-fallback_literal>
+///
+/// -- After
+/// CASE
+///     WHEN (<expr_a> = <literal_a>) THEN <literal_e>
+///     WHEN (<expr_a> = <literal_b>) THEN <literal_f>
+///     WHEN (<expr_a> = <literal_c>) THEN <literal_g>
+///     WHEN (<expr_a> = <literal_d>) THEN <literal_h>
+///     ELSE <optional-fallback_literal>
+/// END
+/// ```
+///
+#[derive(Debug)]
+pub(in super::super) struct LiteralLookupTable {
+    /// The lookup table to use for evaluating the CASE expression
+    lookup: Arc<dyn WhenLiteralIndexMap>,
+
+    /// [`ArrayRef`] where `array[i] = then_literals[i]`
+    /// the last value in the array is the else_expr
+    values_to_take_from: ArrayRef,
+}
+
+impl LiteralLookupTable {
+    pub(in super::super) fn maybe_new(
+        when_then_expr: &Vec<WhenThen>,
+        else_expr: &Option<Arc<dyn PhysicalExpr>>,
+    ) -> Option<Self> {
+        // We can't use the optimization if we don't have any when then pairs
+        if when_then_expr.is_empty() {
+            return None;
+        }
+
+        // If we only have 1 than this optimization is not useful
+        if when_then_expr.len() == 1 {
+            return None;
+        }
+
+        // Try to downcast all the WHEN/THEN expressions to literals
+        let when_then_exprs_maybe_literals = when_then_expr
+            .iter()
+            .map(|(when, then)| {
+                let when_maybe_literal = 
when.as_any().downcast_ref::<Literal>();
+                let then_maybe_literal = 
then.as_any().downcast_ref::<Literal>();
+
+                when_maybe_literal.zip(then_maybe_literal)
+            })
+            .collect::<Vec<_>>();
+
+        // If not all the WHEN/THEN expressions are literals we cannot use 
this optimization
+        if when_then_exprs_maybe_literals.contains(&None) {
+            return None;
+        }
+
+        let when_then_exprs_scalars = when_then_exprs_maybe_literals
+            .into_iter()
+            // Unwrap the options as we have already checked there is no None
+            .flatten()
+            .map(|(when_lit, then_lit)| {
+                (when_lit.value().clone(), then_lit.value().clone())
+            })
+            // Only keep non-null WHEN literals
+            // as they cannot be matched - case NULL WHEN NULL THEN ... ELSE 
... END always goes to ELSE
+            .filter(|(when_lit, _)| !when_lit.is_null())
+            .collect::<Vec<_>>();
+
+        if when_then_exprs_scalars.is_empty() {
+            // All WHEN literals were nulls, so cannot use optimization
+            //
+            // instead, another optimization would be to go straight to the 
ELSE clause
+            return None;
+        }
+
+        // Keep only the first occurrence of each when literal (as the first 
match is used)
+        // and remove nulls (as they cannot be matched - case NULL WHEN NULL 
THEN ... ELSE ... END always goes to ELSE)
+        let (when_literals, then_literals): (Vec<ScalarValue>, 
Vec<ScalarValue>) = {
+            let mut map = IndexMap::with_capacity(when_then_expr.len());
+
+            for (when, then) in when_then_exprs_scalars.into_iter() {
+                // Don't overwrite existing entries as we want to keep the 
first occurrence
+                if !map.contains_key(&when) {
+                    map.insert(when, then);
+                }
+            }
+
+            map.into_iter().unzip()
+        };
+
+        let else_expr: ScalarValue = if let Some(else_expr) = else_expr {

Review Comment:
   Would be clearer if the declared variable is named `else_value`



##########
datafusion/physical-expr/src/expressions/case/literal_lookup_table/mod.rs:
##########
@@ -0,0 +1,440 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+mod boolean_lookup_table;
+mod bytes_like_lookup_table;
+mod primitive_lookup_table;
+
+use 
crate::expressions::case::literal_lookup_table::boolean_lookup_table::BooleanIndexMap;
+use crate::expressions::case::literal_lookup_table::bytes_like_lookup_table::{
+    BytesDictionaryHelper, BytesLikeIndexMap, BytesViewDictionaryHelper,
+    FixedBinaryHelper, FixedBytesDictionaryHelper, GenericBytesHelper,
+    GenericBytesViewHelper,
+};
+use 
crate::expressions::case::literal_lookup_table::primitive_lookup_table::PrimitiveArrayMapHolder;
+use crate::expressions::case::WhenThen;
+use crate::expressions::Literal;
+use arrow::array::{downcast_integer, downcast_primitive, ArrayRef, Int32Array};
+use arrow::datatypes::{
+    ArrowDictionaryKeyType, BinaryViewType, DataType, GenericBinaryType,
+    GenericStringType, StringViewType,
+};
+use datafusion_common::DataFusionError;
+use datafusion_common::{arrow_datafusion_err, plan_datafusion_err, 
ScalarValue};
+use datafusion_physical_expr_common::physical_expr::PhysicalExpr;
+use indexmap::IndexMap;
+use std::fmt::Debug;
+use std::sync::Arc;
+
+/// Optimization for CASE expressions with literal WHEN and THEN clauses
+///
+/// for this form:
+/// ```sql
+/// CASE <expr_a>
+///     WHEN <literal_a> THEN <literal_e>
+///     WHEN <literal_b> THEN <literal_f>
+///     WHEN <literal_c> THEN <literal_g>
+///     WHEN <literal_d> THEN <literal_h>
+///     ELSE <optional-fallback_literal>
+/// END
+/// ```
+///
+/// # Improvement idea
+/// TODO - we should think of unwrapping the `IN` expressions into multiple 
equality comparisons
+/// so it will use this optimization as well, e.g.
+/// ```sql
+/// -- Before
+/// CASE
+///     WHEN (<expr_a> = <literal_a>) THEN <literal_e>
+///     WHEN (<expr_a> in (<literal_b>, <literal_c>) THEN <literal_f>
+///     WHEN (<expr_a> = <literal_d>) THEN <literal_g>
+/// ELSE <optional-fallback_literal>
+///
+/// -- After
+/// CASE
+///     WHEN (<expr_a> = <literal_a>) THEN <literal_e>
+///     WHEN (<expr_a> = <literal_b>) THEN <literal_f>
+///     WHEN (<expr_a> = <literal_c>) THEN <literal_g>
+///     WHEN (<expr_a> = <literal_d>) THEN <literal_h>
+///     ELSE <optional-fallback_literal>
+/// END
+/// ```
+///
+#[derive(Debug)]
+pub(in super::super) struct LiteralLookupTable {
+    /// The lookup table to use for evaluating the CASE expression
+    lookup: Arc<dyn WhenLiteralIndexMap>,
+
+    /// [`ArrayRef`] where `array[i] = then_literals[i]`
+    /// the last value in the array is the else_expr
+    values_to_take_from: ArrayRef,
+}
+
+impl LiteralLookupTable {
+    pub(in super::super) fn maybe_new(
+        when_then_expr: &Vec<WhenThen>,
+        else_expr: &Option<Arc<dyn PhysicalExpr>>,
+    ) -> Option<Self> {
+        // We can't use the optimization if we don't have any when then pairs
+        if when_then_expr.is_empty() {
+            return None;
+        }
+
+        // If we only have 1 than this optimization is not useful
+        if when_then_expr.len() == 1 {
+            return None;
+        }
+
+        // Try to downcast all the WHEN/THEN expressions to literals
+        let when_then_exprs_maybe_literals = when_then_expr
+            .iter()
+            .map(|(when, then)| {
+                let when_maybe_literal = 
when.as_any().downcast_ref::<Literal>();
+                let then_maybe_literal = 
then.as_any().downcast_ref::<Literal>();
+
+                when_maybe_literal.zip(then_maybe_literal)
+            })
+            .collect::<Vec<_>>();
+
+        // If not all the WHEN/THEN expressions are literals we cannot use 
this optimization
+        if when_then_exprs_maybe_literals.contains(&None) {
+            return None;
+        }
+
+        let when_then_exprs_scalars = when_then_exprs_maybe_literals
+            .into_iter()
+            // Unwrap the options as we have already checked there is no None
+            .flatten()
+            .map(|(when_lit, then_lit)| {
+                (when_lit.value().clone(), then_lit.value().clone())
+            })
+            // Only keep non-null WHEN literals
+            // as they cannot be matched - case NULL WHEN NULL THEN ... ELSE 
... END always goes to ELSE
+            .filter(|(when_lit, _)| !when_lit.is_null())
+            .collect::<Vec<_>>();
+
+        if when_then_exprs_scalars.is_empty() {
+            // All WHEN literals were nulls, so cannot use optimization
+            //
+            // instead, another optimization would be to go straight to the 
ELSE clause
+            return None;
+        }
+
+        // Keep only the first occurrence of each when literal (as the first 
match is used)
+        // and remove nulls (as they cannot be matched - case NULL WHEN NULL 
THEN ... ELSE ... END always goes to ELSE)
+        let (when_literals, then_literals): (Vec<ScalarValue>, 
Vec<ScalarValue>) = {
+            let mut map = IndexMap::with_capacity(when_then_expr.len());
+
+            for (when, then) in when_then_exprs_scalars.into_iter() {
+                // Don't overwrite existing entries as we want to keep the 
first occurrence
+                if !map.contains_key(&when) {
+                    map.insert(when, then);
+                }
+            }
+
+            map.into_iter().unzip()
+        };
+
+        let else_expr: ScalarValue = if let Some(else_expr) = else_expr {
+            let literal = else_expr.as_any().downcast_ref::<Literal>()?;
+
+            literal.value().clone()
+        } else {
+            let Ok(null_scalar) =
+                ScalarValue::try_new_null(&then_literals[0].data_type())
+            else {
+                return None;
+            };
+
+            null_scalar
+        };
+
+        {
+            let data_type = when_literals[0].data_type();
+
+            // If not all the WHEN literals are the same data type we cannot 
use this optimization
+            if when_literals.iter().any(|l| l.data_type() != data_type) {
+                return None;
+            }
+        }
+
+        {
+            let data_type = then_literals[0].data_type();
+
+            // If not all the then and the else literals are the same data 
type we cannot use this optimization
+            if then_literals.iter().any(|l| l.data_type() != data_type) {
+                return None;
+            }
+
+            if else_expr.data_type() != data_type {
+                return None;
+            }
+        }
+
+        let output_array = ScalarValue::iter_to_array(
+            then_literals
+                .iter()
+                // The else is in the end
+                .chain(std::iter::once(&else_expr))
+                .cloned(),
+        )
+        .ok()?;
+
+        let lookup = try_creating_lookup_table(
+            when_literals,
+            // The else expression is in the end
+            output_array.len() as i32 - 1,
+        )
+        .ok()?;
+
+        Some(Self {
+            lookup,
+            values_to_take_from: output_array,
+        })
+    }
+
+    pub(in super::super) fn create_output(

Review Comment:
   `create_output` feels a bit too nondescript in the context of a lookup table



##########
datafusion/physical-expr/src/expressions/case/literal_lookup_table/mod.rs:
##########
@@ -0,0 +1,440 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+mod boolean_lookup_table;
+mod bytes_like_lookup_table;
+mod primitive_lookup_table;
+
+use 
crate::expressions::case::literal_lookup_table::boolean_lookup_table::BooleanIndexMap;
+use crate::expressions::case::literal_lookup_table::bytes_like_lookup_table::{
+    BytesDictionaryHelper, BytesLikeIndexMap, BytesViewDictionaryHelper,
+    FixedBinaryHelper, FixedBytesDictionaryHelper, GenericBytesHelper,
+    GenericBytesViewHelper,
+};
+use 
crate::expressions::case::literal_lookup_table::primitive_lookup_table::PrimitiveArrayMapHolder;
+use crate::expressions::case::WhenThen;
+use crate::expressions::Literal;
+use arrow::array::{downcast_integer, downcast_primitive, ArrayRef, Int32Array};
+use arrow::datatypes::{
+    ArrowDictionaryKeyType, BinaryViewType, DataType, GenericBinaryType,
+    GenericStringType, StringViewType,
+};
+use datafusion_common::DataFusionError;
+use datafusion_common::{arrow_datafusion_err, plan_datafusion_err, 
ScalarValue};
+use datafusion_physical_expr_common::physical_expr::PhysicalExpr;
+use indexmap::IndexMap;
+use std::fmt::Debug;
+use std::sync::Arc;
+
+/// Optimization for CASE expressions with literal WHEN and THEN clauses
+///
+/// for this form:
+/// ```sql
+/// CASE <expr_a>
+///     WHEN <literal_a> THEN <literal_e>
+///     WHEN <literal_b> THEN <literal_f>
+///     WHEN <literal_c> THEN <literal_g>
+///     WHEN <literal_d> THEN <literal_h>
+///     ELSE <optional-fallback_literal>
+/// END
+/// ```
+///
+/// # Improvement idea
+/// TODO - we should think of unwrapping the `IN` expressions into multiple 
equality comparisons
+/// so it will use this optimization as well, e.g.
+/// ```sql
+/// -- Before
+/// CASE
+///     WHEN (<expr_a> = <literal_a>) THEN <literal_e>
+///     WHEN (<expr_a> in (<literal_b>, <literal_c>) THEN <literal_f>
+///     WHEN (<expr_a> = <literal_d>) THEN <literal_g>
+/// ELSE <optional-fallback_literal>
+///
+/// -- After
+/// CASE
+///     WHEN (<expr_a> = <literal_a>) THEN <literal_e>
+///     WHEN (<expr_a> = <literal_b>) THEN <literal_f>
+///     WHEN (<expr_a> = <literal_c>) THEN <literal_g>
+///     WHEN (<expr_a> = <literal_d>) THEN <literal_h>
+///     ELSE <optional-fallback_literal>
+/// END
+/// ```
+///
+#[derive(Debug)]
+pub(in super::super) struct LiteralLookupTable {
+    /// The lookup table to use for evaluating the CASE expression
+    lookup: Arc<dyn WhenLiteralIndexMap>,
+
+    /// [`ArrayRef`] where `array[i] = then_literals[i]`
+    /// the last value in the array is the else_expr
+    values_to_take_from: ArrayRef,
+}
+
+impl LiteralLookupTable {
+    pub(in super::super) fn maybe_new(
+        when_then_expr: &Vec<WhenThen>,
+        else_expr: &Option<Arc<dyn PhysicalExpr>>,
+    ) -> Option<Self> {
+        // We can't use the optimization if we don't have any when then pairs
+        if when_then_expr.is_empty() {
+            return None;
+        }
+
+        // If we only have 1 than this optimization is not useful
+        if when_then_expr.len() == 1 {
+            return None;
+        }
+
+        // Try to downcast all the WHEN/THEN expressions to literals
+        let when_then_exprs_maybe_literals = when_then_expr
+            .iter()
+            .map(|(when, then)| {
+                let when_maybe_literal = 
when.as_any().downcast_ref::<Literal>();
+                let then_maybe_literal = 
then.as_any().downcast_ref::<Literal>();
+
+                when_maybe_literal.zip(then_maybe_literal)
+            })
+            .collect::<Vec<_>>();
+
+        // If not all the WHEN/THEN expressions are literals we cannot use 
this optimization
+        if when_then_exprs_maybe_literals.contains(&None) {
+            return None;
+        }
+
+        let when_then_exprs_scalars = when_then_exprs_maybe_literals
+            .into_iter()
+            // Unwrap the options as we have already checked there is no None
+            .flatten()
+            .map(|(when_lit, then_lit)| {
+                (when_lit.value().clone(), then_lit.value().clone())
+            })
+            // Only keep non-null WHEN literals
+            // as they cannot be matched - case NULL WHEN NULL THEN ... ELSE 
... END always goes to ELSE
+            .filter(|(when_lit, _)| !when_lit.is_null())
+            .collect::<Vec<_>>();
+
+        if when_then_exprs_scalars.is_empty() {
+            // All WHEN literals were nulls, so cannot use optimization
+            //
+            // instead, another optimization would be to go straight to the 
ELSE clause
+            return None;
+        }
+
+        // Keep only the first occurrence of each when literal (as the first 
match is used)
+        // and remove nulls (as they cannot be matched - case NULL WHEN NULL 
THEN ... ELSE ... END always goes to ELSE)
+        let (when_literals, then_literals): (Vec<ScalarValue>, 
Vec<ScalarValue>) = {

Review Comment:
   Maybe call these 'when' and 'then' values rather than literals? Based on the 
naming pattern established above you're led to think these are `Vec<Literlal>`s



##########
datafusion/physical-expr/src/expressions/case/literal_lookup_table/mod.rs:
##########
@@ -0,0 +1,440 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+mod boolean_lookup_table;
+mod bytes_like_lookup_table;
+mod primitive_lookup_table;
+
+use 
crate::expressions::case::literal_lookup_table::boolean_lookup_table::BooleanIndexMap;
+use crate::expressions::case::literal_lookup_table::bytes_like_lookup_table::{
+    BytesDictionaryHelper, BytesLikeIndexMap, BytesViewDictionaryHelper,
+    FixedBinaryHelper, FixedBytesDictionaryHelper, GenericBytesHelper,
+    GenericBytesViewHelper,
+};
+use 
crate::expressions::case::literal_lookup_table::primitive_lookup_table::PrimitiveArrayMapHolder;
+use crate::expressions::case::WhenThen;
+use crate::expressions::Literal;
+use arrow::array::{downcast_integer, downcast_primitive, ArrayRef, Int32Array};
+use arrow::datatypes::{
+    ArrowDictionaryKeyType, BinaryViewType, DataType, GenericBinaryType,
+    GenericStringType, StringViewType,
+};
+use datafusion_common::DataFusionError;
+use datafusion_common::{arrow_datafusion_err, plan_datafusion_err, 
ScalarValue};
+use datafusion_physical_expr_common::physical_expr::PhysicalExpr;
+use indexmap::IndexMap;
+use std::fmt::Debug;
+use std::sync::Arc;
+
+/// Optimization for CASE expressions with literal WHEN and THEN clauses
+///
+/// for this form:
+/// ```sql
+/// CASE <expr_a>
+///     WHEN <literal_a> THEN <literal_e>
+///     WHEN <literal_b> THEN <literal_f>
+///     WHEN <literal_c> THEN <literal_g>
+///     WHEN <literal_d> THEN <literal_h>
+///     ELSE <optional-fallback_literal>
+/// END
+/// ```
+///
+/// # Improvement idea
+/// TODO - we should think of unwrapping the `IN` expressions into multiple 
equality comparisons
+/// so it will use this optimization as well, e.g.
+/// ```sql
+/// -- Before
+/// CASE
+///     WHEN (<expr_a> = <literal_a>) THEN <literal_e>
+///     WHEN (<expr_a> in (<literal_b>, <literal_c>) THEN <literal_f>
+///     WHEN (<expr_a> = <literal_d>) THEN <literal_g>
+/// ELSE <optional-fallback_literal>
+///
+/// -- After
+/// CASE
+///     WHEN (<expr_a> = <literal_a>) THEN <literal_e>
+///     WHEN (<expr_a> = <literal_b>) THEN <literal_f>
+///     WHEN (<expr_a> = <literal_c>) THEN <literal_g>
+///     WHEN (<expr_a> = <literal_d>) THEN <literal_h>
+///     ELSE <optional-fallback_literal>
+/// END
+/// ```
+///
+#[derive(Debug)]
+pub(in super::super) struct LiteralLookupTable {
+    /// The lookup table to use for evaluating the CASE expression
+    lookup: Arc<dyn WhenLiteralIndexMap>,
+
+    /// [`ArrayRef`] where `array[i] = then_literals[i]`
+    /// the last value in the array is the else_expr
+    values_to_take_from: ArrayRef,
+}
+
+impl LiteralLookupTable {
+    pub(in super::super) fn maybe_new(
+        when_then_expr: &Vec<WhenThen>,
+        else_expr: &Option<Arc<dyn PhysicalExpr>>,
+    ) -> Option<Self> {
+        // We can't use the optimization if we don't have any when then pairs
+        if when_then_expr.is_empty() {
+            return None;
+        }
+
+        // If we only have 1 than this optimization is not useful
+        if when_then_expr.len() == 1 {
+            return None;
+        }
+
+        // Try to downcast all the WHEN/THEN expressions to literals
+        let when_then_exprs_maybe_literals = when_then_expr
+            .iter()
+            .map(|(when, then)| {
+                let when_maybe_literal = 
when.as_any().downcast_ref::<Literal>();
+                let then_maybe_literal = 
then.as_any().downcast_ref::<Literal>();
+
+                when_maybe_literal.zip(then_maybe_literal)
+            })
+            .collect::<Vec<_>>();
+
+        // If not all the WHEN/THEN expressions are literals we cannot use 
this optimization
+        if when_then_exprs_maybe_literals.contains(&None) {
+            return None;
+        }
+
+        let when_then_exprs_scalars = when_then_exprs_maybe_literals
+            .into_iter()
+            // Unwrap the options as we have already checked there is no None
+            .flatten()
+            .map(|(when_lit, then_lit)| {
+                (when_lit.value().clone(), then_lit.value().clone())
+            })
+            // Only keep non-null WHEN literals
+            // as they cannot be matched - case NULL WHEN NULL THEN ... ELSE 
... END always goes to ELSE
+            .filter(|(when_lit, _)| !when_lit.is_null())
+            .collect::<Vec<_>>();
+
+        if when_then_exprs_scalars.is_empty() {
+            // All WHEN literals were nulls, so cannot use optimization
+            //
+            // instead, another optimization would be to go straight to the 
ELSE clause
+            return None;
+        }
+
+        // Keep only the first occurrence of each when literal (as the first 
match is used)
+        // and remove nulls (as they cannot be matched - case NULL WHEN NULL 
THEN ... ELSE ... END always goes to ELSE)
+        let (when_literals, then_literals): (Vec<ScalarValue>, 
Vec<ScalarValue>) = {
+            let mut map = IndexMap::with_capacity(when_then_expr.len());
+
+            for (when, then) in when_then_exprs_scalars.into_iter() {
+                // Don't overwrite existing entries as we want to keep the 
first occurrence
+                if !map.contains_key(&when) {
+                    map.insert(when, then);
+                }
+            }
+
+            map.into_iter().unzip()
+        };
+
+        let else_expr: ScalarValue = if let Some(else_expr) = else_expr {
+            let literal = else_expr.as_any().downcast_ref::<Literal>()?;
+
+            literal.value().clone()
+        } else {
+            let Ok(null_scalar) =
+                ScalarValue::try_new_null(&then_literals[0].data_type())
+            else {
+                return None;
+            };
+
+            null_scalar
+        };
+
+        {
+            let data_type = when_literals[0].data_type();
+
+            // If not all the WHEN literals are the same data type we cannot 
use this optimization
+            if when_literals.iter().any(|l| l.data_type() != data_type) {
+                return None;
+            }
+        }
+
+        {
+            let data_type = then_literals[0].data_type();
+
+            // If not all the then and the else literals are the same data 
type we cannot use this optimization
+            if then_literals.iter().any(|l| l.data_type() != data_type) {
+                return None;
+            }
+
+            if else_expr.data_type() != data_type {
+                return None;
+            }
+        }
+
+        let output_array = ScalarValue::iter_to_array(
+            then_literals
+                .iter()
+                // The else is in the end
+                .chain(std::iter::once(&else_expr))
+                .cloned(),
+        )
+        .ok()?;
+
+        let lookup = try_creating_lookup_table(
+            when_literals,
+            // The else expression is in the end
+            output_array.len() as i32 - 1,
+        )
+        .ok()?;
+
+        Some(Self {
+            lookup,
+            values_to_take_from: output_array,
+        })
+    }
+
+    pub(in super::super) fn create_output(
+        &self,
+        expr_array: &ArrayRef,
+    ) -> datafusion_common::Result<ArrayRef> {
+        let take_indices = self.lookup.match_values(expr_array)?;
+
+        // Zero-copy conversion
+        let take_indices = Int32Array::from(take_indices);
+
+        // An optimize version would depend on the type of the 
values_to_take_from
+        // For example, if the type is view we can just keep pointing to the 
same value (similar to dictionary)
+        // if the type is dictionary we can just use the indices as is (or 
cast them to the key type) and create a new dictionary array
+        let output = arrow::compute::take(&self.values_to_take_from, 
&take_indices, None)
+            .map_err(|e| arrow_datafusion_err!(e))?;
+
+        Ok(output)
+    }
+}
+
+/// Lookup table for mapping literal values to their corresponding indices in 
the THEN clauses
+///
+/// The else index is used when a value is not found in the lookup table
+pub(super) trait WhenLiteralIndexMap: Debug + Send + Sync {
+    /// Try creating a new lookup table from the given literals and else index
+    ///
+    /// `literals` are guaranteed to be unique and non-nullable
+    fn try_new(
+        unique_non_null_literals: Vec<ScalarValue>,
+        else_index: i32,
+    ) -> datafusion_common::Result<Self>
+    where
+        Self: Sized;
+
+    /// Return indices to take from the literals based on the values in the 
given array
+    fn match_values(&self, array: &ArrayRef) -> 
datafusion_common::Result<Vec<i32>>;

Review Comment:
   Maybe `map_to_indices` or something along those lines would make this more 
self explanatory?



##########
datafusion/physical-expr/src/expressions/case/literal_lookup_table/mod.rs:
##########
@@ -0,0 +1,440 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+mod boolean_lookup_table;
+mod bytes_like_lookup_table;
+mod primitive_lookup_table;
+
+use 
crate::expressions::case::literal_lookup_table::boolean_lookup_table::BooleanIndexMap;
+use crate::expressions::case::literal_lookup_table::bytes_like_lookup_table::{
+    BytesDictionaryHelper, BytesLikeIndexMap, BytesViewDictionaryHelper,
+    FixedBinaryHelper, FixedBytesDictionaryHelper, GenericBytesHelper,
+    GenericBytesViewHelper,
+};
+use 
crate::expressions::case::literal_lookup_table::primitive_lookup_table::PrimitiveArrayMapHolder;
+use crate::expressions::case::WhenThen;
+use crate::expressions::Literal;
+use arrow::array::{downcast_integer, downcast_primitive, ArrayRef, Int32Array};
+use arrow::datatypes::{
+    ArrowDictionaryKeyType, BinaryViewType, DataType, GenericBinaryType,
+    GenericStringType, StringViewType,
+};
+use datafusion_common::DataFusionError;
+use datafusion_common::{arrow_datafusion_err, plan_datafusion_err, 
ScalarValue};
+use datafusion_physical_expr_common::physical_expr::PhysicalExpr;
+use indexmap::IndexMap;
+use std::fmt::Debug;
+use std::sync::Arc;
+
+/// Optimization for CASE expressions with literal WHEN and THEN clauses
+///
+/// for this form:
+/// ```sql
+/// CASE <expr_a>
+///     WHEN <literal_a> THEN <literal_e>
+///     WHEN <literal_b> THEN <literal_f>
+///     WHEN <literal_c> THEN <literal_g>
+///     WHEN <literal_d> THEN <literal_h>
+///     ELSE <optional-fallback_literal>
+/// END
+/// ```
+///
+/// # Improvement idea
+/// TODO - we should think of unwrapping the `IN` expressions into multiple 
equality comparisons
+/// so it will use this optimization as well, e.g.
+/// ```sql
+/// -- Before
+/// CASE
+///     WHEN (<expr_a> = <literal_a>) THEN <literal_e>
+///     WHEN (<expr_a> in (<literal_b>, <literal_c>) THEN <literal_f>
+///     WHEN (<expr_a> = <literal_d>) THEN <literal_g>
+/// ELSE <optional-fallback_literal>
+///
+/// -- After
+/// CASE
+///     WHEN (<expr_a> = <literal_a>) THEN <literal_e>
+///     WHEN (<expr_a> = <literal_b>) THEN <literal_f>
+///     WHEN (<expr_a> = <literal_c>) THEN <literal_g>
+///     WHEN (<expr_a> = <literal_d>) THEN <literal_h>
+///     ELSE <optional-fallback_literal>
+/// END
+/// ```
+///
+#[derive(Debug)]
+pub(in super::super) struct LiteralLookupTable {
+    /// The lookup table to use for evaluating the CASE expression
+    lookup: Arc<dyn WhenLiteralIndexMap>,
+
+    /// [`ArrayRef`] where `array[i] = then_literals[i]`
+    /// the last value in the array is the else_expr
+    values_to_take_from: ArrayRef,
+}
+
+impl LiteralLookupTable {
+    pub(in super::super) fn maybe_new(
+        when_then_expr: &Vec<WhenThen>,
+        else_expr: &Option<Arc<dyn PhysicalExpr>>,
+    ) -> Option<Self> {
+        // We can't use the optimization if we don't have any when then pairs
+        if when_then_expr.is_empty() {
+            return None;
+        }
+
+        // If we only have 1 than this optimization is not useful
+        if when_then_expr.len() == 1 {
+            return None;
+        }
+
+        // Try to downcast all the WHEN/THEN expressions to literals
+        let when_then_exprs_maybe_literals = when_then_expr
+            .iter()
+            .map(|(when, then)| {
+                let when_maybe_literal = 
when.as_any().downcast_ref::<Literal>();
+                let then_maybe_literal = 
then.as_any().downcast_ref::<Literal>();
+
+                when_maybe_literal.zip(then_maybe_literal)
+            })
+            .collect::<Vec<_>>();
+
+        // If not all the WHEN/THEN expressions are literals we cannot use 
this optimization
+        if when_then_exprs_maybe_literals.contains(&None) {
+            return None;
+        }
+
+        let when_then_exprs_scalars = when_then_exprs_maybe_literals
+            .into_iter()
+            // Unwrap the options as we have already checked there is no None
+            .flatten()
+            .map(|(when_lit, then_lit)| {
+                (when_lit.value().clone(), then_lit.value().clone())
+            })
+            // Only keep non-null WHEN literals
+            // as they cannot be matched - case NULL WHEN NULL THEN ... ELSE 
... END always goes to ELSE
+            .filter(|(when_lit, _)| !when_lit.is_null())
+            .collect::<Vec<_>>();
+
+        if when_then_exprs_scalars.is_empty() {
+            // All WHEN literals were nulls, so cannot use optimization
+            //
+            // instead, another optimization would be to go straight to the 
ELSE clause
+            return None;
+        }
+
+        // Keep only the first occurrence of each when literal (as the first 
match is used)
+        // and remove nulls (as they cannot be matched - case NULL WHEN NULL 
THEN ... ELSE ... END always goes to ELSE)
+        let (when_literals, then_literals): (Vec<ScalarValue>, 
Vec<ScalarValue>) = {
+            let mut map = IndexMap::with_capacity(when_then_expr.len());
+
+            for (when, then) in when_then_exprs_scalars.into_iter() {
+                // Don't overwrite existing entries as we want to keep the 
first occurrence
+                if !map.contains_key(&when) {
+                    map.insert(when, then);
+                }
+            }
+
+            map.into_iter().unzip()
+        };
+
+        let else_expr: ScalarValue = if let Some(else_expr) = else_expr {
+            let literal = else_expr.as_any().downcast_ref::<Literal>()?;
+
+            literal.value().clone()
+        } else {
+            let Ok(null_scalar) =
+                ScalarValue::try_new_null(&then_literals[0].data_type())
+            else {
+                return None;
+            };
+
+            null_scalar
+        };
+
+        {
+            let data_type = when_literals[0].data_type();
+
+            // If not all the WHEN literals are the same data type we cannot 
use this optimization
+            if when_literals.iter().any(|l| l.data_type() != data_type) {
+                return None;
+            }
+        }
+
+        {
+            let data_type = then_literals[0].data_type();
+
+            // If not all the then and the else literals are the same data 
type we cannot use this optimization
+            if then_literals.iter().any(|l| l.data_type() != data_type) {
+                return None;
+            }
+
+            if else_expr.data_type() != data_type {
+                return None;
+            }
+        }
+
+        let output_array = ScalarValue::iter_to_array(
+            then_literals
+                .iter()
+                // The else is in the end
+                .chain(std::iter::once(&else_expr))
+                .cloned(),
+        )
+        .ok()?;
+
+        let lookup = try_creating_lookup_table(
+            when_literals,
+            // The else expression is in the end
+            output_array.len() as i32 - 1,
+        )
+        .ok()?;
+
+        Some(Self {
+            lookup,
+            values_to_take_from: output_array,
+        })
+    }
+
+    pub(in super::super) fn create_output(
+        &self,
+        expr_array: &ArrayRef,
+    ) -> datafusion_common::Result<ArrayRef> {
+        let take_indices = self.lookup.match_values(expr_array)?;
+
+        // Zero-copy conversion
+        let take_indices = Int32Array::from(take_indices);
+
+        // An optimize version would depend on the type of the 
values_to_take_from
+        // For example, if the type is view we can just keep pointing to the 
same value (similar to dictionary)
+        // if the type is dictionary we can just use the indices as is (or 
cast them to the key type) and create a new dictionary array
+        let output = arrow::compute::take(&self.values_to_take_from, 
&take_indices, None)
+            .map_err(|e| arrow_datafusion_err!(e))?;
+
+        Ok(output)
+    }
+}
+
+/// Lookup table for mapping literal values to their corresponding indices in 
the THEN clauses
+///
+/// The else index is used when a value is not found in the lookup table
+pub(super) trait WhenLiteralIndexMap: Debug + Send + Sync {
+    /// Try creating a new lookup table from the given literals and else index
+    ///
+    /// `literals` are guaranteed to be unique and non-nullable
+    fn try_new(
+        unique_non_null_literals: Vec<ScalarValue>,
+        else_index: i32,

Review Comment:
   I think it might be easier to understand if this is passed in as a 
`default_value` argument to `WhenLiteralIndexMap::match_value`.



##########
datafusion/physical-expr/src/expressions/case/literal_lookup_table/mod.rs:
##########
@@ -0,0 +1,440 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+mod boolean_lookup_table;
+mod bytes_like_lookup_table;
+mod primitive_lookup_table;
+
+use 
crate::expressions::case::literal_lookup_table::boolean_lookup_table::BooleanIndexMap;
+use crate::expressions::case::literal_lookup_table::bytes_like_lookup_table::{
+    BytesDictionaryHelper, BytesLikeIndexMap, BytesViewDictionaryHelper,
+    FixedBinaryHelper, FixedBytesDictionaryHelper, GenericBytesHelper,
+    GenericBytesViewHelper,
+};
+use 
crate::expressions::case::literal_lookup_table::primitive_lookup_table::PrimitiveArrayMapHolder;
+use crate::expressions::case::WhenThen;
+use crate::expressions::Literal;
+use arrow::array::{downcast_integer, downcast_primitive, ArrayRef, Int32Array};
+use arrow::datatypes::{
+    ArrowDictionaryKeyType, BinaryViewType, DataType, GenericBinaryType,
+    GenericStringType, StringViewType,
+};
+use datafusion_common::DataFusionError;
+use datafusion_common::{arrow_datafusion_err, plan_datafusion_err, 
ScalarValue};
+use datafusion_physical_expr_common::physical_expr::PhysicalExpr;
+use indexmap::IndexMap;
+use std::fmt::Debug;
+use std::sync::Arc;
+
+/// Optimization for CASE expressions with literal WHEN and THEN clauses
+///
+/// for this form:
+/// ```sql
+/// CASE <expr_a>
+///     WHEN <literal_a> THEN <literal_e>
+///     WHEN <literal_b> THEN <literal_f>
+///     WHEN <literal_c> THEN <literal_g>
+///     WHEN <literal_d> THEN <literal_h>
+///     ELSE <optional-fallback_literal>
+/// END
+/// ```
+///
+/// # Improvement idea
+/// TODO - we should think of unwrapping the `IN` expressions into multiple 
equality comparisons
+/// so it will use this optimization as well, e.g.
+/// ```sql
+/// -- Before
+/// CASE
+///     WHEN (<expr_a> = <literal_a>) THEN <literal_e>
+///     WHEN (<expr_a> in (<literal_b>, <literal_c>) THEN <literal_f>
+///     WHEN (<expr_a> = <literal_d>) THEN <literal_g>
+/// ELSE <optional-fallback_literal>
+///
+/// -- After
+/// CASE
+///     WHEN (<expr_a> = <literal_a>) THEN <literal_e>
+///     WHEN (<expr_a> = <literal_b>) THEN <literal_f>
+///     WHEN (<expr_a> = <literal_c>) THEN <literal_g>
+///     WHEN (<expr_a> = <literal_d>) THEN <literal_h>
+///     ELSE <optional-fallback_literal>
+/// END
+/// ```
+///
+#[derive(Debug)]
+pub(in super::super) struct LiteralLookupTable {
+    /// The lookup table to use for evaluating the CASE expression
+    lookup: Arc<dyn WhenLiteralIndexMap>,
+
+    /// [`ArrayRef`] where `array[i] = then_literals[i]`
+    /// the last value in the array is the else_expr
+    values_to_take_from: ArrayRef,
+}
+
+impl LiteralLookupTable {
+    pub(in super::super) fn maybe_new(
+        when_then_expr: &Vec<WhenThen>,
+        else_expr: &Option<Arc<dyn PhysicalExpr>>,
+    ) -> Option<Self> {
+        // We can't use the optimization if we don't have any when then pairs
+        if when_then_expr.is_empty() {
+            return None;
+        }
+
+        // If we only have 1 than this optimization is not useful
+        if when_then_expr.len() == 1 {
+            return None;
+        }
+
+        // Try to downcast all the WHEN/THEN expressions to literals
+        let when_then_exprs_maybe_literals = when_then_expr
+            .iter()
+            .map(|(when, then)| {
+                let when_maybe_literal = 
when.as_any().downcast_ref::<Literal>();
+                let then_maybe_literal = 
then.as_any().downcast_ref::<Literal>();
+
+                when_maybe_literal.zip(then_maybe_literal)
+            })
+            .collect::<Vec<_>>();
+
+        // If not all the WHEN/THEN expressions are literals we cannot use 
this optimization
+        if when_then_exprs_maybe_literals.contains(&None) {
+            return None;
+        }
+
+        let when_then_exprs_scalars = when_then_exprs_maybe_literals
+            .into_iter()
+            // Unwrap the options as we have already checked there is no None
+            .flatten()
+            .map(|(when_lit, then_lit)| {
+                (when_lit.value().clone(), then_lit.value().clone())
+            })
+            // Only keep non-null WHEN literals
+            // as they cannot be matched - case NULL WHEN NULL THEN ... ELSE 
... END always goes to ELSE
+            .filter(|(when_lit, _)| !when_lit.is_null())
+            .collect::<Vec<_>>();
+
+        if when_then_exprs_scalars.is_empty() {
+            // All WHEN literals were nulls, so cannot use optimization
+            //
+            // instead, another optimization would be to go straight to the 
ELSE clause
+            return None;
+        }
+
+        // Keep only the first occurrence of each when literal (as the first 
match is used)
+        // and remove nulls (as they cannot be matched - case NULL WHEN NULL 
THEN ... ELSE ... END always goes to ELSE)
+        let (when_literals, then_literals): (Vec<ScalarValue>, 
Vec<ScalarValue>) = {
+            let mut map = IndexMap::with_capacity(when_then_expr.len());
+
+            for (when, then) in when_then_exprs_scalars.into_iter() {
+                // Don't overwrite existing entries as we want to keep the 
first occurrence
+                if !map.contains_key(&when) {
+                    map.insert(when, then);
+                }
+            }
+
+            map.into_iter().unzip()
+        };
+
+        let else_expr: ScalarValue = if let Some(else_expr) = else_expr {
+            let literal = else_expr.as_any().downcast_ref::<Literal>()?;
+
+            literal.value().clone()
+        } else {
+            let Ok(null_scalar) =
+                ScalarValue::try_new_null(&then_literals[0].data_type())
+            else {
+                return None;
+            };
+
+            null_scalar
+        };
+
+        {
+            let data_type = when_literals[0].data_type();
+
+            // If not all the WHEN literals are the same data type we cannot 
use this optimization
+            if when_literals.iter().any(|l| l.data_type() != data_type) {
+                return None;
+            }
+        }
+
+        {
+            let data_type = then_literals[0].data_type();
+
+            // If not all the then and the else literals are the same data 
type we cannot use this optimization
+            if then_literals.iter().any(|l| l.data_type() != data_type) {
+                return None;
+            }
+
+            if else_expr.data_type() != data_type {
+                return None;
+            }
+        }
+
+        let output_array = ScalarValue::iter_to_array(
+            then_literals
+                .iter()
+                // The else is in the end
+                .chain(std::iter::once(&else_expr))
+                .cloned(),
+        )
+        .ok()?;
+
+        let lookup = try_creating_lookup_table(
+            when_literals,
+            // The else expression is in the end
+            output_array.len() as i32 - 1,
+        )
+        .ok()?;
+
+        Some(Self {
+            lookup,
+            values_to_take_from: output_array,
+        })
+    }
+
+    pub(in super::super) fn create_output(
+        &self,
+        expr_array: &ArrayRef,
+    ) -> datafusion_common::Result<ArrayRef> {
+        let take_indices = self.lookup.match_values(expr_array)?;
+
+        // Zero-copy conversion
+        let take_indices = Int32Array::from(take_indices);
+
+        // An optimize version would depend on the type of the 
values_to_take_from
+        // For example, if the type is view we can just keep pointing to the 
same value (similar to dictionary)
+        // if the type is dictionary we can just use the indices as is (or 
cast them to the key type) and create a new dictionary array
+        let output = arrow::compute::take(&self.values_to_take_from, 
&take_indices, None)
+            .map_err(|e| arrow_datafusion_err!(e))?;
+
+        Ok(output)
+    }
+}
+
+/// Lookup table for mapping literal values to their corresponding indices in 
the THEN clauses
+///
+/// The else index is used when a value is not found in the lookup table
+pub(super) trait WhenLiteralIndexMap: Debug + Send + Sync {
+    /// Try creating a new lookup table from the given literals and else index
+    ///
+    /// `literals` are guaranteed to be unique and non-nullable
+    fn try_new(
+        unique_non_null_literals: Vec<ScalarValue>,

Review Comment:
   It might be worth restating in the comments here that the index of the 
values in the `Vec` is used as the mapped value. `unique_non_null_literals` 
doesn't really convey that.
   
   Something that tells me as a reader that these are the `when` values would 
also be useful.



##########
datafusion/physical-expr/src/expressions/case/literal_lookup_table/primitive_lookup_table.rs:
##########
@@ -0,0 +1,185 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::expressions::case::literal_lookup_table::WhenLiteralIndexMap;
+use arrow::array::{Array, ArrayRef, ArrowNativeTypeOp, ArrowPrimitiveType, 
AsArray};
+use arrow::datatypes::{i256, IntervalDayTime, IntervalMonthDayNano};
+use datafusion_common::{internal_err, HashMap, ScalarValue};
+use half::f16;
+use std::fmt::Debug;
+use std::hash::Hash;
+
+#[derive(Clone)]
+pub(super) struct PrimitiveArrayMapHolder<T>

Review Comment:
   Considering calling this `PrimitiveIndexMap` to be consistent with the other 
implementations



##########
datafusion/physical-expr/src/expressions/case/literal_lookup_table/mod.rs:
##########
@@ -0,0 +1,440 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+mod boolean_lookup_table;
+mod bytes_like_lookup_table;
+mod primitive_lookup_table;
+
+use 
crate::expressions::case::literal_lookup_table::boolean_lookup_table::BooleanIndexMap;
+use crate::expressions::case::literal_lookup_table::bytes_like_lookup_table::{
+    BytesDictionaryHelper, BytesLikeIndexMap, BytesViewDictionaryHelper,
+    FixedBinaryHelper, FixedBytesDictionaryHelper, GenericBytesHelper,
+    GenericBytesViewHelper,
+};
+use 
crate::expressions::case::literal_lookup_table::primitive_lookup_table::PrimitiveArrayMapHolder;
+use crate::expressions::case::WhenThen;
+use crate::expressions::Literal;
+use arrow::array::{downcast_integer, downcast_primitive, ArrayRef, Int32Array};
+use arrow::datatypes::{
+    ArrowDictionaryKeyType, BinaryViewType, DataType, GenericBinaryType,
+    GenericStringType, StringViewType,
+};
+use datafusion_common::DataFusionError;
+use datafusion_common::{arrow_datafusion_err, plan_datafusion_err, 
ScalarValue};
+use datafusion_physical_expr_common::physical_expr::PhysicalExpr;
+use indexmap::IndexMap;
+use std::fmt::Debug;
+use std::sync::Arc;
+
+/// Optimization for CASE expressions with literal WHEN and THEN clauses
+///
+/// for this form:
+/// ```sql
+/// CASE <expr_a>
+///     WHEN <literal_a> THEN <literal_e>
+///     WHEN <literal_b> THEN <literal_f>
+///     WHEN <literal_c> THEN <literal_g>
+///     WHEN <literal_d> THEN <literal_h>
+///     ELSE <optional-fallback_literal>
+/// END
+/// ```
+///
+/// # Improvement idea
+/// TODO - we should think of unwrapping the `IN` expressions into multiple 
equality comparisons
+/// so it will use this optimization as well, e.g.
+/// ```sql
+/// -- Before
+/// CASE
+///     WHEN (<expr_a> = <literal_a>) THEN <literal_e>
+///     WHEN (<expr_a> in (<literal_b>, <literal_c>) THEN <literal_f>
+///     WHEN (<expr_a> = <literal_d>) THEN <literal_g>
+/// ELSE <optional-fallback_literal>
+///
+/// -- After
+/// CASE
+///     WHEN (<expr_a> = <literal_a>) THEN <literal_e>
+///     WHEN (<expr_a> = <literal_b>) THEN <literal_f>
+///     WHEN (<expr_a> = <literal_c>) THEN <literal_g>
+///     WHEN (<expr_a> = <literal_d>) THEN <literal_h>
+///     ELSE <optional-fallback_literal>
+/// END
+/// ```
+///
+#[derive(Debug)]
+pub(in super::super) struct LiteralLookupTable {
+    /// The lookup table to use for evaluating the CASE expression
+    lookup: Arc<dyn WhenLiteralIndexMap>,
+
+    /// [`ArrayRef`] where `array[i] = then_literals[i]`
+    /// the last value in the array is the else_expr
+    values_to_take_from: ArrayRef,

Review Comment:
   A general comment on the code is that I think it's easier to follow if 
variables and fields are named after what they represent rather than how they 
are used. This array is the set of then (and else) values. Would be useful if 
the name reflects that.



##########
datafusion/physical-expr/src/expressions/case/literal_lookup_table/mod.rs:
##########
@@ -0,0 +1,440 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+mod boolean_lookup_table;
+mod bytes_like_lookup_table;
+mod primitive_lookup_table;
+
+use 
crate::expressions::case::literal_lookup_table::boolean_lookup_table::BooleanIndexMap;
+use crate::expressions::case::literal_lookup_table::bytes_like_lookup_table::{
+    BytesDictionaryHelper, BytesLikeIndexMap, BytesViewDictionaryHelper,
+    FixedBinaryHelper, FixedBytesDictionaryHelper, GenericBytesHelper,
+    GenericBytesViewHelper,
+};
+use 
crate::expressions::case::literal_lookup_table::primitive_lookup_table::PrimitiveArrayMapHolder;
+use crate::expressions::case::WhenThen;
+use crate::expressions::Literal;
+use arrow::array::{downcast_integer, downcast_primitive, ArrayRef, Int32Array};
+use arrow::datatypes::{
+    ArrowDictionaryKeyType, BinaryViewType, DataType, GenericBinaryType,
+    GenericStringType, StringViewType,
+};
+use datafusion_common::DataFusionError;
+use datafusion_common::{arrow_datafusion_err, plan_datafusion_err, 
ScalarValue};
+use datafusion_physical_expr_common::physical_expr::PhysicalExpr;
+use indexmap::IndexMap;
+use std::fmt::Debug;
+use std::sync::Arc;
+
+/// Optimization for CASE expressions with literal WHEN and THEN clauses
+///
+/// for this form:
+/// ```sql
+/// CASE <expr_a>
+///     WHEN <literal_a> THEN <literal_e>
+///     WHEN <literal_b> THEN <literal_f>
+///     WHEN <literal_c> THEN <literal_g>
+///     WHEN <literal_d> THEN <literal_h>
+///     ELSE <optional-fallback_literal>
+/// END
+/// ```
+///
+/// # Improvement idea
+/// TODO - we should think of unwrapping the `IN` expressions into multiple 
equality comparisons
+/// so it will use this optimization as well, e.g.
+/// ```sql
+/// -- Before
+/// CASE
+///     WHEN (<expr_a> = <literal_a>) THEN <literal_e>
+///     WHEN (<expr_a> in (<literal_b>, <literal_c>) THEN <literal_f>
+///     WHEN (<expr_a> = <literal_d>) THEN <literal_g>
+/// ELSE <optional-fallback_literal>
+///
+/// -- After
+/// CASE
+///     WHEN (<expr_a> = <literal_a>) THEN <literal_e>
+///     WHEN (<expr_a> = <literal_b>) THEN <literal_f>
+///     WHEN (<expr_a> = <literal_c>) THEN <literal_g>
+///     WHEN (<expr_a> = <literal_d>) THEN <literal_h>
+///     ELSE <optional-fallback_literal>
+/// END
+/// ```
+///
+#[derive(Debug)]
+pub(in super::super) struct LiteralLookupTable {
+    /// The lookup table to use for evaluating the CASE expression
+    lookup: Arc<dyn WhenLiteralIndexMap>,
+
+    /// [`ArrayRef`] where `array[i] = then_literals[i]`
+    /// the last value in the array is the else_expr
+    values_to_take_from: ArrayRef,
+}
+
+impl LiteralLookupTable {
+    pub(in super::super) fn maybe_new(
+        when_then_expr: &Vec<WhenThen>,
+        else_expr: &Option<Arc<dyn PhysicalExpr>>,
+    ) -> Option<Self> {
+        // We can't use the optimization if we don't have any when then pairs
+        if when_then_expr.is_empty() {
+            return None;
+        }
+
+        // If we only have 1 than this optimization is not useful
+        if when_then_expr.len() == 1 {
+            return None;
+        }
+
+        // Try to downcast all the WHEN/THEN expressions to literals
+        let when_then_exprs_maybe_literals = when_then_expr
+            .iter()
+            .map(|(when, then)| {
+                let when_maybe_literal = 
when.as_any().downcast_ref::<Literal>();
+                let then_maybe_literal = 
then.as_any().downcast_ref::<Literal>();
+
+                when_maybe_literal.zip(then_maybe_literal)
+            })
+            .collect::<Vec<_>>();
+
+        // If not all the WHEN/THEN expressions are literals we cannot use 
this optimization
+        if when_then_exprs_maybe_literals.contains(&None) {
+            return None;
+        }
+
+        let when_then_exprs_scalars = when_then_exprs_maybe_literals
+            .into_iter()
+            // Unwrap the options as we have already checked there is no None
+            .flatten()
+            .map(|(when_lit, then_lit)| {
+                (when_lit.value().clone(), then_lit.value().clone())
+            })
+            // Only keep non-null WHEN literals
+            // as they cannot be matched - case NULL WHEN NULL THEN ... ELSE 
... END always goes to ELSE
+            .filter(|(when_lit, _)| !when_lit.is_null())
+            .collect::<Vec<_>>();
+
+        if when_then_exprs_scalars.is_empty() {
+            // All WHEN literals were nulls, so cannot use optimization
+            //
+            // instead, another optimization would be to go straight to the 
ELSE clause
+            return None;
+        }
+
+        // Keep only the first occurrence of each when literal (as the first 
match is used)
+        // and remove nulls (as they cannot be matched - case NULL WHEN NULL 
THEN ... ELSE ... END always goes to ELSE)
+        let (when_literals, then_literals): (Vec<ScalarValue>, 
Vec<ScalarValue>) = {
+            let mut map = IndexMap::with_capacity(when_then_expr.len());
+
+            for (when, then) in when_then_exprs_scalars.into_iter() {
+                // Don't overwrite existing entries as we want to keep the 
first occurrence
+                if !map.contains_key(&when) {
+                    map.insert(when, then);
+                }
+            }
+
+            map.into_iter().unzip()
+        };
+
+        let else_expr: ScalarValue = if let Some(else_expr) = else_expr {
+            let literal = else_expr.as_any().downcast_ref::<Literal>()?;
+
+            literal.value().clone()
+        } else {
+            let Ok(null_scalar) =
+                ScalarValue::try_new_null(&then_literals[0].data_type())
+            else {
+                return None;
+            };
+
+            null_scalar
+        };
+
+        {
+            let data_type = when_literals[0].data_type();
+
+            // If not all the WHEN literals are the same data type we cannot 
use this optimization
+            if when_literals.iter().any(|l| l.data_type() != data_type) {
+                return None;
+            }
+        }
+
+        {
+            let data_type = then_literals[0].data_type();
+
+            // If not all the then and the else literals are the same data 
type we cannot use this optimization
+            if then_literals.iter().any(|l| l.data_type() != data_type) {
+                return None;
+            }
+
+            if else_expr.data_type() != data_type {
+                return None;
+            }
+        }
+
+        let output_array = ScalarValue::iter_to_array(

Review Comment:
   `output_array` in the context of this module did not have much meaning to me 
when reading the code.



##########
datafusion/physical-expr/src/expressions/case/literal_lookup_table/mod.rs:
##########
@@ -0,0 +1,440 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+mod boolean_lookup_table;
+mod bytes_like_lookup_table;
+mod primitive_lookup_table;
+
+use 
crate::expressions::case::literal_lookup_table::boolean_lookup_table::BooleanIndexMap;
+use crate::expressions::case::literal_lookup_table::bytes_like_lookup_table::{
+    BytesDictionaryHelper, BytesLikeIndexMap, BytesViewDictionaryHelper,
+    FixedBinaryHelper, FixedBytesDictionaryHelper, GenericBytesHelper,
+    GenericBytesViewHelper,
+};
+use 
crate::expressions::case::literal_lookup_table::primitive_lookup_table::PrimitiveArrayMapHolder;
+use crate::expressions::case::WhenThen;
+use crate::expressions::Literal;
+use arrow::array::{downcast_integer, downcast_primitive, ArrayRef, Int32Array};
+use arrow::datatypes::{
+    ArrowDictionaryKeyType, BinaryViewType, DataType, GenericBinaryType,
+    GenericStringType, StringViewType,
+};
+use datafusion_common::DataFusionError;
+use datafusion_common::{arrow_datafusion_err, plan_datafusion_err, 
ScalarValue};
+use datafusion_physical_expr_common::physical_expr::PhysicalExpr;
+use indexmap::IndexMap;
+use std::fmt::Debug;
+use std::sync::Arc;
+
+/// Optimization for CASE expressions with literal WHEN and THEN clauses
+///
+/// for this form:
+/// ```sql
+/// CASE <expr_a>
+///     WHEN <literal_a> THEN <literal_e>
+///     WHEN <literal_b> THEN <literal_f>
+///     WHEN <literal_c> THEN <literal_g>
+///     WHEN <literal_d> THEN <literal_h>
+///     ELSE <optional-fallback_literal>
+/// END
+/// ```
+///
+/// # Improvement idea
+/// TODO - we should think of unwrapping the `IN` expressions into multiple 
equality comparisons
+/// so it will use this optimization as well, e.g.
+/// ```sql
+/// -- Before
+/// CASE
+///     WHEN (<expr_a> = <literal_a>) THEN <literal_e>
+///     WHEN (<expr_a> in (<literal_b>, <literal_c>) THEN <literal_f>
+///     WHEN (<expr_a> = <literal_d>) THEN <literal_g>
+/// ELSE <optional-fallback_literal>
+///
+/// -- After
+/// CASE
+///     WHEN (<expr_a> = <literal_a>) THEN <literal_e>
+///     WHEN (<expr_a> = <literal_b>) THEN <literal_f>
+///     WHEN (<expr_a> = <literal_c>) THEN <literal_g>
+///     WHEN (<expr_a> = <literal_d>) THEN <literal_h>
+///     ELSE <optional-fallback_literal>
+/// END
+/// ```
+///
+#[derive(Debug)]
+pub(in super::super) struct LiteralLookupTable {
+    /// The lookup table to use for evaluating the CASE expression
+    lookup: Arc<dyn WhenLiteralIndexMap>,
+
+    /// [`ArrayRef`] where `array[i] = then_literals[i]`
+    /// the last value in the array is the else_expr
+    values_to_take_from: ArrayRef,
+}
+
+impl LiteralLookupTable {
+    pub(in super::super) fn maybe_new(
+        when_then_expr: &Vec<WhenThen>,
+        else_expr: &Option<Arc<dyn PhysicalExpr>>,
+    ) -> Option<Self> {
+        // We can't use the optimization if we don't have any when then pairs
+        if when_then_expr.is_empty() {
+            return None;
+        }
+
+        // If we only have 1 than this optimization is not useful
+        if when_then_expr.len() == 1 {
+            return None;
+        }
+
+        // Try to downcast all the WHEN/THEN expressions to literals
+        let when_then_exprs_maybe_literals = when_then_expr
+            .iter()
+            .map(|(when, then)| {
+                let when_maybe_literal = 
when.as_any().downcast_ref::<Literal>();
+                let then_maybe_literal = 
then.as_any().downcast_ref::<Literal>();
+
+                when_maybe_literal.zip(then_maybe_literal)
+            })
+            .collect::<Vec<_>>();
+
+        // If not all the WHEN/THEN expressions are literals we cannot use 
this optimization
+        if when_then_exprs_maybe_literals.contains(&None) {
+            return None;
+        }
+
+        let when_then_exprs_scalars = when_then_exprs_maybe_literals
+            .into_iter()
+            // Unwrap the options as we have already checked there is no None
+            .flatten()
+            .map(|(when_lit, then_lit)| {
+                (when_lit.value().clone(), then_lit.value().clone())
+            })
+            // Only keep non-null WHEN literals
+            // as they cannot be matched - case NULL WHEN NULL THEN ... ELSE 
... END always goes to ELSE
+            .filter(|(when_lit, _)| !when_lit.is_null())
+            .collect::<Vec<_>>();
+
+        if when_then_exprs_scalars.is_empty() {
+            // All WHEN literals were nulls, so cannot use optimization
+            //
+            // instead, another optimization would be to go straight to the 
ELSE clause
+            return None;
+        }
+
+        // Keep only the first occurrence of each when literal (as the first 
match is used)
+        // and remove nulls (as they cannot be matched - case NULL WHEN NULL 
THEN ... ELSE ... END always goes to ELSE)
+        let (when_literals, then_literals): (Vec<ScalarValue>, 
Vec<ScalarValue>) = {
+            let mut map = IndexMap::with_capacity(when_then_expr.len());
+
+            for (when, then) in when_then_exprs_scalars.into_iter() {
+                // Don't overwrite existing entries as we want to keep the 
first occurrence
+                if !map.contains_key(&when) {
+                    map.insert(when, then);
+                }
+            }
+
+            map.into_iter().unzip()
+        };
+
+        let else_expr: ScalarValue = if let Some(else_expr) = else_expr {
+            let literal = else_expr.as_any().downcast_ref::<Literal>()?;
+
+            literal.value().clone()
+        } else {
+            let Ok(null_scalar) =
+                ScalarValue::try_new_null(&then_literals[0].data_type())
+            else {
+                return None;
+            };
+
+            null_scalar
+        };
+
+        {
+            let data_type = when_literals[0].data_type();
+
+            // If not all the WHEN literals are the same data type we cannot 
use this optimization
+            if when_literals.iter().any(|l| l.data_type() != data_type) {
+                return None;
+            }
+        }
+
+        {
+            let data_type = then_literals[0].data_type();
+
+            // If not all the then and the else literals are the same data 
type we cannot use this optimization
+            if then_literals.iter().any(|l| l.data_type() != data_type) {
+                return None;
+            }
+
+            if else_expr.data_type() != data_type {
+                return None;
+            }
+        }
+
+        let output_array = ScalarValue::iter_to_array(
+            then_literals
+                .iter()
+                // The else is in the end
+                .chain(std::iter::once(&else_expr))
+                .cloned(),
+        )
+        .ok()?;
+
+        let lookup = try_creating_lookup_table(
+            when_literals,
+            // The else expression is in the end
+            output_array.len() as i32 - 1,
+        )
+        .ok()?;
+
+        Some(Self {
+            lookup,
+            values_to_take_from: output_array,
+        })
+    }
+
+    pub(in super::super) fn create_output(
+        &self,
+        expr_array: &ArrayRef,
+    ) -> datafusion_common::Result<ArrayRef> {
+        let take_indices = self.lookup.match_values(expr_array)?;
+
+        // Zero-copy conversion
+        let take_indices = Int32Array::from(take_indices);
+
+        // An optimize version would depend on the type of the 
values_to_take_from
+        // For example, if the type is view we can just keep pointing to the 
same value (similar to dictionary)
+        // if the type is dictionary we can just use the indices as is (or 
cast them to the key type) and create a new dictionary array
+        let output = arrow::compute::take(&self.values_to_take_from, 
&take_indices, None)
+            .map_err(|e| arrow_datafusion_err!(e))?;
+
+        Ok(output)
+    }
+}
+
+/// Lookup table for mapping literal values to their corresponding indices in 
the THEN clauses
+///
+/// The else index is used when a value is not found in the lookup table
+pub(super) trait WhenLiteralIndexMap: Debug + Send + Sync {
+    /// Try creating a new lookup table from the given literals and else index
+    ///
+    /// `literals` are guaranteed to be unique and non-nullable
+    fn try_new(
+        unique_non_null_literals: Vec<ScalarValue>,
+        else_index: i32,
+    ) -> datafusion_common::Result<Self>
+    where
+        Self: Sized;
+
+    /// Return indices to take from the literals based on the values in the 
given array
+    fn match_values(&self, array: &ArrayRef) -> 
datafusion_common::Result<Vec<i32>>;
+}
+
+pub(crate) fn try_creating_lookup_table(
+    unique_non_null_literals: Vec<ScalarValue>,
+    else_index: i32,
+) -> datafusion_common::Result<Arc<dyn WhenLiteralIndexMap>> {

Review Comment:
   q: is `Arc` useful here or can this just be `Box`?



##########
datafusion/physical-expr/src/expressions/case/literal_lookup_table/mod.rs:
##########
@@ -0,0 +1,440 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+mod boolean_lookup_table;
+mod bytes_like_lookup_table;
+mod primitive_lookup_table;
+
+use 
crate::expressions::case::literal_lookup_table::boolean_lookup_table::BooleanIndexMap;
+use crate::expressions::case::literal_lookup_table::bytes_like_lookup_table::{
+    BytesDictionaryHelper, BytesLikeIndexMap, BytesViewDictionaryHelper,
+    FixedBinaryHelper, FixedBytesDictionaryHelper, GenericBytesHelper,
+    GenericBytesViewHelper,
+};
+use 
crate::expressions::case::literal_lookup_table::primitive_lookup_table::PrimitiveArrayMapHolder;
+use crate::expressions::case::WhenThen;
+use crate::expressions::Literal;
+use arrow::array::{downcast_integer, downcast_primitive, ArrayRef, Int32Array};
+use arrow::datatypes::{
+    ArrowDictionaryKeyType, BinaryViewType, DataType, GenericBinaryType,
+    GenericStringType, StringViewType,
+};
+use datafusion_common::DataFusionError;
+use datafusion_common::{arrow_datafusion_err, plan_datafusion_err, 
ScalarValue};
+use datafusion_physical_expr_common::physical_expr::PhysicalExpr;
+use indexmap::IndexMap;
+use std::fmt::Debug;
+use std::sync::Arc;
+
+/// Optimization for CASE expressions with literal WHEN and THEN clauses
+///
+/// for this form:
+/// ```sql
+/// CASE <expr_a>
+///     WHEN <literal_a> THEN <literal_e>
+///     WHEN <literal_b> THEN <literal_f>
+///     WHEN <literal_c> THEN <literal_g>
+///     WHEN <literal_d> THEN <literal_h>
+///     ELSE <optional-fallback_literal>
+/// END
+/// ```
+///
+/// # Improvement idea
+/// TODO - we should think of unwrapping the `IN` expressions into multiple 
equality comparisons
+/// so it will use this optimization as well, e.g.
+/// ```sql
+/// -- Before
+/// CASE
+///     WHEN (<expr_a> = <literal_a>) THEN <literal_e>
+///     WHEN (<expr_a> in (<literal_b>, <literal_c>) THEN <literal_f>
+///     WHEN (<expr_a> = <literal_d>) THEN <literal_g>
+/// ELSE <optional-fallback_literal>
+///
+/// -- After
+/// CASE
+///     WHEN (<expr_a> = <literal_a>) THEN <literal_e>
+///     WHEN (<expr_a> = <literal_b>) THEN <literal_f>
+///     WHEN (<expr_a> = <literal_c>) THEN <literal_g>
+///     WHEN (<expr_a> = <literal_d>) THEN <literal_h>
+///     ELSE <optional-fallback_literal>
+/// END
+/// ```
+///
+#[derive(Debug)]
+pub(in super::super) struct LiteralLookupTable {
+    /// The lookup table to use for evaluating the CASE expression
+    lookup: Arc<dyn WhenLiteralIndexMap>,
+
+    /// [`ArrayRef`] where `array[i] = then_literals[i]`
+    /// the last value in the array is the else_expr
+    values_to_take_from: ArrayRef,
+}
+
+impl LiteralLookupTable {
+    pub(in super::super) fn maybe_new(
+        when_then_expr: &Vec<WhenThen>,
+        else_expr: &Option<Arc<dyn PhysicalExpr>>,
+    ) -> Option<Self> {
+        // We can't use the optimization if we don't have any when then pairs
+        if when_then_expr.is_empty() {
+            return None;
+        }
+
+        // If we only have 1 than this optimization is not useful
+        if when_then_expr.len() == 1 {
+            return None;
+        }
+
+        // Try to downcast all the WHEN/THEN expressions to literals
+        let when_then_exprs_maybe_literals = when_then_expr
+            .iter()
+            .map(|(when, then)| {
+                let when_maybe_literal = 
when.as_any().downcast_ref::<Literal>();
+                let then_maybe_literal = 
then.as_any().downcast_ref::<Literal>();
+
+                when_maybe_literal.zip(then_maybe_literal)
+            })
+            .collect::<Vec<_>>();
+
+        // If not all the WHEN/THEN expressions are literals we cannot use 
this optimization
+        if when_then_exprs_maybe_literals.contains(&None) {
+            return None;
+        }
+
+        let when_then_exprs_scalars = when_then_exprs_maybe_literals
+            .into_iter()
+            // Unwrap the options as we have already checked there is no None
+            .flatten()
+            .map(|(when_lit, then_lit)| {
+                (when_lit.value().clone(), then_lit.value().clone())
+            })
+            // Only keep non-null WHEN literals
+            // as they cannot be matched - case NULL WHEN NULL THEN ... ELSE 
... END always goes to ELSE
+            .filter(|(when_lit, _)| !when_lit.is_null())
+            .collect::<Vec<_>>();
+
+        if when_then_exprs_scalars.is_empty() {
+            // All WHEN literals were nulls, so cannot use optimization
+            //
+            // instead, another optimization would be to go straight to the 
ELSE clause
+            return None;
+        }
+
+        // Keep only the first occurrence of each when literal (as the first 
match is used)
+        // and remove nulls (as they cannot be matched - case NULL WHEN NULL 
THEN ... ELSE ... END always goes to ELSE)
+        let (when_literals, then_literals): (Vec<ScalarValue>, 
Vec<ScalarValue>) = {
+            let mut map = IndexMap::with_capacity(when_then_expr.len());
+
+            for (when, then) in when_then_exprs_scalars.into_iter() {
+                // Don't overwrite existing entries as we want to keep the 
first occurrence
+                if !map.contains_key(&when) {
+                    map.insert(when, then);
+                }
+            }
+
+            map.into_iter().unzip()
+        };
+
+        let else_expr: ScalarValue = if let Some(else_expr) = else_expr {
+            let literal = else_expr.as_any().downcast_ref::<Literal>()?;
+
+            literal.value().clone()
+        } else {
+            let Ok(null_scalar) =
+                ScalarValue::try_new_null(&then_literals[0].data_type())
+            else {
+                return None;
+            };
+
+            null_scalar
+        };
+
+        {
+            let data_type = when_literals[0].data_type();
+
+            // If not all the WHEN literals are the same data type we cannot 
use this optimization
+            if when_literals.iter().any(|l| l.data_type() != data_type) {
+                return None;
+            }
+        }
+
+        {
+            let data_type = then_literals[0].data_type();
+
+            // If not all the then and the else literals are the same data 
type we cannot use this optimization
+            if then_literals.iter().any(|l| l.data_type() != data_type) {
+                return None;
+            }
+
+            if else_expr.data_type() != data_type {
+                return None;
+            }
+        }
+
+        let output_array = ScalarValue::iter_to_array(
+            then_literals
+                .iter()
+                // The else is in the end
+                .chain(std::iter::once(&else_expr))
+                .cloned(),
+        )
+        .ok()?;
+
+        let lookup = try_creating_lookup_table(
+            when_literals,
+            // The else expression is in the end
+            output_array.len() as i32 - 1,
+        )
+        .ok()?;
+
+        Some(Self {
+            lookup,
+            values_to_take_from: output_array,
+        })
+    }
+
+    pub(in super::super) fn create_output(
+        &self,
+        expr_array: &ArrayRef,
+    ) -> datafusion_common::Result<ArrayRef> {
+        let take_indices = self.lookup.match_values(expr_array)?;
+
+        // Zero-copy conversion
+        let take_indices = Int32Array::from(take_indices);
+
+        // An optimize version would depend on the type of the 
values_to_take_from
+        // For example, if the type is view we can just keep pointing to the 
same value (similar to dictionary)
+        // if the type is dictionary we can just use the indices as is (or 
cast them to the key type) and create a new dictionary array
+        let output = arrow::compute::take(&self.values_to_take_from, 
&take_indices, None)
+            .map_err(|e| arrow_datafusion_err!(e))?;
+
+        Ok(output)
+    }
+}
+
+/// Lookup table for mapping literal values to their corresponding indices in 
the THEN clauses
+///
+/// The else index is used when a value is not found in the lookup table
+pub(super) trait WhenLiteralIndexMap: Debug + Send + Sync {
+    /// Try creating a new lookup table from the given literals and else index
+    ///
+    /// `literals` are guaranteed to be unique and non-nullable
+    fn try_new(
+        unique_non_null_literals: Vec<ScalarValue>,
+        else_index: i32,
+    ) -> datafusion_common::Result<Self>
+    where
+        Self: Sized;
+
+    /// Return indices to take from the literals based on the values in the 
given array
+    fn match_values(&self, array: &ArrayRef) -> 
datafusion_common::Result<Vec<i32>>;

Review Comment:
   Since the returned values are base zero indices into an array, wouldn't 
`u32` be a more appropriate choice as return value? What's the expected 
behaviour when a negative value is returned?



##########
datafusion/physical-expr/src/expressions/case/literal_lookup_table/mod.rs:
##########
@@ -0,0 +1,440 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+mod boolean_lookup_table;
+mod bytes_like_lookup_table;
+mod primitive_lookup_table;
+
+use 
crate::expressions::case::literal_lookup_table::boolean_lookup_table::BooleanIndexMap;
+use crate::expressions::case::literal_lookup_table::bytes_like_lookup_table::{
+    BytesDictionaryHelper, BytesLikeIndexMap, BytesViewDictionaryHelper,
+    FixedBinaryHelper, FixedBytesDictionaryHelper, GenericBytesHelper,
+    GenericBytesViewHelper,
+};
+use 
crate::expressions::case::literal_lookup_table::primitive_lookup_table::PrimitiveArrayMapHolder;
+use crate::expressions::case::WhenThen;
+use crate::expressions::Literal;
+use arrow::array::{downcast_integer, downcast_primitive, ArrayRef, Int32Array};
+use arrow::datatypes::{
+    ArrowDictionaryKeyType, BinaryViewType, DataType, GenericBinaryType,
+    GenericStringType, StringViewType,
+};
+use datafusion_common::DataFusionError;
+use datafusion_common::{arrow_datafusion_err, plan_datafusion_err, 
ScalarValue};
+use datafusion_physical_expr_common::physical_expr::PhysicalExpr;
+use indexmap::IndexMap;
+use std::fmt::Debug;
+use std::sync::Arc;
+
+/// Optimization for CASE expressions with literal WHEN and THEN clauses
+///
+/// for this form:
+/// ```sql
+/// CASE <expr_a>
+///     WHEN <literal_a> THEN <literal_e>
+///     WHEN <literal_b> THEN <literal_f>
+///     WHEN <literal_c> THEN <literal_g>
+///     WHEN <literal_d> THEN <literal_h>
+///     ELSE <optional-fallback_literal>
+/// END
+/// ```
+///
+/// # Improvement idea
+/// TODO - we should think of unwrapping the `IN` expressions into multiple 
equality comparisons
+/// so it will use this optimization as well, e.g.
+/// ```sql
+/// -- Before
+/// CASE
+///     WHEN (<expr_a> = <literal_a>) THEN <literal_e>
+///     WHEN (<expr_a> in (<literal_b>, <literal_c>) THEN <literal_f>
+///     WHEN (<expr_a> = <literal_d>) THEN <literal_g>
+/// ELSE <optional-fallback_literal>
+///
+/// -- After
+/// CASE
+///     WHEN (<expr_a> = <literal_a>) THEN <literal_e>
+///     WHEN (<expr_a> = <literal_b>) THEN <literal_f>
+///     WHEN (<expr_a> = <literal_c>) THEN <literal_g>
+///     WHEN (<expr_a> = <literal_d>) THEN <literal_h>
+///     ELSE <optional-fallback_literal>
+/// END
+/// ```
+///
+#[derive(Debug)]
+pub(in super::super) struct LiteralLookupTable {
+    /// The lookup table to use for evaluating the CASE expression
+    lookup: Arc<dyn WhenLiteralIndexMap>,
+
+    /// [`ArrayRef`] where `array[i] = then_literals[i]`
+    /// the last value in the array is the else_expr
+    values_to_take_from: ArrayRef,
+}
+
+impl LiteralLookupTable {
+    pub(in super::super) fn maybe_new(
+        when_then_expr: &Vec<WhenThen>,
+        else_expr: &Option<Arc<dyn PhysicalExpr>>,
+    ) -> Option<Self> {
+        // We can't use the optimization if we don't have any when then pairs
+        if when_then_expr.is_empty() {
+            return None;
+        }
+
+        // If we only have 1 than this optimization is not useful
+        if when_then_expr.len() == 1 {
+            return None;
+        }
+
+        // Try to downcast all the WHEN/THEN expressions to literals
+        let when_then_exprs_maybe_literals = when_then_expr
+            .iter()
+            .map(|(when, then)| {
+                let when_maybe_literal = 
when.as_any().downcast_ref::<Literal>();
+                let then_maybe_literal = 
then.as_any().downcast_ref::<Literal>();
+
+                when_maybe_literal.zip(then_maybe_literal)
+            })
+            .collect::<Vec<_>>();
+
+        // If not all the WHEN/THEN expressions are literals we cannot use 
this optimization
+        if when_then_exprs_maybe_literals.contains(&None) {
+            return None;
+        }
+
+        let when_then_exprs_scalars = when_then_exprs_maybe_literals
+            .into_iter()
+            // Unwrap the options as we have already checked there is no None
+            .flatten()
+            .map(|(when_lit, then_lit)| {
+                (when_lit.value().clone(), then_lit.value().clone())
+            })
+            // Only keep non-null WHEN literals
+            // as they cannot be matched - case NULL WHEN NULL THEN ... ELSE 
... END always goes to ELSE
+            .filter(|(when_lit, _)| !when_lit.is_null())
+            .collect::<Vec<_>>();
+
+        if when_then_exprs_scalars.is_empty() {
+            // All WHEN literals were nulls, so cannot use optimization
+            //
+            // instead, another optimization would be to go straight to the 
ELSE clause
+            return None;
+        }
+
+        // Keep only the first occurrence of each when literal (as the first 
match is used)
+        // and remove nulls (as they cannot be matched - case NULL WHEN NULL 
THEN ... ELSE ... END always goes to ELSE)
+        let (when_literals, then_literals): (Vec<ScalarValue>, 
Vec<ScalarValue>) = {
+            let mut map = IndexMap::with_capacity(when_then_expr.len());
+
+            for (when, then) in when_then_exprs_scalars.into_iter() {
+                // Don't overwrite existing entries as we want to keep the 
first occurrence
+                if !map.contains_key(&when) {
+                    map.insert(when, then);
+                }
+            }
+
+            map.into_iter().unzip()
+        };
+
+        let else_expr: ScalarValue = if let Some(else_expr) = else_expr {
+            let literal = else_expr.as_any().downcast_ref::<Literal>()?;
+
+            literal.value().clone()
+        } else {
+            let Ok(null_scalar) =
+                ScalarValue::try_new_null(&then_literals[0].data_type())
+            else {
+                return None;
+            };
+
+            null_scalar
+        };
+
+        {
+            let data_type = when_literals[0].data_type();
+
+            // If not all the WHEN literals are the same data type we cannot 
use this optimization
+            if when_literals.iter().any(|l| l.data_type() != data_type) {
+                return None;
+            }
+        }
+
+        {
+            let data_type = then_literals[0].data_type();
+
+            // If not all the then and the else literals are the same data 
type we cannot use this optimization
+            if then_literals.iter().any(|l| l.data_type() != data_type) {
+                return None;
+            }
+
+            if else_expr.data_type() != data_type {
+                return None;
+            }
+        }
+
+        let output_array = ScalarValue::iter_to_array(
+            then_literals
+                .iter()
+                // The else is in the end
+                .chain(std::iter::once(&else_expr))
+                .cloned(),
+        )
+        .ok()?;
+
+        let lookup = try_creating_lookup_table(
+            when_literals,
+            // The else expression is in the end
+            output_array.len() as i32 - 1,
+        )
+        .ok()?;
+
+        Some(Self {
+            lookup,
+            values_to_take_from: output_array,
+        })
+    }
+
+    pub(in super::super) fn create_output(
+        &self,
+        expr_array: &ArrayRef,
+    ) -> datafusion_common::Result<ArrayRef> {
+        let take_indices = self.lookup.match_values(expr_array)?;
+
+        // Zero-copy conversion
+        let take_indices = Int32Array::from(take_indices);
+
+        // An optimize version would depend on the type of the 
values_to_take_from
+        // For example, if the type is view we can just keep pointing to the 
same value (similar to dictionary)
+        // if the type is dictionary we can just use the indices as is (or 
cast them to the key type) and create a new dictionary array
+        let output = arrow::compute::take(&self.values_to_take_from, 
&take_indices, None)
+            .map_err(|e| arrow_datafusion_err!(e))?;
+
+        Ok(output)
+    }
+}
+
+/// Lookup table for mapping literal values to their corresponding indices in 
the THEN clauses
+///
+/// The else index is used when a value is not found in the lookup table
+pub(super) trait WhenLiteralIndexMap: Debug + Send + Sync {
+    /// Try creating a new lookup table from the given literals and else index
+    ///
+    /// `literals` are guaranteed to be unique and non-nullable
+    fn try_new(

Review Comment:
   Is it useful to define this function on the trait? It seems to restrict 
creation options for implementations. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]


Reply via email to