advancedxy commented on code in PR #10031:
URL: 
https://github.com/apache/arrow-datafusion/pull/10031#discussion_r1564026621


##########
datafusion/physical-expr/src/expressions/in_list.rs:
##########
@@ -1314,4 +1340,96 @@ mod tests {
 
         Ok(())
     }
+
+    #[test]
+    fn in_list_utf8_with_dict_types() -> Result<()> {
+        fn dict_lit(key_type: DataType, value: &str) -> Arc<dyn PhysicalExpr> {
+            lit(ScalarValue::Dictionary(
+                Box::new(key_type),
+                Box::new(ScalarValue::new_utf8(value.to_string())),
+            ))
+        }
+
+        fn null_dict_lit(key_type: DataType) -> Arc<dyn PhysicalExpr> {
+            lit(ScalarValue::Dictionary(
+                Box::new(key_type),
+                Box::new(ScalarValue::Utf8(None)),
+            ))
+        }
+
+        let schema = Schema::new(vec![Field::new(
+            "a",
+            DataType::Dictionary(Box::new(DataType::UInt16), 
Box::new(DataType::Utf8)),
+            true,
+        )]);
+        let a: UInt16DictionaryArray =
+            vec![Some("a"), Some("d"), None].into_iter().collect();
+        let col_a = col("a", &schema)?;
+        let batch = RecordBatch::try_new(Arc::new(schema.clone()), 
vec![Arc::new(a)])?;
+
+        // expression: "a in ("a", "b")"
+        let lists = [
+            vec![lit("a"), lit("b")],
+            vec![
+                dict_lit(DataType::Int8, "a"),
+                dict_lit(DataType::UInt16, "b"),
+            ],
+        ];
+        for list in lists.iter() {
+            in_list_raw!(

Review Comment:
   > But there is no way that in_list_raw will be called outside of this module
   
   The function `in_list` is public, so it's possible that `in_list` is called 
directly without any type coercion, which is exactly what Comet tries to do in 
the first place to leverage the static filter optimization. Since there's no 
type coercion, it's possible the value type and the list type are mixed with 
dictionary types. I think this test simulates exactly the issue I encountered 
in the Comet's case. 
   
   For the sqllogictest part, jayzhan11 has already pointed out in 
https://github.com/apache/arrow-datafusion/issues/9530#issuecomment-2008474510, 
the type coercion is happened in the optimization phase.
   >  In datafusion, we have done it in the optimization step, when we reach 
in_list here, we can ensure the types are consistent, so we just go ahead 
without type checking.
   
   I think that's why the E2E tests already pass without this PR's fix.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to