advancedxy commented on code in PR #10031:
URL:
https://github.com/apache/arrow-datafusion/pull/10031#discussion_r1564026621
##########
datafusion/physical-expr/src/expressions/in_list.rs:
##########
@@ -1314,4 +1340,96 @@ mod tests {
Ok(())
}
+
+ #[test]
+ fn in_list_utf8_with_dict_types() -> Result<()> {
+ fn dict_lit(key_type: DataType, value: &str) -> Arc<dyn PhysicalExpr> {
+ lit(ScalarValue::Dictionary(
+ Box::new(key_type),
+ Box::new(ScalarValue::new_utf8(value.to_string())),
+ ))
+ }
+
+ fn null_dict_lit(key_type: DataType) -> Arc<dyn PhysicalExpr> {
+ lit(ScalarValue::Dictionary(
+ Box::new(key_type),
+ Box::new(ScalarValue::Utf8(None)),
+ ))
+ }
+
+ let schema = Schema::new(vec![Field::new(
+ "a",
+ DataType::Dictionary(Box::new(DataType::UInt16),
Box::new(DataType::Utf8)),
+ true,
+ )]);
+ let a: UInt16DictionaryArray =
+ vec![Some("a"), Some("d"), None].into_iter().collect();
+ let col_a = col("a", &schema)?;
+ let batch = RecordBatch::try_new(Arc::new(schema.clone()),
vec![Arc::new(a)])?;
+
+ // expression: "a in ("a", "b")"
+ let lists = [
+ vec![lit("a"), lit("b")],
+ vec![
+ dict_lit(DataType::Int8, "a"),
+ dict_lit(DataType::UInt16, "b"),
+ ],
+ ];
+ for list in lists.iter() {
+ in_list_raw!(
Review Comment:
> But there is no way that in_list_raw will be called outside of this module
The function `in_list` is public, so it's possible that `in_list` is called
directly without any type coercion, which is exactly what Comet tries to do in
the first place to leverage the static filter optimization. Since there's no
type coercion, it's possible the value type and the list type are mixed with
dictionary types. I think this test simulates exactly the issue I encountered
in the Comet's case.
For the sqllogictest part, jayzhan11 has already pointed out in
https://github.com/apache/arrow-datafusion/issues/9530#issuecomment-2008474510,
the type coercion is happened in the optimization phase.
> In datafusion, we have done it in the optimization step, when we reach
in_list here, we can ensure the types are consistent, so we just go ahead
without type checking.
I think that's why the E2E tests already pass without this PR's fix.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]