dispanser opened a new issue, #20011:
URL: https://github.com/apache/datafusion/issues/20011
### Describe the bug
We're currently upgrading from DataFusion 45 to DataFusion 46 :see-no-evil: and came across an issue with dictionary ids during protobuf serde. Here's a small reproducer. As far as I can tell, this is still a problem on current DataFusion.

The following test passes on DataFusion 45 but fails on 46 and all later releases. It is added to the bottom of `datafusion/proto/tests/cases/roundtrip_physical_plan.rs`:
```rust
#[test]
fn roundtrip_call_null_scalar_struct_dict() {
    // Struct type with a single nullable, dictionary-encoded child field `item`
    let data_type = DataType::Struct(Fields::from(vec![Field::new(
        "item",
        DataType::Dictionary(Box::new(DataType::UInt32), Box::new(DataType::Utf8)),
        true,
    )]));
    let schema = Arc::new(Schema::new(Fields::from([Arc::new(Field::new(
        "a",
        data_type.clone(),
        true,
    ))])));
    let scan = Arc::new(EmptyExec::new(schema.clone()));
    // A typed NULL scalar of the struct type, compared against column `a`
    let scalar = lit(ScalarValue::try_from(data_type).unwrap());
    let filter = Arc::new(
        FilterExec::try_new(
            Arc::new(BinaryExpr::new(
                scalar,
                datafusion::logical_expr::Operator::Eq,
                col("a", &schema).unwrap(),
            )),
            scan,
        )
        .unwrap(),
    );
    // Serialize to protobuf and back; this is where the failure shows up
    roundtrip_test(filter).expect("roundtrip");
}
```
With DataFusion 46, this fails while deserializing the protobuf back into a physical plan:
```
thread 'cases::roundtrip_physical_plan::roundtrip_call_null_scalar_struct_dict' panicked at datafusion/proto/tests/cases/roundtrip_physical_plan.rs:140:10:
from proto: Plan("DataFusion error: ArrowError(SchemaError(\"Invalid data for schema. Field { name: \\\"item\\\", data_type: Dictionary(UInt32, Utf8), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} } refers to node not found in schema\"), None)")
```
With DataFusion 52, it already fails during serialization, because there is no pre-allocated dict id available from the dictionary tracker:
```
thread 'cases::roundtrip_physical_plan::roundtrip_call_null_scalar_struct_dict' (145817) panicked at datafusion/proto/tests/cases/roundtrip_physical_plan.rs:148:14:
to proto: Plan("General error: Error encoding ScalarValue::List as IPC: Ipc error: no dict id for field item")
```
I am a bit puzzled about how this is supposed to work. From what I can tell, the `DictionaryTracker` is initialized empty and never modified afterwards. I don't see any code path where a dict_id would even be created.
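To make the "empty tracker" point concrete, here is a minimal, stdlib-only model of the failure mode. It is an illustration under assumptions, not the arrow-rs `DictionaryTracker` API: `SimpleType`, `SimpleField`, `DictIdTracker`, `assign_ids` and `dict_id` are all hypothetical names. A tracker that is never populated cannot answer a lookup for the nested `item` field. Walking the schema depth-first and assigning an id to every dictionary field beforehand (including fields nested inside structs) makes the lookup succeed. Whether DataFusion should pre-populate the tracker this way is an assumption on my part.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-in for an Arrow data type (NOT arrow-rs).
enum SimpleType {
    Utf8,
    UInt32,
    Dictionary(Box<SimpleType>, Box<SimpleType>),
    Struct(Vec<SimpleField>),
}

struct SimpleField {
    name: String,
    data_type: SimpleType,
}

// Hypothetical tracker: maps a field path such as "a.item" to a dictionary id.
#[derive(Default)]
struct DictIdTracker {
    ids: HashMap<String, i64>,
    next_id: i64,
}

impl DictIdTracker {
    // Depth-first walk that assigns a fresh id to every dictionary-encoded
    // field, recursing into struct children so nested fields get ids too.
    fn assign_ids(&mut self, fields: &[SimpleField], prefix: &str) {
        for field in fields {
            let path = if prefix.is_empty() {
                field.name.clone()
            } else {
                format!("{}.{}", prefix, field.name)
            };
            match &field.data_type {
                SimpleType::Dictionary(_, _) => {
                    self.ids.insert(path, self.next_id);
                    self.next_id += 1;
                }
                SimpleType::Struct(children) => self.assign_ids(children, &path),
                _ => {}
            }
        }
    }

    // Look up the pre-assigned id; fails if the tracker was never populated.
    fn dict_id(&self, path: &str) -> Result<i64, String> {
        self.ids
            .get(path)
            .copied()
            .ok_or_else(|| format!("no dict id for field {}", path))
    }
}

fn main() {
    // Schema from the reproducer: a: Struct { item: Dictionary(UInt32, Utf8) }
    let schema = vec![SimpleField {
        name: "a".to_string(),
        data_type: SimpleType::Struct(vec![SimpleField {
            name: "item".to_string(),
            data_type: SimpleType::Dictionary(
                Box::new(SimpleType::UInt32),
                Box::new(SimpleType::Utf8),
            ),
        }]),
    }];

    // An empty, never-populated tracker cannot answer the lookup.
    let empty = DictIdTracker::default();
    println!("{:?}", empty.dict_id("a.item")); // Err("no dict id for field a.item")

    // Pre-assigning ids by walking the schema makes the lookup succeed.
    let mut tracker = DictIdTracker::default();
    tracker.assign_ids(&schema, "");
    println!("{:?}", tracker.dict_id("a.item")); // Ok(0)
}
```

The model keys ids by field path rather than bare name so that two nested fields with the same name do not collide. Again, this is only a sketch of the concept.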
### To Reproduce
_No response_
### Expected behavior
_No response_
### Additional context
_No response_