alamb commented on code in PR #20063:
URL: https://github.com/apache/datafusion/pull/20063#discussion_r2760898107


##########
datafusion/proto-common/src/from_proto/mod.rs:
##########
@@ -401,55 +406,83 @@ impl TryFrom<&protobuf::ScalarValue> for ScalarValue {
                     schema_ref.try_into()?
                 } else {
                     return Err(Error::General(
-                        "Invalid schema while deserializing ScalarValue::List"
+                        "Invalid schema while deserializing nested ScalarValue"
                             .to_string(),
                     ));
                 };
 
+                // IPC dictionary batch IDs are assigned when encoding the schema, but our protobuf

Review Comment:
   I feel like I'm somehow missing something key -- this seems like pretty 
massive overhead to create / decode a single value here. 
   
   That being said, the existing code seems to incur the same overhead, 
so maybe it is fine for now 🤔 
   
   I wonder if we could pull the logic for creating the schema into its own 
function to reduce the size of the overall method and make it easier to 
understand
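   To illustrate the kind of extraction I have in mind, here is a minimal 
sketch. The types (`Schema`, `Error`) are stand-ins, not the actual 
`datafusion-proto` / `arrow` definitions, and `decode_nested_schema` is a 
hypothetical helper name -- the point is just isolating the 
"missing/invalid schema" branch behind one named function that the big 
`TryFrom` match arm can call in a single line:

```rust
// Stand-in for arrow_schema::Schema; the real helper would build an
// actual Schema from the protobuf message.
#[derive(Debug)]
struct Schema(String);

// Stand-in for the crate's Error type.
#[derive(Debug)]
enum Error {
    General(String),
}

// Hypothetical helper: converts the optional protobuf schema field into
// a Schema, centralizing the error for the "schema is absent" case so
// the TryFrom impl stays small and readable.
fn decode_nested_schema(proto_schema: Option<&str>) -> Result<Schema, Error> {
    match proto_schema {
        Some(s) => Ok(Schema(s.to_string())),
        None => Err(Error::General(
            "Invalid schema while deserializing nested ScalarValue".to_string(),
        )),
    }
}

fn main() {
    // Present schema decodes; absent schema surfaces the General error.
    assert!(decode_nested_schema(Some("item: Int32")).is_ok());
    assert!(decode_nested_schema(None).is_err());
    println!("ok");
}
```

   The caller would then replace the inline `if let / else { return Err(...) }` 
block with `let schema = decode_nested_schema(...)?;`.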



##########
datafusion/proto/tests/cases/roundtrip_physical_plan.rs:
##########
@@ -2564,3 +2564,22 @@ fn custom_proto_converter_intercepts() -> Result<()> {
 
     Ok(())
 }
+
+#[test]

Review Comment:
   With the code change reverted, this test fails like:
   
   ---- cases::roundtrip_physical_plan::roundtrip_call_null_scalar_struct_dict stdout ----
   Error: Plan("General error: Error encoding ScalarValue::List as IPC: Ipc error: no dict id for field item")
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

