LiaCastaneda commented on code in PR #21807:
URL: https://github.com/apache/datafusion/pull/21807#discussion_r3167111551


##########
datafusion/physical-expr/src/expressions/dynamic_filters.rs:
##########
@@ -76,21 +77,51 @@ pub struct DynamicFilterPhysicalExpr {
     nullable: Arc<RwLock<Option<bool>>>,
 }
 
-#[derive(Debug)]
-struct Inner {
+/// Atomic internal state of a [`DynamicFilterPhysicalExpr`].
+///
+/// `expression_id` lives here because it identifies the actual filter 
expression `expr`.
+/// Derived `DynamicFilterPhysicalExpr`s (e.g. via 
[`PhysicalExpr::with_new_children`]) are
+/// the same logical filter and must report the same `expression_id`.
+///
+/// **Warning:** exposed publicly solely so that proto (de)serialization in
+/// `datafusion-proto` can read and rebuild this state. Do not treat this type
+/// or its layout as a stable API.
+#[derive(Clone)]
+pub struct Inner {
+    /// A unique identifier for the expression.
+    pub expression_id: u64,
     /// A counter that gets incremented every time the expression is updated 
so that we can track changes cheaply.
     /// This is used for [`PhysicalExpr::snapshot_generation`] to have a cheap 
check for changes.
-    generation: u64,
-    expr: Arc<dyn PhysicalExpr>,
+    pub generation: u64,
+    pub expr: Arc<dyn PhysicalExpr>,
     /// Flag for quick synchronous check if filter is complete.
     /// This is redundant with the watch channel state, but allows us to 
return immediately
     /// from `wait_complete()` without subscribing if already complete.
-    is_complete: bool,
+    pub is_complete: bool,
+}
+
+// TODO: Include expression_id in Debug output.
+//
+// See https://github.com/apache/datafusion/issues/20418. Currently, plan nodes
+// like `HashJoinExec`, `AggregateExec`, `SortExec` do not serialize their
+// dynamic filter. They auto-create one on decode with a fresh `expression_id`,
+// so a round-trip Debug comparison would diverge purely on the id even when
+// the rest of the state is preserved. Excluding it from Debug keeps those
+// roundtrip equality assertions meaningful until that work lands.
+impl std::fmt::Debug for Inner {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        f.debug_struct("Inner")
+            .field("generation", &self.generation)
+            .field("expr", &self.expr)
+            .field("is_complete", &self.is_complete)
+            .finish()
+    }
 }
 
 impl Inner {
     fn new(expr: Arc<dyn PhysicalExpr>) -> Self {
         Self {
+            expression_id: random::<u64>(),

Review Comment:
   Would it be better to use something more standard like uuid or an atomic 
counter across all expressions?



##########
datafusion/proto/proto/datafusion.proto:
##########
@@ -889,11 +889,9 @@ message PhysicalExprNode {
   reserved 17;
 
   // Unique identifier for this expression to do deduplication during 
deserialization.
-  // When serializing, this is set to a unique identifier for each combination 
of
-  // expression, process and serialization run.
-  // When deserializing, if this ID has been seen before, the cached Arc is 
returned
-  // instead of creating a new one, enabling reconstruction of referential 
integrity
-  // across serde roundtrips.
+  // When serializing, this is set via `PhysicalExpr::expression_id`. When 
deserializing,
+  // this id is used by the `DeduplicatingProtoConverter` to preserve 
referential
+  // integrity across serde roundtrips for different expressions with the same 
id.
   optional uint64 expr_id = 30;

Review Comment:
   I didn't remember we had this already  🤔 we had an id in the proto but not 
in the trait, I guess this suggests the original intent was always to have 
expressions carry their own identity in the trait as well



##########
datafusion/physical-expr/src/expressions/dynamic_filters.rs:
##########
@@ -76,21 +77,51 @@ pub struct DynamicFilterPhysicalExpr {
     nullable: Arc<RwLock<Option<bool>>>,
 }
 
-#[derive(Debug)]
-struct Inner {
+/// Atomic internal state of a [`DynamicFilterPhysicalExpr`].
+///
+/// `expression_id` lives here because it identifies the actual filter 
expression `expr`.
+/// Derived `DynamicFilterPhysicalExpr`s (e.g. via 
[`PhysicalExpr::with_new_children`]) are
+/// the same logical filter and must report the same `expression_id`.
+///
+/// **Warning:** exposed publicly solely so that proto (de)serialization in
+/// `datafusion-proto` can read and rebuild this state. Do not treat this type
+/// or its layout as a stable API.
+#[derive(Clone)]
+pub struct Inner {

Review Comment:
   Should we hide this somehow instead of using a warning comment? Exposing 
this leaves the door open to callers messing with the inner dynamic filter's 
`Arc` count -- which isn't great either. It's strange to me that Rust doesn't 
provide a way to do this currently.



##########
datafusion/physical-expr/src/expressions/dynamic_filters.rs:
##########


Review Comment:
   would it make sense to remove `snapshot()` api now that it's not used? 
(maybe as a follow up) i think the only expression that implements it is the 
dynamic filter.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to