kosiew commented on code in PR #22451:
URL: https://github.com/apache/datafusion/pull/22451#discussion_r3292548937


##########
datafusion/physical-plan/src/joins/hash_join/partitioned_hash_eval.rs:
##########
@@ -289,6 +289,38 @@ impl PartialEq for HashTableLookupExpr {
 
 impl Eq for HashTableLookupExpr {}
 
+impl HashTableLookupExpr {
+    /// Serialize this expression to protobuf.
+    ///
+    /// `HashTableLookupExpr` holds an `Arc<Map>` (a runtime hash table built
+    /// on the build side) which cannot be serialized. We replace it with
+    /// `lit(true)`, which is safe because:
+    ///
+    /// - The filter is a performance optimisation, not a correctness 
requirement.
+    /// - `lit(true)` passes all rows so no valid rows are lost.
+    /// - In distributed execution the remote worker has no access to the
+    ///   build-side hash table anyway.
+    pub fn try_to_proto(

Review Comment:
   I am also seeing a separate build issue here. This new signature references 
`physical_expr::proto_encode`, `datafusion_proto_common`, and 
`datafusion_proto_models` unconditionally, but `datafusion-physical-plan` does 
not expose those dependencies in a normal build.
   
   `cargo check -p datafusion-physical-plan` currently fails with unresolved 
imports for `proto_encode`, `datafusion_proto_common`, and 
`datafusion_proto_models`.
   
   Could you gate this consistently with the existing 
`PhysicalExpr::try_to_proto` API and wire the dependencies through the 
appropriate `proto` feature?



##########
datafusion/physical-plan/src/joins/hash_join/partitioned_hash_eval.rs:
##########
@@ -289,6 +289,38 @@ impl PartialEq for HashTableLookupExpr {
 
 impl Eq for HashTableLookupExpr {}
 
+impl HashTableLookupExpr {
+    /// Serialize this expression to protobuf.
+    ///
+    /// `HashTableLookupExpr` holds an `Arc<Map>` (a runtime hash table built
+    /// on the build side) which cannot be serialized. We replace it with
+    /// `lit(true)`, which is safe because:
+    ///
+    /// - The filter is a performance optimisation, not a correctness 
requirement.
+    /// - `lit(true)` passes all rows so no valid rows are lost.
+    /// - In distributed execution the remote worker has no access to the
+    ///   build-side hash table anyway.
+    pub fn try_to_proto(

Review Comment:
   Building on Jay's comment, I think this needs to live in `impl PhysicalExpr 
for HashTableLookupExpr` rather than as an inherent 
`HashTableLookupExpr::try_to_proto` method.
   
   The serializer calls `expr.try_to_proto(&ctx)` through `&dyn PhysicalExpr`, 
so this implementation is not reached. Serialization still depends on the 
centralized downcast fallback in 
`datafusion/proto/src/physical_plan/to_proto.rs`.
   
   Could you move this into the `PhysicalExpr` impl with the same 
`#[cfg(feature = "proto")]` signature? That should make the expression-local 
serialization path actually take effect.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to