alamb commented on code in PR #13223:
URL: https://github.com/apache/datafusion/pull/13223#discussion_r1828261178


##########
datafusion/physical-plan/src/joins/cross_join.rs:
##########
@@ -46,16 +46,23 @@ use 
datafusion_physical_expr::equivalence::join_equivalence_properties;
 use async_trait::async_trait;
 use futures::{ready, Stream, StreamExt, TryStreamExt};
 
-/// Data of the left side
+/// Data of the left side that is buffered into memory
 type JoinLeftData = (RecordBatch, MemoryReservation);
 
-#[allow(rustdoc::private_intra_doc_links)]
-/// executes partitions in parallel and combines them into a set of
-/// partitions by combining all values from the left with all values on the 
right
+/// Cross Join Execution Plan
 ///
-/// Note that the `Clone` trait is not implemented for this struct due to the
-/// `left_fut` [`OnceAsync`], which is used to coordinate the loading of the
-/// left side with the processing in each output stream.
+/// This operator is used when there are no predicates between two tables and
+/// returns the Cartesian product of the two tables.
+///
+/// Buffers the left input into memory and then streams batches from each
+/// partition on the right input combining them with the buffered left input
+/// to generate the output.
+///
+/// # Clone / Shared State
+///
+/// Note this structure includes a [`OnceAsync`] that is used to coordinate the
+/// loading of the left side with the processing in each output stream.
+/// Therefore it can not be [`Clone`]

Review Comment:
   I think calling `ExecutionPlan::execute` results in a `Stream` 
   
   In the join cases, the data would be shared across the clone, I think
   
   For example, 
   
   The first call to `HashJoinExec::execute`  (perhaps non obviously) updates 
the contents of `self.left_fut`  
   
   
https://github.com/apache/datafusion/blob/a32802c69cfb74ff5750700a8b72db87a44c55f0/datafusion/physical-plan/src/joins/hash_join.rs#L713-L712
   
   And then subsequent invocations of HashJoinExec::exec read the (same) 
`self.left_fut` which is shared across streams
   
   



##########
datafusion/physical-plan/src/joins/cross_join.rs:
##########
@@ -46,16 +46,23 @@ use 
datafusion_physical_expr::equivalence::join_equivalence_properties;
 use async_trait::async_trait;
 use futures::{ready, Stream, StreamExt, TryStreamExt};
 
-/// Data of the left side
+/// Data of the left side that is buffered into memory
 type JoinLeftData = (RecordBatch, MemoryReservation);
 
-#[allow(rustdoc::private_intra_doc_links)]
-/// executes partitions in parallel and combines them into a set of
-/// partitions by combining all values from the left with all values on the 
right
+/// Cross Join Execution Plan
 ///
-/// Note that the `Clone` trait is not implemented for this struct due to the
-/// `left_fut` [`OnceAsync`], which is used to coordinate the loading of the
-/// left side with the processing in each output stream.
+/// This operator is used when there are no predicates between two tables and
+/// returns the Cartesian product of the two tables.
+///
+/// Buffers the left input into memory and then streams batches from each
+/// partition on the right input combining them with the buffered left input
+/// to generate the output.
+///
+/// # Clone / Shared State
+///
+/// Note this structure includes a [`OnceAsync`] that is used to coordinate the
+/// loading of the left side with the processing in each output stream.
+/// Therefore it can not be [`Clone`]

Review Comment:
   I think calling `ExecutionPlan::execute` results in a `Stream` 
   
   In the join cases, the data would be shared across the clone, I think
   
   For example, 
   
   The first call to `HashJoinExec::execute`  (perhaps non obviously) updates 
the contents of `self.left_fut`  
   
   
https://github.com/apache/datafusion/blob/a32802c69cfb74ff5750700a8b72db87a44c55f0/datafusion/physical-plan/src/joins/hash_join.rs#L712-L713
   
   And then subsequent invocations of HashJoinExec::exec read the (same) 
`self.left_fut` which is shared across streams
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to