jorgecarleitao commented on a change in pull request #8720:
URL: https://github.com/apache/arrow/pull/8720#discussion_r528227594



##########
File path: rust/datafusion/src/logical_plan/plan.rs
##########
@@ -94,6 +101,21 @@ pub enum LogicalPlan {
         /// The incoming logical plan
         input: Arc<LogicalPlan>,
     },
+    /// Join two logical plans on one or more join columns
+    Join {
+        /// Left input
+        left: Arc<LogicalPlan>,
+        /// Right input
+        right: Arc<LogicalPlan>,
+        /// Columns in left input to use for join keys
+        left_keys: Vec<String>,
+        /// Columns in right input to use for join keys
+        right_keys: Vec<String>,

Review comment:
       I would use `Vec<(left_i, right_i)>` because it automatically enforces 
the invariant that `left_keys.len() == right_keys.len()`. We can still keep the 
public interface `left_keys, right_keys` and perform the check before passing 
them to the builder.
   
   Atm, when we use `left.zip(right)`, we take the shortest vector, which may 
hide a bug in the code.

##########
File path: rust/datafusion/src/physical_plan/hash_utils.rs
##########
@@ -29,7 +29,7 @@ pub enum JoinType {
 }
 
 /// The on clause of the join, as vector of (left, right) columns.
-pub type JoinOn<'a> = [(&'a str, &'a str)];
+pub type JoinOn = [(String, String)];

Review comment:
       This is too small compared to the others ops to justify the effort at 
this point, IMO.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to