eejbyfeldt commented on code in PR #13252:
URL: https://github.com/apache/datafusion/pull/13252#discussion_r1828317857


##########
datafusion/optimizer/src/optimize_projections/mod.rs:
##########
@@ -682,7 +682,8 @@ fn split_join_requirements(
         | JoinType::Left
         | JoinType::Right
         | JoinType::Full
-        | JoinType::LeftMark => {
+        | JoinType::LeftMark 
+        | JoinType::RightMark => {
             // Decrease right side indices by `left_len` so that they point to 
valid
             // positions within the right child:
             indices.split_off(left_len)

Review Comment:
   Is this really correct? Feels like we will assign the wrong indices to the 
left here for `RightMark`? Should `RightMark` be moved to `RightSemi` and 
`RightAnti`. Maybe we should for `LeftMark` as well for clarity (and maybe that 
also fixes some edge case?)



##########
datafusion/substrait/src/logical_plan/producer.rs:
##########
@@ -726,7 +726,7 @@ fn to_substrait_jointype(join_type: JoinType) -> 
join_rel::JoinType {
         JoinType::LeftAnti => join_rel::JoinType::LeftAnti,
         JoinType::LeftSemi => join_rel::JoinType::LeftSemi,
         JoinType::LeftMark => join_rel::JoinType::LeftMark,
-        JoinType::RightAnti | JoinType::RightSemi => {
+        JoinType::RightAnti | JoinType::RightSemi | JoinType::RightMark => {

Review Comment:
   Why can we not turn it into `join_rel::JoinType::RightMark` here?



##########
datafusion/proto-common/src/generated/pbjson.rs:
##########
@@ -3911,6 +3913,7 @@ impl<'de> serde::Deserialize<'de> for JoinType {
                     "RIGHTSEMI" => Ok(JoinType::Rightsemi),
                     "RIGHTANTI" => Ok(JoinType::Rightanti),
                     "LEFTMARK" => Ok(JoinType::Leftmark),
+                    "RIGHTMARK" => Ok(JoinTYpe::Rightmark),

Review Comment:
   This is causing CI failures. I belive this file should be generated by 
running: 
https://github.com/apache/datafusion/blob/main/datafusion/proto-common/regen.sh
   ```suggestion
                       "RIGHTMARK" => Ok(JoinType::Rightmark),
   ```



##########
datafusion/physical-plan/src/joins/symmetric_hash_join.rs:
##########
@@ -731,6 +731,22 @@ where
                 .collect();
             (build_indices, probe_indices)
         }
+        (JoinSide::Right, JoinType::RightMark) => {
+            let probe_indices = (0..prune_length)
+                .map(R::Native::from_usize)
+                .collect::<PrimitiveArray<R>>();
+            let build_indices = (0..prune_length)
+                .map(|idx| {
+                    // For mark join we output a dummy index 0 to indicate the 
row had a match
+                    if visited_rows.contains(&(idx + deleted_offset)) {
+                        Some(L::Native::from_usize(0).unwrap())
+                    } else {
+                        None
+                    }
+                })
+                .collect();
+            (build_indices, probe_indices)

Review Comment:
   This looks very similar to what we do or the `LeftMark` could we move this 
code into a function and call it for both?



##########
datafusion/sql/src/unparser/plan.rs:
##########
@@ -866,7 +866,7 @@ impl Unparser<'_> {
             JoinType::LeftSemi => ast::JoinOperator::LeftSemi(constraint),
             JoinType::RightAnti => ast::JoinOperator::RightAnti(constraint),
             JoinType::RightSemi => ast::JoinOperator::RightSemi(constraint),
-            JoinType::LeftMark => unimplemented!("Unparsing of Left Mark join 
type"),
+            JoinType::LeftMark | JoinType::RightMark => 
unimplemented!("Unparsing of Left Mark join type"),

Review Comment:
   ```suggestion
               JoinType::LeftMark | JoinType::RightMark => 
unimplemented!("Unparsing of mark join type"),
   ```



##########
datafusion/physical-plan/src/joins/utils.rs:
##########
@@ -689,6 +690,16 @@ pub fn build_join_schema(
             left_fields().chain(right_field).unzip()
         }
         JoinType::RightSemi | JoinType::RightAnti => right_fields().unzip(),
+        JoinType::RightMark => {
+            let left_field = once((
+                Field::new("mark", arrow_schema::DataType::Boolean, false),
+                ColumnIndex {
+                    index: 0, // 'mark' is not associated with either side
+                    side: JoinSide::None,
+                },
+            ));

Review Comment:
   Maybe this could also be moved to a function that is used for both 
`LeftMark` and `RightMark`.



##########
datafusion/core/src/physical_optimizer/join_selection.rs:
##########
@@ -136,6 +136,9 @@ fn swap_join_type(join_type: JoinType) -> JoinType {
         JoinType::LeftMark => {
             unreachable!("LeftMark join type does not support swapping")
         }
+        JoinType::RightMark => {
+            unreachable!("RightMark join type does not support swapping")

Review Comment:
   Yeah, supporting swap is partly why we are adding RightMark. So would be 
good to have that fixed in this PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to