alamb commented on code in PR #8538:
URL: https://github.com/apache/arrow-datafusion/pull/8538#discussion_r1428265068


##########
datafusion/physical-plan/src/joins/hash_join.rs:
##########
@@ -789,6 +788,98 @@ where
     Ok(())
 }
 
+/// Represents build-side of hash join.
+enum BuildSide {
+    /// Indicates that build-side not collected yet
+    Initial(BuildSideInitialState),
+    /// Indicates that build-side data has been collected
+    Ready(BuildSideReadyState),
+}
+
+/// Container for BuildSide::Initial related data
+struct BuildSideInitialState {
+    /// Future for building hash table from build-side input
+    left_fut: OnceFut<JoinLeftData>,
+}
+
+/// Container for BuildSide::Ready related data
+struct BuildSideReadyState {
+    /// Collected build-side data
+    left_data: Arc<JoinLeftData>,
+    /// Which build-side rows have been matched while creating output.
+    /// For some OUTER joins, we need to know which rows have not been matched
+    /// to produce the correct output.
+    visited_left_side: BooleanBufferBuilder,
+}
+
+impl BuildSide {
+    /// Tries to extract BuildSideInitialState from BuildSide enum.
+    /// Returns an error if state is not Initial.
+    fn try_into_initial_mut(&mut self) -> Result<&mut BuildSideInitialState> {
+        match self {
+            BuildSide::Initial(state) => Ok(state),
+            _ => Err(DataFusionError::Internal(
+                "Expected build side in initial state".to_string(),
+            )),
+        }
+    }
+
+    /// Tries to extract BuildSideReadyState from BuildSide enum.
+    /// Returns an error if state is not Ready.
+    fn try_into_ready(&self) -> Result<&BuildSideReadyState> {

Review Comment:
   ```suggestion
       fn try_as_ready(&self) -> Result<&BuildSideReadyState> {
   ```



##########
datafusion/physical-plan/src/joins/hash_join.rs:
##########
@@ -789,6 +788,98 @@ where
     Ok(())
 }
 
+/// Represents build-side of hash join.
+enum BuildSide {
+    /// Indicates that build-side not collected yet
+    Initial(BuildSideInitialState),
+    /// Indicates that build-side data has been collected
+    Ready(BuildSideReadyState),
+}
+
+/// Container for BuildSide::Initial related data
+struct BuildSideInitialState {
+    /// Future for building hash table from build-side input
+    left_fut: OnceFut<JoinLeftData>,
+}
+
+/// Container for BuildSide::Ready related data
+struct BuildSideReadyState {
+    /// Collected build-side data
+    left_data: Arc<JoinLeftData>,
+    /// Which build-side rows have been matched while creating output.
+    /// For some OUTER joins, we need to know which rows have not been matched
+    /// to produce the correct output.
+    visited_left_side: BooleanBufferBuilder,
+}
+
+impl BuildSide {
+    /// Tries to extract BuildSideInitialState from BuildSide enum.
+    /// Returns an error if state is not Initial.
+    fn try_into_initial_mut(&mut self) -> Result<&mut BuildSideInitialState> {

Review Comment:
   This would also be consistent with what you did with  `HashJoinStreamState` 
below



##########
datafusion/physical-plan/src/joins/hash_join.rs:
##########
@@ -1064,149 +1149,234 @@ pub fn equal_rows_arr(
 
 impl HashJoinStream {
     /// Separate implementation function that unpins the [`HashJoinStream`] so
-    /// that partial borrows work correctly
+    /// that partial borrows work correctly.
+    ///
+    /// Performs state transitions for HashJoinStream according to following 
diagram:

Review Comment:
   This state diagram might be easier to find if it were on 
`HashJoinStreamState` itself



##########
datafusion/physical-plan/src/joins/hash_join.rs:
##########
@@ -789,6 +788,98 @@ where
     Ok(())
 }
 
+/// Represents build-side of hash join.
+enum BuildSide {
+    /// Indicates that build-side not collected yet
+    Initial(BuildSideInitialState),
+    /// Indicates that build-side data has been collected
+    Ready(BuildSideReadyState),
+}
+
+/// Container for BuildSide::Initial related data
+struct BuildSideInitialState {
+    /// Future for building hash table from build-side input
+    left_fut: OnceFut<JoinLeftData>,
+}
+
+/// Container for BuildSide::Ready related data
+struct BuildSideReadyState {
+    /// Collected build-side data
+    left_data: Arc<JoinLeftData>,
+    /// Which build-side rows have been matched while creating output.
+    /// For some OUTER joins, we need to know which rows have not been matched
+    /// to produce the correct output.
+    visited_left_side: BooleanBufferBuilder,
+}
+
+impl BuildSide {
+    /// Tries to extract BuildSideInitialState from BuildSide enum.
+    /// Returns an error if state is not Initial.
+    fn try_into_initial_mut(&mut self) -> Result<&mut BuildSideInitialState> {

Review Comment:
   Another name for this that might be more "idomatic" could be to use *as* 
because it doesn't consume
   
   ```suggestion
       fn try_as_initial_mut(&mut self) -> Result<&mut BuildSideInitialState> {
   ```
       fn try_as_initial_mut(&mut self) -> Result<&mut BuildSideInitialState> {
   ```



##########
datafusion/physical-plan/src/joins/hash_join.rs:
##########
@@ -789,6 +788,98 @@ where
     Ok(())
 }
 
+/// Represents build-side of hash join.
+enum BuildSide {
+    /// Indicates that build-side not collected yet
+    Initial(BuildSideInitialState),
+    /// Indicates that build-side data has been collected
+    Ready(BuildSideReadyState),
+}
+
+/// Container for BuildSide::Initial related data
+struct BuildSideInitialState {
+    /// Future for building hash table from build-side input
+    left_fut: OnceFut<JoinLeftData>,
+}
+
+/// Container for BuildSide::Ready related data
+struct BuildSideReadyState {
+    /// Collected build-side data
+    left_data: Arc<JoinLeftData>,
+    /// Which build-side rows have been matched while creating output.
+    /// For some OUTER joins, we need to know which rows have not been matched
+    /// to produce the correct output.
+    visited_left_side: BooleanBufferBuilder,
+}
+
+impl BuildSide {
+    /// Tries to extract BuildSideInitialState from BuildSide enum.
+    /// Returns an error if state is not Initial.
+    fn try_into_initial_mut(&mut self) -> Result<&mut BuildSideInitialState> {
+        match self {
+            BuildSide::Initial(state) => Ok(state),
+            _ => Err(DataFusionError::Internal(
+                "Expected build side in initial state".to_string(),
+            )),
+        }
+    }
+
+    /// Tries to extract BuildSideReadyState from BuildSide enum.
+    /// Returns an error if state is not Ready.
+    fn try_into_ready(&self) -> Result<&BuildSideReadyState> {
+        match self {
+            BuildSide::Ready(state) => Ok(state),
+            _ => Err(DataFusionError::Internal(
+                "Expected build side in ready state".to_string(),
+            )),
+        }
+    }
+
+    /// Tries to extract BuildSideReadyState from BuildSide enum.
+    /// Returns an error if state is not Ready.
+    fn try_into_ready_mut(&mut self) -> Result<&mut BuildSideReadyState> {

Review Comment:
   ```suggestion
       fn try_as_ready_mut(&mut self) -> Result<&mut BuildSideReadyState> {
   ```



##########
datafusion/physical-plan/src/joins/hash_join.rs:
##########
@@ -789,6 +788,98 @@ where
     Ok(())
 }
 
+/// Represents build-side of hash join.
+enum BuildSide {
+    /// Indicates that build-side not collected yet
+    Initial(BuildSideInitialState),
+    /// Indicates that build-side data has been collected
+    Ready(BuildSideReadyState),
+}
+
+/// Container for BuildSide::Initial related data
+struct BuildSideInitialState {
+    /// Future for building hash table from build-side input
+    left_fut: OnceFut<JoinLeftData>,
+}
+
+/// Container for BuildSide::Ready related data
+struct BuildSideReadyState {
+    /// Collected build-side data
+    left_data: Arc<JoinLeftData>,
+    /// Which build-side rows have been matched while creating output.
+    /// For some OUTER joins, we need to know which rows have not been matched
+    /// to produce the correct output.
+    visited_left_side: BooleanBufferBuilder,
+}
+
+impl BuildSide {
+    /// Tries to extract BuildSideInitialState from BuildSide enum.
+    /// Returns an error if state is not Initial.
+    fn try_into_initial_mut(&mut self) -> Result<&mut BuildSideInitialState> {
+        match self {
+            BuildSide::Initial(state) => Ok(state),
+            _ => Err(DataFusionError::Internal(

Review Comment:
   I think you could use the `internal_err!` macro here and in the other errors 
in this PR to make it more concise and consistent with the rest of the 
codebase. Something like
   ```rust
   internal_err!("Expected build side in initial state")
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to