liukun4515 commented on code in PR #5156:
URL: https://github.com/apache/arrow-datafusion/pull/5156#discussion_r1100068420
##########
datafusion/core/src/physical_plan/joins/nested_loop_join.rs:
##########
@@ -304,6 +280,14 @@ async fn load_left_specified_partition(
Ok(merged_batch)
}
+// BuildLeft means the left relation is the single partition side.
+// For a full join, both sides are single partition, so it is both BuildLeft and
+// BuildRight; treat it as BuildLeft.
+pub fn left_is_build_side(join_type: JoinType) -> bool {
Review Comment:
> * We can optimize the data type of `join_indices` (not in this PR). The index
> type of the inner table should be `UInt64Array`, and `UInt32Array` for the
> outer table. Currently the left table uses `UInt64Array` and the right table
> uses `UInt32Array`.
Hash join has the same issue.
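A minimal sketch of the index layout described in the quote, assuming a plain nested loop over one cached inner table and one streamed outer batch. `JoinIndices`, `SideIndices`, `nested_loop_indices` and `into_left_right` are hypothetical names, and plain `Vec<u64>`/`Vec<u32>` stand in for Arrow's `UInt64Array`/`UInt32Array`. The point is that the index width follows the role of the input (inner or outer), not whether it is the left or right input:

```rust
/// Join indices labelled by role rather than by side (hypothetical type).
/// The inner (cached, single-partition) table can be large, so it gets 64-bit
/// indices; the outer side is one streamed batch, so 32-bit indices suffice.
#[derive(Debug, PartialEq)]
pub struct JoinIndices {
    pub inner: Vec<u64>,
    pub outer: Vec<u32>,
}

/// Index array for one input, tagged with its width (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum SideIndices {
    U64(Vec<u64>),
    U32(Vec<u32>),
}

/// Evaluates `predicate` for every (inner row, outer row) pair and keeps the
/// matching pairs, iterating the outer batch in the outer loop.
pub fn nested_loop_indices(
    inner_len: u64,
    outer_len: u32,
    predicate: impl Fn(u64, u32) -> bool,
) -> JoinIndices {
    let mut indices = JoinIndices { inner: Vec::new(), outer: Vec::new() };
    for o in 0..outer_len {
        for i in 0..inner_len {
            if predicate(i, o) {
                indices.inner.push(i);
                indices.outer.push(o);
            }
        }
    }
    indices
}

impl JoinIndices {
    /// Returns (left indices, right indices). When the right input is the
    /// build (inner) side, the left input receives the 32-bit indices.
    pub fn into_left_right(self, left_is_build: bool) -> (SideIndices, SideIndices) {
        let inner = SideIndices::U64(self.inner);
        let outer = SideIndices::U32(self.outer);
        if left_is_build {
            (inner, outer)
        } else {
            (outer, inner)
        }
    }
}
```

With a right-side build, the mapping swaps, so a fixed "left is `u64`, right is `u32`" convention no longer holds.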
Before this PR, NLJ and hash join shared some methods in `utils`, such as
`need_produce_result_in_final`. This PR breaks that sharing, and I want to
refactor them. Do you have any ideas about that?
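One possible direction for the refactor, sketched under assumptions: keep a single helper in `utils` but pass in which side was built, so NLJ (which may build either side) and hash join (which always builds left) can share it. `BuildSide` and `build_side_needs_final_output` are hypothetical names, and `JoinType` is a local copy of the join variants so the sketch compiles on its own. The rule encoded is that build-side rows can only be classified as matched or unmatched after all probe input has been consumed, so join types that output build-side rows on their own (outer, semi or anti on that side) must emit them in the final step:

```rust
/// Local copy of the join variants (mirrors DataFusion's `JoinType` for this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum JoinType {
    Inner,
    Left,
    Right,
    Full,
    LeftSemi,
    RightSemi,
    LeftAnti,
    RightAnti,
}

/// Which input is cached as the build side (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum BuildSide {
    Left,
    Right,
}

/// Hypothetical shared helper: must the build side emit rows only after all
/// probe-side input has been consumed?
pub fn build_side_needs_final_output(join_type: JoinType, build_side: BuildSide) -> bool {
    match build_side {
        BuildSide::Left => matches!(
            join_type,
            JoinType::Left | JoinType::LeftSemi | JoinType::LeftAnti | JoinType::Full
        ),
        BuildSide::Right => matches!(
            join_type,
            JoinType::Right | JoinType::RightSemi | JoinType::RightAnti | JoinType::Full
        ),
    }
}
```

Hash join would always pass `BuildSide::Left`; NLJ would pass whichever side its build-side choice selects (for a full join, both sides are single partition and the diff treats it as BuildLeft).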
In hash join, we always build and cache the left table (single partition or
multiple partitions), and the right side is visited iteratively, using a
`UInt32Array` as its index.
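A minimal sketch of that build/probe split, assuming integer join keys and using `std::collections::HashMap` and `Vec` as stand-ins for DataFusion's internal hash table and Arrow index arrays (`build_left` and `probe_batch` are hypothetical names). Row indices into the cached left table are `u64` because that table may hold many rows, while indices into the current right batch are `u32`:

```rust
use std::collections::HashMap;

/// Build phase: map each join key of the cached left table to its row indices.
pub fn build_left(left_keys: &[i64]) -> HashMap<i64, Vec<u64>> {
    let mut map: HashMap<i64, Vec<u64>> = HashMap::new();
    for (row, key) in left_keys.iter().enumerate() {
        map.entry(*key).or_default().push(row as u64);
    }
    map
}

/// Probe phase for one right batch: returns matching (left, right) index pairs.
/// Assumes a batch has fewer than `u32::MAX` rows.
pub fn probe_batch(map: &HashMap<i64, Vec<u64>>, right_keys: &[i64]) -> (Vec<u64>, Vec<u32>) {
    let mut left_idx = Vec::new();
    let mut right_idx = Vec::new();
    for (row, key) in right_keys.iter().enumerate() {
        if let Some(rows) = map.get(key) {
            for &l in rows {
                left_idx.push(l);
                right_idx.push(row as u32);
            }
        }
    }
    (left_idx, right_idx)
}
```

Each call to `probe_batch` handles one right batch, so the `u32` indices restart at 0 for every batch, while the `u64` left indices stay valid for the lifetime of the cached table.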
--
This is an automated message from the Apache Git Service.