[
https://issues.apache.org/jira/browse/ARROW-15590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530201#comment-17530201
]
Weston Pace commented on ARROW-15590:
-------------------------------------
The Substrait spec (the website) doesn't always match the .proto yet. This is
not a great thing but it's a work in progress. Feel free to open some PRs
against the site if you want. In the meantime I find it easier to work with
the proto:
```
message JoinRel {
RelCommon common = 1;
Rel left = 2;
Rel right = 3;
Expression expression = 4;
Expression post_join_filter = 5;
JoinType type = 6;
...
}
```
The {{post_join_filter}} is not on the site today and should match
{{HashJoinNodeOptions::filter}}.
{{LeftInput}} and {{RightInput}} correspond to the inputs specified when adding
a join to a plan and so they aren't in {{HashJoinNodeOptions}}:
```
MakeExecNode("hashjoin", plan.get(), {LeftInput, RightInput}, join_options));
```
You are correct that we do not handle expressions in general for the join
condition. So I think the best thing to do here initially is restrict the set
of allowed plans. If the expression is not a call then reject it. If the
expression is a call then it must be one of two functions, "equal" or
"is_not_distinct_from". In either case the function has two arguments. Both
arguments must be a {{FieldReference}}. We can convert from a Substrait
{{FieldReference}} to an Arrow {{FieldRef}} and so that will give you left keys
and right keys. There is an Arrow options {{HashJoinNodeOptions::key_cmp}}.
If the Substrait function is "equal" then use {{JoinKeyCmp::Eq}}. If the
Substrait function is "is_not_distinct_from" then use {{JoinKeyCmp::Is}}.
With the above approach you will always have exactly one left key, one right
key, and one join type.
Later (could be in this PR or a follow-up) we can also handle expressions that
are an and'ed set of equality expressions:
{noformat}
and(equal(field(3),field(5)), equal(field(1),field(7)), equal(field(2),
field(12)))
{noformat}
In this case the number of keys/join types you have would depend on the number
of equality expressions in the and (3 in the above example).
> [C++] Add support for joins to the Substrait consumer
> -----------------------------------------------------
>
> Key: ARROW-15590
> URL: https://issues.apache.org/jira/browse/ARROW-15590
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Weston Pace
> Assignee: Vibhatha Lakmal Abeykoon
> Priority: Major
> Labels: substrait
>
> The streaming execution engine supports joins. The Substrait consumer does
> not currently consume joins. We should add support for this. We may want to
> split this PR into subtasks as there are many different kinds of joins and we
> may not support all of them immediately.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)