[ https://issues.apache.org/jira/browse/BEAM-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anton Kedin updated BEAM-5049:
------------------------------
Description:
A query like this:
{code}
SELECT a.*, b.*, c.*
FROM a
JOIN b ON a.some_id = b.some_id
JOIN c ON a.some_id = c.some_id;
{code}
results in two shuffles, one per join, even though both joins use the same key (a.some_id). This can probably be optimized to a single shuffle.
Relevant code:
- BeamJoinRel implements Join in SQL:
https://github.com/apache/beam/blob/1675b0f843ed34de8ba6f3676f794db80b40139d/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamJoinRel.java#L194
- CoGBK Join implementation:
https://github.com/apache/beam/blob/279a05604b83a54e8e5a79e13d8761f94841f326/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java#L36
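A plain-Java sketch of the idea, under stated assumptions: it does not use the Beam API at all. The hypothetical coGroup helper stands in for one CoGroupByKey (one shuffle), and shuffleCount is an illustrative counter. Chaining two binary joins, roughly what happens when each JOIN is planned as its own BeamJoinRel, needs two co-groups, while co-grouping a, b and c together on some_id yields the same inner-join rows with one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation only, NOT the Beam API. coGroup() is a hypothetical
// stand-in for one CoGroupByKey, i.e. one shuffle.
final class ThreeWayJoinSketch {
    // Illustrative counter of simulated shuffles.
    static int shuffleCount = 0;

    // One simulated shuffle: partition every input by key in a single pass.
    // For each key, slot i holds the values that came from input i.
    static Map<Integer, List<List<String>>> coGroup(List<List<Map.Entry<Integer, String>>> inputs) {
        shuffleCount++;
        final int n = inputs.size();
        Map<Integer, List<List<String>>> grouped = new TreeMap<>();
        for (int i = 0; i < n; i++) {
            for (Map.Entry<Integer, String> e : inputs.get(i)) {
                grouped.computeIfAbsent(e.getKey(), k -> emptySlots(n)).get(i).add(e.getValue());
            }
        }
        return grouped;
    }

    static List<List<String>> emptySlots(int n) {
        List<List<String>> slots = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            slots.add(new ArrayList<>());
        }
        return slots;
    }

    // Inner-join semantics: for each key, emit the cross product of all slots.
    // A key that is missing from any input produces no rows.
    static List<Map.Entry<Integer, String>> innerJoin(Map<Integer, List<List<String>>> grouped) {
        List<Map.Entry<Integer, String>> out = new ArrayList<>();
        for (Map.Entry<Integer, List<List<String>>> g : grouped.entrySet()) {
            List<String> acc = null;
            for (List<String> side : g.getValue()) {
                List<String> next = new ArrayList<>();
                if (acc == null) {
                    next.addAll(side);
                } else {
                    for (String left : acc) {
                        for (String right : side) {
                            next.add(left + "|" + right);
                        }
                    }
                }
                acc = next;
            }
            for (String row : acc) {
                out.add(Map.entry(g.getKey(), row));
            }
        }
        return out;
    }

    // Two binary joins, each with its own co-group: two shuffles.
    static List<Map.Entry<Integer, String>> chained(List<Map.Entry<Integer, String>> a,
            List<Map.Entry<Integer, String>> b, List<Map.Entry<Integer, String>> c) {
        List<Map.Entry<Integer, String>> ab = innerJoin(coGroup(List.of(a, b)));
        return innerJoin(coGroup(List.of(ab, c)));
    }

    // All three inputs share the key some_id, so one co-group suffices.
    static List<Map.Entry<Integer, String>> fused(List<Map.Entry<Integer, String>> a,
            List<Map.Entry<Integer, String>> b, List<Map.Entry<Integer, String>> c) {
        return innerJoin(coGroup(List.of(a, b, c)));
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, String>> a = List.of(Map.entry(1, "a1"), Map.entry(2, "a2"));
        List<Map.Entry<Integer, String>> b = List.of(Map.entry(1, "b1"), Map.entry(1, "b1x"), Map.entry(2, "b2"));
        List<Map.Entry<Integer, String>> c = List.of(Map.entry(1, "c1"), Map.entry(3, "c3"));

        shuffleCount = 0;
        List<Map.Entry<Integer, String>> viaChain = chained(a, b, c);
        System.out.println("chained: " + viaChain + " shuffles=" + shuffleCount);

        shuffleCount = 0;
        List<Map.Entry<Integer, String>> viaFused = fused(a, b, c);
        System.out.println("fused:   " + viaFused + " shuffles=" + shuffleCount);
    }
}
```

With the sample inputs, both plans should produce the same two rows for some_id = 1 (a1|b1|c1 and a1|b1x|c1), with 2 simulated shuffles for the chained plan and 1 for the fused one. The fusion relies on both joins sharing the key; a second join on a different column (for example b.other_id = c.other_id) could not be co-grouped this way.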
was:
A query like this:
{code}
SELECT a.*, b.*, c.*
FROM a
JOIN b ON a.user_id = b.user_id
JOIN c ON a.user_id = c.user_id;
{code}
results in two shuffles. This can probably be optimized.
Relevant code:
- BeamJoinRel implements Join in SQL:
https://github.com/apache/beam/blob/1675b0f843ed34de8ba6f3676f794db80b40139d/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamJoinRel.java#L194
- CoGBK Join implementation:
https://github.com/apache/beam/blob/279a05604b83a54e8e5a79e13d8761f94841f326/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java#L36
> [SQL] Batch Join results in two shuffles
> ----------------------------------------
>
> Key: BEAM-5049
> URL: https://issues.apache.org/jira/browse/BEAM-5049
> Project: Beam
> Issue Type: Bug
> Components: dsl-sql
> Reporter: Anton Kedin
> Priority: Major
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
