[ https://issues.apache.org/jira/browse/IGNITE-20383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aleksey Plekhanov updated IGNITE-20383: --------------------------------------- Description: Sometimes, if join inputs are not collocated it's worth to broadcast one of the inputs, for example, query: {code:sql} SELECT * FROM emps WHERE emps.salary = (SELECT AVG(emps.salary) FROM emps){code} Currently has plan: {noformat} IgniteProject(ID=[$0], NAME=[$1], SALARY=[$2]) IgniteNestedLoopJoin(condition=[=($2, $3)], joinType=[inner]) IgniteExchange(distribution=[single]) IgniteTableScan(table=[[PUBLIC, EMPS]]) IgniteReduceHashAggregate(group=[{}], AVG(EMPS.SALARY)=[AVG($0)]) IgniteExchange(distribution=[single]) IgniteMapHashAggregate(group=[{}], AVG(EMPS.SALARY)=[AVG($0)]) IgniteIndexScan(table=[[PUBLIC, EMPS]], index=[TST], requiredColumns=[{2}], collation=[[2 ASC-nulls-first]]) {noformat} But this plan is not optimal, since we should send entire table EMP from all nodes to the single node. For such a query it's better to broadcast result of the aggregation, in this case plan will be something like: {noformat} IgniteExchange(distribution=[single]) IgniteProject(...) IgniteCorrelatedNestedLoopJoin(...) IgniteExchange(distribution=[broadcast]) IgniteReduceHashAggregate(group=[{}], AVG(EMPS.SALARY)=[AVG($0)]) IgniteExchange(distribution=[single]) IgniteMapHashAggregate(group=[{}], AVG(EMPS.SALARY)=[AVG($0)]) IgniteIndexScan(table=[[PUBLIC, EMPS]], index=[SALARY_IDX]) IgniteIndexScan(table=[[PUBLIC, EMPS]], index=[SALARY_IDX]) {noformat} But currently we don't try to convert any of the join inputs to the broadcast distribution. We should try to do this. was: Sometimes, if join inputs are not collocated it's worth to broadcast one of the inputs, for example, query: {code:sql} SELECT * FROM emps WHERE emps.salary = (SELECT AVG(emps.salary) FROM emps){code} Currently has plan: {noformat} IgniteProject(ID=[$0], NAME=[$1], SALARY=[$2]) IgniteNestedLoopJoin(condition=[=($2, $3)], joinType=[inner]) IgniteExchange(distribution=[single]) IgniteTableScan(table=[[PUBLIC, EMPS]]) IgniteReduceHashAggregate(group=[{}], AVG(EMPS.SALARY)=[AVG($0)]) IgniteExchange(distribution=[single]) IgniteMapHashAggregate(group=[{}], AVG(EMPS.SALARY)=[AVG($0)]) IgniteIndexScan(table=[[PUBLIC, EMPS]], index=[TST], requiredColumns=[{2}], collation=[[2 ASC-nulls-first]]) {noformat} But this plan is not optimal, since we should send entire table EMP from all nodes to the single node. For such a query it's better to broadcast result of the aggregation, in this case plan will be something like: {noformat} IgniteExchange(distribution=[single]) IgniteProject(ID=[$0], NAME=[$1], SALARY=[$2]) IgniteCorrelatedNestedLoopJoin(...) IgniteExchange(distribution=[broadcast]) IgniteReduceHashAggregate(group=[{}], AVG(EMPS.SALARY)=[AVG($0)]) IgniteExchange(distribution=[single]) IgniteMapHashAggregate(group=[{}], AVG(EMPS.SALARY)=[AVG($0)]) IgniteIndexScan(table=[[PUBLIC, EMPS]], index=[SALARY_IDX]) IgniteIndexScan(table=[[PUBLIC, EMPS]], index=[SALARY_IDX]) {noformat} But currently we don't try to convert any of the join inputs to the broadcast distribution. We should try to do this. > Calcite engine. Convert one input of a join to the broadcast distribution > ------------------------------------------------------------------------- > > Key: IGNITE-20383 > URL: https://issues.apache.org/jira/browse/IGNITE-20383 > Project: Ignite > Issue Type: Improvement > Reporter: Aleksey Plekhanov > Assignee: Aleksey Plekhanov > Priority: Major > Labels: calcite, ise > > Sometimes, if join inputs are not collocated it's worth to broadcast one of > the inputs, for example, query: > {code:sql} > SELECT * FROM emps WHERE emps.salary = (SELECT AVG(emps.salary) FROM > emps){code} > Currently has plan: > {noformat} > IgniteProject(ID=[$0], NAME=[$1], SALARY=[$2]) > IgniteNestedLoopJoin(condition=[=($2, $3)], joinType=[inner]) > IgniteExchange(distribution=[single]) > IgniteTableScan(table=[[PUBLIC, EMPS]]) > IgniteReduceHashAggregate(group=[{}], AVG(EMPS.SALARY)=[AVG($0)]) > IgniteExchange(distribution=[single]) > IgniteMapHashAggregate(group=[{}], AVG(EMPS.SALARY)=[AVG($0)]) > IgniteIndexScan(table=[[PUBLIC, EMPS]], index=[TST], > requiredColumns=[{2}], collation=[[2 ASC-nulls-first]]) > {noformat} > But this plan is not optimal, since we should send entire table EMP from all > nodes to the single node. For such a query it's better to broadcast result of > the aggregation, in this case plan will be something like: > {noformat} > IgniteExchange(distribution=[single]) > IgniteProject(...) > IgniteCorrelatedNestedLoopJoin(...) > IgniteExchange(distribution=[broadcast]) > IgniteReduceHashAggregate(group=[{}], AVG(EMPS.SALARY)=[AVG($0)]) > IgniteExchange(distribution=[single]) > IgniteMapHashAggregate(group=[{}], AVG(EMPS.SALARY)=[AVG($0)]) > IgniteIndexScan(table=[[PUBLIC, EMPS]], index=[SALARY_IDX]) > IgniteIndexScan(table=[[PUBLIC, EMPS]], index=[SALARY_IDX]) > {noformat} > But currently we don't try to convert any of the join inputs to the broadcast > distribution. We should try to do this. -- This message was sent by Atlassian Jira (v8.20.10#820010)