Gaurav Tiwari created SPARK-14097:
-------------------------------------
Summary: Spark SQL Optimization is not consistent
Key: SPARK-14097
URL: https://issues.apache.org/jira/browse/SPARK-14097
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.6.1, 1.5.2
Reporter: Gaurav Tiwari
Priority: Minor
I am trying to execute a simple query with join on 3 tables. When I look at the
execution plan , it varies with position of table in the "from" clause of the
query. Execution plan looks more optimized when the position of table with
predicates is specified before any other table.
Original query :
select distinct pge.portfolio_code
from table1 pge join table2 p
on p.perm_group = pge.anc_port_group
join table3 uge
on p.user_group=uge.anc_user_group
where uge.user_name = 'user' and p.perm_type = 'TEST'
Execution Plan for original query:
== Physical Plan ==
TungstenAggregate(key=[portfolio_code#14119], functions=[],
output=[portfolio_code#14119])
TungstenExchange hashpartitioning(portfolio_code#14119)
TungstenAggregate(key=[portfolio_code#14119], functions=[],
output=[portfolio_code#14119])
TungstenProject [portfolio_code#14119]
BroadcastHashJoin [user_group#13665], [anc_user_group#13658], BuildRight
TungstenProject [portfolio_code#14119,user_group#13665]
BroadcastHashJoin [anc_port_group#14117], [perm_group#13667], BuildRight
ConvertToUnsafe
Scan
ParquetRelation[snackfs://shared:9042/aladdin_data_beta/table1.parquet][portfolio_code#14119,anc_port_group#14117]
ConvertToUnsafe
Project [user_group#13665,perm_group#13667]
Filter (perm_type#13666 = TEST)
Scan
ParquetRelation[snackfs://shared:9042/aladdin_data_beta/table2.parquet][user_group#13665,perm_group#13667,perm_type#13666]
ConvertToUnsafe
Project [anc_user_group#13658]
Filter (user_name#13659 = user)
Scan
ParquetRelation[snackfs://shared:9042/aladdin_data_beta/table3.parquet][anc_user_group#13658,user_name#13659]
Optimized query (table with predicates is moved ahead):
select distinct pge.portfolio_code
from table1 uge, table2 p, table3 pge
where uge.user_name = 'user' and p.perm_type = 'TEST'
and p.perm_group = pge.anc_port_group
and p.user_group=uge.anc_user_group
Execution Plan for optimized query:
== Physical Plan ==
TungstenAggregate(key=[portfolio_code#14119], functions=[],
output=[portfolio_code#14119])
TungstenExchange hashpartitioning(portfolio_code#14119)
TungstenAggregate(key=[portfolio_code#14119], functions=[],
output=[portfolio_code#14119])
TungstenProject [portfolio_code#14119]
BroadcastHashJoin [perm_group#13667], [anc_port_group#14117], BuildRight
TungstenProject [perm_group#13667]
BroadcastHashJoin [anc_user_group#13658], [user_group#13665], BuildRight
ConvertToUnsafe
Project [anc_user_group#13658]
Filter (user_name#13659 = user)
Scan
ParquetRelation[snackfs://shared:9042/aladdin_data_beta/table3.parquet][anc_user_group#13658,user_name#13659]
ConvertToUnsafe
Project [perm_group#13667,user_group#13665]
Filter (perm_type#13666 = TEST)
Scan
ParquetRelation[snackfs://shared:9042/aladdin_data_beta/table2.parquet][perm_group#13667,user_group#13665,perm_type#13666]
ConvertToUnsafe
Scan
ParquetRelation[snackfs://shared:9042/aladdin_data_beta/table1.parquet][portfolio_code#14119,anc_port_group#14117]
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]