[
https://issues.apache.org/jira/browse/SPARK-14097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15208910#comment-15208910
]
Xiao Li commented on SPARK-14097:
---------------------------------
Could you do me a favor by using `explain(true)`? Thanks!
> Spark SQL Optimization is not consistent
> ----------------------------------------
>
> Key: SPARK-14097
> URL: https://issues.apache.org/jira/browse/SPARK-14097
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.2, 1.6.1
> Reporter: Gaurav Tiwari
> Priority: Minor
>
> I am trying to execute a simple query with join on 3 tables. When I look at
> the execution plan , it varies with position of table in the "from" clause of
> the query. Execution plan looks more optimized when the position of table
> with predicates is specified before any other table.
> Original query :
> select distinct pge.portfolio_code
> from table1 pge join table2 p
> on p.perm_group = pge.anc_port_group
> join table3 uge
> on p.user_group=uge.anc_user_group
> where uge.user_name = 'user' and p.perm_type = 'TEST'
> Execution Plan for original query:
> == Physical Plan ==
> TungstenAggregate(key=[portfolio_code#14119], functions=[],
> output=[portfolio_code#14119])
> TungstenExchange hashpartitioning(portfolio_code#14119)
> TungstenAggregate(key=[portfolio_code#14119], functions=[],
> output=[portfolio_code#14119])
> TungstenProject [portfolio_code#14119]
> BroadcastHashJoin [user_group#13665], [anc_user_group#13658], BuildRight
> TungstenProject [portfolio_code#14119,user_group#13665]
> BroadcastHashJoin [anc_port_group#14117], [perm_group#13667],
> BuildRight
> ConvertToUnsafe
> Scan
> ParquetRelation[snackfs://shared:9042/aladdin_data_beta/table1.parquet][portfolio_code#14119,anc_port_group#14117]
>
> ConvertToUnsafe
> Project [user_group#13665,perm_group#13667]
> Filter (perm_type#13666 = TEST)
> Scan
> ParquetRelation[snackfs://shared:9042/aladdin_data_beta/table2.parquet][user_group#13665,perm_group#13667,perm_type#13666]
>
> ConvertToUnsafe
> Project [anc_user_group#13658]
> Filter (user_name#13659 = user)
> Scan
> ParquetRelation[snackfs://shared:9042/aladdin_data_beta/table3.parquet][anc_user_group#13658,user_name#13659]
>
> Optimized query (table with predicates is moved ahead):
> select distinct pge.portfolio_code
> from table1 uge, table2 p, table3 pge
> where uge.user_name = 'user' and p.perm_type = 'TEST'
> and p.perm_group = pge.anc_port_group
> and p.user_group=uge.anc_user_group
> Execution Plan for optimized query:
> == Physical Plan ==
> TungstenAggregate(key=[portfolio_code#14119], functions=[],
> output=[portfolio_code#14119])
> TungstenExchange hashpartitioning(portfolio_code#14119)
> TungstenAggregate(key=[portfolio_code#14119], functions=[],
> output=[portfolio_code#14119])
> TungstenProject [portfolio_code#14119]
> BroadcastHashJoin [perm_group#13667], [anc_port_group#14117], BuildRight
> TungstenProject [perm_group#13667]
> BroadcastHashJoin [anc_user_group#13658], [user_group#13665],
> BuildRight
> ConvertToUnsafe
> Project [anc_user_group#13658]
> Filter (user_name#13659 = user)
> Scan
> ParquetRelation[snackfs://shared:9042/aladdin_data_beta/table3.parquet][anc_user_group#13658,user_name#13659]
>
> ConvertToUnsafe
> Project [perm_group#13667,user_group#13665]
> Filter (perm_type#13666 = TEST)
> Scan
> ParquetRelation[snackfs://shared:9042/aladdin_data_beta/table2.parquet][perm_group#13667,user_group#13665,perm_type#13666]
>
> ConvertToUnsafe
> Scan
> ParquetRelation[snackfs://shared:9042/aladdin_data_beta/table1.parquet][portfolio_code#14119,anc_port_group#14117]
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]