[
https://issues.apache.org/jira/browse/SPARK-16093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon updated SPARK-16093:
---------------------------------
Labels: bulk-closed (was: )
> Spark 2.0: setting spark.sql.autoBroadcastJoinThreshold = 1 has no effect
> --------------------------------------------------------------------------
>
> Key: SPARK-16093
> URL: https://issues.apache.org/jira/browse/SPARK-16093
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: marymwu
> Priority: Major
> Labels: bulk-closed
> Attachments: Errorlog.txt
>
>
> Setting spark.sql.autoBroadcastJoinThreshold = 1 in Spark 2.0 has no effect.
> Precondition:
> set spark.sql.autoBroadcastJoinThreshold = 1;
> Testcase:
> INSERT OVERWRITE TABLE RPS__H_REPORT_MORE_DIMENSION_FIRST_CHANNEL_VISIT_CD_DAY PARTITION (p_event_date='2016-06-18')
> select a.app_key, a.app_channel, b.device_model, sum(b.visits) visitsNum
> from
>   (select app_key, app_channel, lps_did
>    from RPS__H_REPORT_MORE_DIMENSION_EARLIEST_NEWUSER_LIST_C) a
> join
>   (select app_key, lps_did, device_model, count(1) as visits
>    from RPS__H_REPORT_MORE_DIMENSION_SMALL
>    where p_event_date = '2016-06-18' and (log_type = 1 or log_type = 2)
>    group by app_key, lps_did, device_model) b
> on a.lps_did = b.lps_did and a.app_key = b.app_key
> group by a.app_key, a.app_channel, b.device_model;
> == Physical Plan ==
> InsertIntoHiveTable MetastoreRelation default, rps__h_report_more_dimension_first_channel_visit_cd_day, None, Map(p_event_date -> Some(2016-06-18)), true, false
> +- TungstenAggregate(key=[app_key#7,app_channel#9,device_model#20], functions=[(sum(visits#3L),mode=Final,isDistinct=false)], output=[app_key#7,app_channel#9,device_model#20,visitsNum#4L])
>    +- Exchange(coordinator id: 41547585) hashpartitioning(app_key#7, app_channel#9, device_model#20, 600), Some(coordinator[target post-shuffle partition size: 500000000])
>       +- TungstenAggregate(key=[app_key#7,app_channel#9,device_model#20], functions=[(sum(visits#3L),mode=Partial,isDistinct=false)], output=[app_key#7,app_channel#9,device_model#20,sum#41L])
>          +- Project [app_key#7,app_channel#9,device_model#20,visits#3L]
>             +- BroadcastHashJoin [lps_did#8,app_key#7], [lps_did#13,app_key#12], Inner, BuildRight, None
>                :- Filter (isnotnull(app_key#7) && isnotnull(lps_did#8))
>                :  +- HiveTableScan [app_key#7,app_channel#9,lps_did#8], MetastoreRelation default, rps__h_report_more_dimension_earliest_newuser_list_c, None
>                +- BroadcastExchange HashedRelationBroadcastMode(List(input[1, string], input[0, string]))
>                   +- TungstenAggregate(key=[app_key#12,lps_did#13,device_model#20], functions=[(count(1),mode=Final,isDistinct=false)], output=[app_key#12,lps_did#13,device_model#20,visits#3L])
>                      +- Exchange(coordinator id: 733045095) hashpartitioning(app_key#12, lps_did#13, device_model#20, 600), Some(coordinator[target post-shuffle partition size: 500000000])
>                         +- TungstenAggregate(key=[app_key#12,lps_did#13,device_model#20], functions=[(count(1),mode=Partial,isDistinct=false)], output=[app_key#12,lps_did#13,device_model#20,count#39L])
>                            +- Project [app_key#12,lps_did#13,device_model#20]
>                               +- Filter ((isnotnull(app_key#12) && isnotnull(lps_did#13)) && ((cast(log_type#11 as int) = 1) || (cast(log_type#11 as int) = 2)))
>                                  +- HiveTableScan [app_key#12,lps_did#13,device_model#20,log_type#11], MetastoreRelation default, rps__h_report_more_dimension_small, None, [isnotnull(p_event_date#10),(p_event_date#10 = 2016-06-18)]
> Time taken: 4.775 seconds, Fetched 1 row(s)
> 16/06/20 16:55:16 INFO CliDriver: Time taken: 4.775 seconds, Fetched 1 row(s)
> Note: +- BroadcastHashJoin [lps_did#8,app_key#7], [lps_did#13,app_key#12], Inner, BuildRight, None
> Result:
> 1. Execution failed and the Spark service became unavailable.
> 2. Even though spark.sql.autoBroadcastJoinThreshold was set to 1, a BroadcastHashJoin was still chosen for the join of the two large tables.
> The error log is attached.
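>
> One diagnostic worth running first (a sketch based on standard spark-sql behaviour, not on the attached log): SET with a key and no value echoes the current session value, which confirms whether the precondition actually reached the session that planned the query.
>
>   -- Echoes the effective value for this session:
>   set spark.sql.autoBroadcastJoinThreshold;
>   -- Expected to report a value of 1 (output format may vary by version)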
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]