[ https://issues.apache.org/jira/browse/SPARK-14946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15266274#comment-15266274 ]
Raymond Honderdors commented on SPARK-14946:
--------------------------------------------
version 2.0 query plan:
== Parsed Logical Plan ==
'Project [*]
+- 'Join Inner, Some(('sd.campaignid = 'c.campaign_id))
:- 'UnresolvedRelation `pe_servingdata`, Some(sd)
+- 'UnresolvedRelation `pe_campaigns_gzip`, Some(c)
== Analyzed Logical Plan ==
originaltime: string, pluid: string, sdg: string, type: bigint, useragent:
string, utctime: string, diorigin: string, dbid: string, timeid: string,
browser: string, brandid: bigint, time: string, zip: string, dma: string,
ad_id: int, ismobile: string, privacy: string, df: string, userip: string,
agencyid: bigint, ta: string, mb: string, advertiserid: bigint, campaignid:
bigint, os: string, usr: string, isdefaultimg: string, isuserinit: string,
impressiontype: string, referrer: string, city: string, masteradid: bigint,
state: string, val: string, isclick: string, flightid: bigint, siteid: string,
intrn: string, asset: string, sid: string, account_id: bigint, event_time:
bigint, campaign_id: bigint, campaign_type_id: int, campaign_name: string,
version: int, account_id: bigint
Project [originaltime#194,pluid#195,sdg#196,type#197L,useragent#198,utctime#199,diorigin#200,dbid#201,timeid#202,browser#203,brandid#204L,time#205,zip#206,dma#207,ad_id#208,ismobile#209,privacy#210,df#211,userip#212,agencyid#213L,ta#214,mb#215,advertiserid#216L,campaignid#217L,os#218,usr#219,isdefaultimg#220,isuserinit#221,impressiontype#222,referrer#223,city#224,masteradid#225L,state#226,val#227,isclick#228,flightid#229L,siteid#230,intrn#231,asset#232,sid#233,account_id#192L,event_time#193L,campaign_id#235L,campaign_type_id#236,campaign_name#237,version#238,account_id#234L]
+- Join Inner, Some((campaignid#217L = campaign_id#235L))
:- SubqueryAlias sd
: +- Relation[originaltime#194,pluid#195,sdg#196,type#197L,useragent#198,utctime#199,diorigin#200,dbid#201,timeid#202,browser#203,brandid#204L,time#205,zip#206,dma#207,ad_id#208,ismobile#209,privacy#210,df#211,userip#212,agencyid#213L,ta#214,mb#215,advertiserid#216L,campaignid#217L,os#218,usr#219,isdefaultimg#220,isuserinit#221,impressiontype#222,referrer#223,city#224,masteradid#225L,state#226,val#227,isclick#228,flightid#229L,siteid#230,intrn#231,asset#232,sid#233,account_id#192L,event_time#193L] HadoopFiles
+- SubqueryAlias c
+- Relation[campaign_id#235L,campaign_type_id#236,campaign_name#237,version#238,account_id#234L] HadoopFiles
== Optimized Logical Plan ==
Join Inner, Some((campaignid#217L = campaign_id#235L))
:- Filter isnotnull(campaignid#217L)
: +- Relation[originaltime#194,pluid#195,sdg#196,type#197L,useragent#198,utctime#199,diorigin#200,dbid#201,timeid#202,browser#203,brandid#204L,time#205,zip#206,dma#207,ad_id#208,ismobile#209,privacy#210,df#211,userip#212,agencyid#213L,ta#214,mb#215,advertiserid#216L,campaignid#217L,os#218,usr#219,isdefaultimg#220,isuserinit#221,impressiontype#222,referrer#223,city#224,masteradid#225L,state#226,val#227,isclick#228,flightid#229L,siteid#230,intrn#231,asset#232,sid#233,account_id#192L,event_time#193L] HadoopFiles
+- Filter isnotnull(campaign_id#235L)
+- Relation[campaign_id#235L,campaign_type_id#236,campaign_name#237,version#238,account_id#234L] HadoopFiles
== Physical Plan ==
WholeStageCodegen
: +- BroadcastHashJoin [campaignid#217L], [campaign_id#235L], Inner, BuildRight, None
: :- Project [originaltime#194,pluid#195,sdg#196,type#197L,useragent#198,utctime#199,diorigin#200,dbid#201,timeid#202,browser#203,brandid#204L,time#205,zip#206,dma#207,ad_id#208,ismobile#209,privacy#210,df#211,userip#212,agencyid#213L,ta#214,mb#215,advertiserid#216L,campaignid#217L,os#218,usr#219,isdefaultimg#220,isuserinit#221,impressiontype#222,referrer#223,city#224,masteradid#225L,state#226,val#227,isclick#228,flightid#229L,siteid#230,intrn#231,asset#232,sid#233,account_id#192L,event_time#193L]
: : +- Filter isnotnull(campaignid#217L)
: : +- BatchedScan HadoopFiles[originaltime#194,pluid#195,sdg#196,type#197L,useragent#198,utctime#199,diorigin#200,dbid#201,timeid#202,browser#203,brandid#204L,time#205,zip#206,dma#207,ad_id#208,ismobile#209,privacy#210,df#211,userip#212,agencyid#213L,ta#214,mb#215,advertiserid#216L,campaignid#217L,os#218,usr#219,isdefaultimg#220,isuserinit#221,impressiontype#222,referrer#223,city#224,masteradid#225L,state#226,val#227,isclick#228,flightid#229L,siteid#230,intrn#231,asset#232,sid#233,account_id#192L,event_time#193L] Format: ParquetFormat, PushedFilters: [IsNotNull(campaignid)], ReadSchema: struct<originaltime:string,pluid:string,sdg:string,type:bigint,useragent:string,utctime:string,diorigin:string,dbid:string,timeid:string,browser:string,brandid:bigint,time:string,zip:string,dma:string,ad_id:int,ismobile:string,privacy:string,df:string,userip:string,agencyid:bigint,ta:string,mb:string,advertiserid:bigint,campaignid:bigint,os:string,usr:string,isdefaultimg:string,isuserinit:string,impressiontype:string,referrer:string,city:string,masteradid:bigint,state:string,val:string,isclick:string,flightid:bigint,siteid:string,intrn:string,asset:string,sid:string>
: +- INPUT
+- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint]))
+- WholeStageCodegen
: +- Project [campaign_id#235L,campaign_type_id#236,campaign_name#237,version#238,account_id#234L]
: +- Filter isnotnull(campaign_id#235L)
: +- BatchedScan HadoopFiles[campaign_id#235L,campaign_type_id#236,campaign_name#237,version#238,account_id#234L] Format: ParquetFormat, PushedFilters: [IsNotNull(campaign_id)], ReadSchema: struct<campaign_id:bigint,campaign_type_id:int,campaign_name:string,version:int>
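For anyone trying to reproduce the plan above: judging from the parsed logical plan, the query looks like a plain inner join of `pe_servingdata` (alias `sd`) and `pe_campaigns_gzip` (alias `c`) on the campaign id. The original SQL text is not included in this comment, so the statement below is only a reconstruction, and the helper name `explain_extended` is hypothetical. A minimal sketch that builds an `EXPLAIN EXTENDED` statement which could be submitted over JDBC or through `spark.sql(...)`:

```python
def explain_extended(left, left_alias, right, right_alias, left_key, right_key):
    """Build an EXPLAIN EXTENDED statement for a two-table inner join.

    Hypothetical helper: it only assembles the SQL text and does not
    contact Spark. Table and column names are taken from the parsed
    logical plan; the exact original query is an assumption.
    """
    query = (
        f"SELECT * FROM {left} {left_alias} "
        f"JOIN {right} {right_alias} "
        f"ON {left_alias}.{left_key} = {right_alias}.{right_key}"
    )
    return f"EXPLAIN EXTENDED {query}"


statement = explain_extended(
    "pe_servingdata", "sd",
    "pe_campaigns_gzip", "c",
    "campaignid", "campaign_id",
)
print(statement)
```

Running the resulting statement on both 1.6.1 and 2.0 should give the parsed, analyzed, optimized and physical plans for each version, which makes a side-by-side comparison easier than comparing screenshots.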
> Spark 2.0 vs 1.6.1 Query Time(out)
> ----------------------------------
>
> Key: SPARK-14946
> URL: https://issues.apache.org/jira/browse/SPARK-14946
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Raymond Honderdors
> Priority: Critical
> Attachments: Query Plan 1.6.1.png, screenshot-spark_2.0.png,
> spark-defaults.conf, spark-env.sh
>
>
> I run a query through the JDBC driver. On version 1.6.1 it returns after 5 to
> 6 minutes; the same query against version 2.0 fails after 2 hours (due to a timeout).