[
https://issues.apache.org/jira/browse/SPARK-25357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lantao Jin updated SPARK-25357:
-------------------------------
Description:
Field {{metadata}} was removed from {{SparkPlanInfo}} in SPARK-17701.
Correspondingly, this field was also removed from the
{{SparkListenerSQLExecutionStart}} event in the Spark event log. If we want to
analyze the event log to get fields whose values are longer than 100
characters (e.g. the Location or ReadSchema of FileScan), we find them
abbreviated in the {{simpleString}} of the SparkPlanInfo JSON and in the
{{physicalPlanDescription}} JSON.
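To illustrate the abbreviation, here is a minimal Python sketch. It assumes the rule is "keep the first 97 characters and append {{...}}", which is consistent with the truncated Location and ReadSchema values in the quoted fragment; the helper name {{abbreviate}} is hypothetical and is not Spark's own code.

```python
def abbreviate(value: str, max_width: int = 100) -> str:
    """Keep at most max_width characters, marking truncation with '...'.

    Assumed rule, inferred from the truncated values in the event log.
    """
    if len(value) <= max_width:
        return value
    return value[: max_width - 3] + "..."


# Full Location value, taken from the metadata field in the fragment.
location = (
    "InMemoryFileIndex[hdfs://hercules/sys/edw/prs_idm/idm_cbt_am_t/cbt/"
    "cbt_acct_prfl_info/snapshot/dt=20180904]"
)

short = abbreviate(location)
print(short)  # the partition value (dt=20180904) is no longer recoverable
```

The partition value is exactly the part that gets cut off, which is why the abbreviated string alone is not enough for offline analysis.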
Before 2.3, a fragment of SparkListenerSQLExecutionStart in the event log
looked like this (it contains the metadata field):
{quote}Location:
InMemoryFileIndex[hdfs://hercules/sys/edw/prs_idm/idm_cbt_am_t/cbt/cbt_acct_prfl_info/snapshot/dt...,
PartitionFilters: [], PushedFilters: [], ReadSchema:
struct<snpsht_start_dt:date,snpsht_end_dt:date,am_ntlogin_name:string,am_first_name:string,am_las...","children":[],"metadata":{"Location":"InMemoryFileIndex[hdfs://hercules/sys/edw/prs_idm/idm_cbt_am_t/cbt/cbt_acct_prfl_info/snapshot/dt=20180904]","ReadSchema":"struct<snpsht_start_dt:date,snpsht_end_dt:date,am_ntlogin_name:string,am_first_name:string,am_last_name:string,isg_name:string,acct_isg_stat_desc:string,prmry_user_slctd_id:string,prmry_orcl_id:bigint,acct_cmpny_bsns_lcns_num:string,acct_slctd_id:string,acct_orcl_id:bigint,acct_cmpny_name:string,acct_cmpny_region_txt:string,acct_cmpny_prvnc_txt:string,acct_cmpny_addr_txt:string,acct_type_seg:string,p4_acct_ind:tinyint,i320_acct_ind:tinyint,i463_acct_ind:tinyint,i319_acct_ind:tinyint,acct_cntry:string,acct_stat:string,acct_club_ind:string,acct_src_bd_name:string,acct_prmry_bsns_vrtcl_desc:string,acct_minor_bsns_vrtcl_desc:string,acct_src_desc:string,acct_pre_ams_id:bigint,src_last_mdfd_dt:date,src_last_mdfd_tm:string,CRE_DATE:date,CRE_USER:string,UPD_DATE:timestamp,UPD_USER:string>"
{quote}
So I add this field back to the SparkPlanInfo class, so that the metadata is
written to the event log again. Intact information in the event log is very
useful for offline job analysis.
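A minimal sketch of the kind of offline analysis this enables, in Python. It assumes the event log holds one JSON object per line, that the event is identified by an "Event" key carrying the class name below, and that the plan tree sits under "sparkPlanInfo" with "nodeName", "children" and "metadata" keys. Only "children" and "metadata" appear in the fragment above; the other names, and the sample event itself, are assumptions to verify against a real log.

```python
import json
from typing import Dict, Iterable, Iterator, Tuple

# Assumed event name; check it against your own event log.
SQL_START = "org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart"


def iter_plan_metadata(plan: dict) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (nodeName, metadata) for every plan node with non-empty metadata."""
    metadata = plan.get("metadata") or {}
    if metadata:
        yield plan.get("nodeName", ""), metadata
    for child in plan.get("children", []):
        yield from iter_plan_metadata(child)


def scan_metadata(lines: Iterable[str]) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Walk event-log lines and yield plan metadata from SQL start events."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("Event") == SQL_START:
            yield from iter_plan_metadata(event.get("sparkPlanInfo", {}))


# Hypothetical one-line event, shaped like the fragment in this description.
sample = json.dumps({
    "Event": SQL_START,
    "sparkPlanInfo": {
        "nodeName": "Project",  # hypothetical parent node
        "children": [{
            "nodeName": "Scan parquet",  # hypothetical scan node
            "children": [],
            "metadata": {
                "Location": "InMemoryFileIndex[hdfs://hercules/sys/edw/"
                            "prs_idm/idm_cbt_am_t/cbt/cbt_acct_prfl_info/"
                            "snapshot/dt=20180904]",
            },
        }],
        "metadata": {},
    },
})

for node, meta in scan_metadata([sample]):
    print(node, meta["Location"])
```

An open event log file can be passed to {{scan_metadata}} in place of the sample list, since it only needs an iterable of lines.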
was:
Field {{metadata}} was removed from {{SparkPlanInfo}} in SPARK-17701.
Correspondingly, this field was also removed from the
{{SparkListenerSQLExecutionStart}} event in the Spark event log. If we want to
analyze the event log to get fields whose values are longer than 100
characters (e.g. the Location or ReadSchema of FileScan), we find them
abbreviated in the {{simpleString}} of the SparkPlanInfo JSON and in the
{{physicalPlanDescription}} JSON.
Before 2.3, a fragment of SparkListenerSQLExecutionStart in the event log
looked like this (it contains the metadata field):
{quote}Location:
InMemoryFileIndex[hdfs://hercules/sys/edw/prs_idm/idm_cbt_am_t/cbt/cbt_acct_prfl_info/snapshot/dt...,
PartitionFilters: [], PushedFilters: [], ReadSchema:
struct<snpsht_start_dt:date,snpsht_end_dt:date,am_ntlogin_name:string,am_first_name:string,am_las...","children":[],"metadata":{"Location":"InMemoryFileIndex[hdfs://hercules/sys/edw/prs_idm/idm_cbt_am_t/cbt/cbt_acct_prfl_info/snapshot/dt=20180904]","ReadSchema":"struct<snpsht_start_dt:date,snpsht_end_dt:date,am_ntlogin_name:string,am_first_name:string,am_last_name:string,isg_name:string,acct_isg_stat_desc:string,prmry_user_slctd_id:string,prmry_orcl_id:bigint,acct_cmpny_bsns_lcns_num:string,acct_slctd_id:string,acct_orcl_id:bigint,acct_cmpny_name:string,acct_cmpny_region_txt:string,acct_cmpny_prvnc_txt:string,acct_cmpny_addr_txt:string,acct_type_seg:string,p4_acct_ind:tinyint,i320_acct_ind:tinyint,i463_acct_ind:tinyint,i319_acct_ind:tinyint,acct_cntry:string,acct_stat:string,acct_club_ind:string,acct_src_bd_name:string,acct_prmry_bsns_vrtcl_desc:string,acct_minor_bsns_vrtcl_desc:string,acct_src_desc:string,acct_pre_ams_id:bigint,src_last_mdfd_dt:date,src_last_mdfd_tm:string,CRE_DATE:date,CRE_USER:string,UPD_DATE:timestamp,UPD_USER:string>"
{quote}
I suggest keeping the intact value in simpleString in DataSourceScanExec to
fix it. Intact information in the event log is very useful for offline job
analysis.
> Add metadata to SparkPlanInfo to dump more information like file path to
> event log
> ----------------------------------------------------------------------------------
>
> Key: SPARK-25357
> URL: https://issues.apache.org/jira/browse/SPARK-25357
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.1
> Reporter: Lantao Jin
> Priority: Minor