[ 
https://issues.apache.org/jira/browse/SPARK-25357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lantao Jin updated SPARK-25357:
-------------------------------
    Description: 
The {{metadata}} field was removed from {{SparkPlanInfo}} in SPARK-17701. 
Correspondingly, it was also removed from the 
{{SparkListenerSQLExecutionStart}} event in the Spark event log. If we analyze 
the event log to get fields wider than 100 characters (e.g. the Location or 
ReadSchema of a FileScan), we find them abbreviated in the {{simpleString}} of 
the SparkPlanInfo JSON and in the {{physicalPlanDescription}} JSON.
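
The effect of the abbreviation can be sketched in a few lines. This is only an illustration, not Spark's code: the names {{abbreviate}} and {{simple_string}} are hypothetical, and the rule (cut to 100 characters in total, with a trailing "...") is an assumption inferred from the fragments quoted in this issue.

```python
MAX_WIDTH = 100  # assumed limit, taken from the "wider than 100" wording in this issue


def abbreviate(value: str, max_width: int = MAX_WIDTH) -> str:
    """Return value unchanged if it fits, else cut it so that the result,
    including a trailing '...', is exactly max_width characters long."""
    if len(value) <= max_width:
        return value
    return value[: max_width - 3] + "..."


def simple_string(node_name: str, metadata: dict) -> str:
    """Hypothetical one-line rendering of a scan node: every metadata
    value is abbreviated, so anything past the limit is lost."""
    parts = [f"{key}: {abbreviate(str(val))}" for key, val in sorted(metadata.items())]
    return f"{node_name} " + ", ".join(parts)


location = (
    "InMemoryFileIndex[hdfs://hercules/sys/edw/prs_idm/idm_cbt_am_t/"
    "cbt/cbt_acct_prfl_info/snapshot/dt=20180904]"
)
short = abbreviate(location)
```

With the Location value from this issue, {{short}} ends in "snapshot/dt...", so the partition value {{dt=20180904}} can no longer be recovered from the abbreviated string.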

Before 2.3, a fragment of SparkListenerSQLExecutionStart in the event log (it 
still contains the metadata field):
{quote}Location: 
InMemoryFileIndex[hdfs://hercules/sys/edw/prs_idm/idm_cbt_am_t/cbt/cbt_acct_prfl_info/snapshot/dt...,
 PartitionFilters: [], PushedFilters: [], ReadSchema: 
struct<snpsht_start_dt:date,snpsht_end_dt:date,am_ntlogin_name:string,am_first_name:string,am_las...","children":[],"metadata":{"Location":"InMemoryFileIndex[hdfs://hercules/sys/edw/prs_idm/idm_cbt_am_t/cbt/cbt_acct_prfl_info/snapshot/dt=20180904]","ReadSchema":"struct<snpsht_start_dt:date,snpsht_end_dt:date,am_ntlogin_name:string,am_first_name:string,am_last_name:string,isg_name:string,acct_isg_stat_desc:string,prmry_user_slctd_id:string,prmry_orcl_id:bigint,acct_cmpny_bsns_lcns_num:string,acct_slctd_id:string,acct_orcl_id:bigint,acct_cmpny_name:string,acct_cmpny_region_txt:string,acct_cmpny_prvnc_txt:string,acct_cmpny_addr_txt:string,acct_type_seg:string,p4_acct_ind:tinyint,i320_acct_ind:tinyint,i463_acct_ind:tinyint,i319_acct_ind:tinyint,acct_cntry:string,acct_stat:string,acct_club_ind:string,acct_src_bd_name:string,acct_prmry_bsns_vrtcl_desc:string,acct_minor_bsns_vrtcl_desc:string,acct_src_desc:string,acct_pre_ams_id:bigint,src_last_mdfd_dt:date,src_last_mdfd_tm:string,CRE_DATE:date,CRE_USER:string,UPD_DATE:timestamp,UPD_USER:string>"
{quote}

I suggest keeping the full, unabbreviated values in simpleString in 
DataSourceScanExec to fix this. Complete information in the event log is very 
useful for offline job analysis.
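
As an illustration of the offline analysis meant here, the following minimal sketch collects the per-node metadata from an event log. It assumes the log is a file with one JSON event per line, that the event is named by its full class name in an {{Event}} key, and that the plan is stored under {{sparkPlanInfo}} with {{nodeName}}, {{children}} and {{metadata}} keys (the last two appear in the fragment above; the others are assumptions). The helper names {{walk}} and {{scan_metadata}} are hypothetical.

```python
import json

# Assumed event name as written to the event log.
SQL_START = "org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart"


def walk(plan):
    """Yield a plan node and then all of its descendants, depth first."""
    yield plan
    for child in plan.get("children", []):
        yield from walk(child)


def scan_metadata(lines):
    """Yield (executionId, nodeName, metadata) for every plan node that
    carries a non-empty metadata map in a SQL execution start event."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("Event") != SQL_START:
            continue
        for node in walk(event.get("sparkPlanInfo", {})):
            meta = node.get("metadata") or {}
            if meta:
                yield event.get("executionId"), node.get("nodeName"), meta
```

On a log written before 2.3 this yields the complete Location and ReadSchema values. On a log without the metadata field it yields nothing, and only the abbreviated simpleString remains available.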

  was:
The {{metadata}} field was removed from {{SparkPlanInfo}} in SPARK-17701. 
Correspondingly, it was also removed from the 
{{SparkListenerSQLExecutionStart}} event in the Spark event log. If we analyze 
the event log to get fields wider than 100 characters (e.g. the Location or 
ReadSchema of a FileScan), we find them abbreviated in the {{simpleString}} of 
the SparkPlanInfo JSON and in the {{physicalPlanDescription}} JSON.

Before 2.3, a fragment of SparkListenerSQLExecutionStart in the event log (it 
still contains the metadata field):
{quote}Location: 
InMemoryFileIndex[hdfs://hercules/sys/edw/prs_idm/idm_cbt_am_t/cbt/cbt_acct_prfl_info/snapshot/dt...,
 PartitionFilters: [], PushedFilters: [], ReadSchema: 
struct<snpsht_start_dt:date,snpsht_end_dt:date,am_ntlogin_name:string,am_first_name:string,am_las...","children":[],"metadata":{"Location":"InMemoryFileIndex[hdfs://hercules/sys/edw/prs_idm/idm_cbt_am_t/cbt/cbt_acct_prfl_info/snapshot/dt=20180904]","ReadSchema":"struct<snpsht_start_dt:date,snpsht_end_dt:date,am_ntlogin_name:string,am_first_name:string,am_last_name:string,isg_name:string,acct_isg_stat_desc:string,prmry_user_slctd_id:string,prmry_orcl_id:bigint,acct_cmpny_bsns_lcns_num:string,acct_slctd_id:string,acct_orcl_id:bigint,acct_cmpny_name:string,acct_cmpny_region_txt:string,acct_cmpny_prvnc_txt:string,acct_cmpny_addr_txt:string,acct_type_seg:string,p4_acct_ind:tinyint,i320_acct_ind:tinyint,i463_acct_ind:tinyint,i319_acct_ind:tinyint,acct_cntry:string,acct_stat:string,acct_club_ind:string,acct_src_bd_name:string,acct_prmry_bsns_vrtcl_desc:string,acct_minor_bsns_vrtcl_desc:string,acct_src_desc:string,acct_pre_ams_id:bigint,src_last_mdfd_dt:date,src_last_mdfd_tm:string,CRE_DATE:date,CRE_USER:string,UPD_DATE:timestamp,UPD_USER:string>"
{quote}
From 2.3, a fragment of SparkListenerSQLExecutionStart in the event log (much 
of the information wider than 100 characters is abbreviated):
{quote}Location: 
InMemoryFileIndex[hdfs://hercules/sys/edw/prs_idm/idm_cbt_am_t/cbt/cbt_acct_prfl_info/snapshot/dt...,
 ReadSchema: 
struct<snpsht_start_dt:date,snpsht_end_dt:date,am_ntlogin_name:string,am_first_name:string,am_las..."
{quote}


> Abbreviated metadata in DataSourceScanExec results in incomplete information 
> in event log
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-25357
>                 URL: https://issues.apache.org/jira/browse/SPARK-25357
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.1
>            Reporter: Lantao Jin
>            Priority: Minor
>
> The {{metadata}} field was removed from {{SparkPlanInfo}} in SPARK-17701. 
> Correspondingly, it was also removed from the 
> {{SparkListenerSQLExecutionStart}} event in the Spark event log. If we analyze 
> the event log to get fields wider than 100 characters (e.g. the Location or 
> ReadSchema of a FileScan), we find them abbreviated in the {{simpleString}} of 
> the SparkPlanInfo JSON and in the {{physicalPlanDescription}} JSON.
> Before 2.3, a fragment of SparkListenerSQLExecutionStart in the event log (it 
> still contains the metadata field):
> {quote}Location: 
> InMemoryFileIndex[hdfs://hercules/sys/edw/prs_idm/idm_cbt_am_t/cbt/cbt_acct_prfl_info/snapshot/dt...,
>  PartitionFilters: [], PushedFilters: [], ReadSchema: 
> struct<snpsht_start_dt:date,snpsht_end_dt:date,am_ntlogin_name:string,am_first_name:string,am_las...","children":[],"metadata":{"Location":"InMemoryFileIndex[hdfs://hercules/sys/edw/prs_idm/idm_cbt_am_t/cbt/cbt_acct_prfl_info/snapshot/dt=20180904]","ReadSchema":"struct<snpsht_start_dt:date,snpsht_end_dt:date,am_ntlogin_name:string,am_first_name:string,am_last_name:string,isg_name:string,acct_isg_stat_desc:string,prmry_user_slctd_id:string,prmry_orcl_id:bigint,acct_cmpny_bsns_lcns_num:string,acct_slctd_id:string,acct_orcl_id:bigint,acct_cmpny_name:string,acct_cmpny_region_txt:string,acct_cmpny_prvnc_txt:string,acct_cmpny_addr_txt:string,acct_type_seg:string,p4_acct_ind:tinyint,i320_acct_ind:tinyint,i463_acct_ind:tinyint,i319_acct_ind:tinyint,acct_cntry:string,acct_stat:string,acct_club_ind:string,acct_src_bd_name:string,acct_prmry_bsns_vrtcl_desc:string,acct_minor_bsns_vrtcl_desc:string,acct_src_desc:string,acct_pre_ams_id:bigint,src_last_mdfd_dt:date,src_last_mdfd_tm:string,CRE_DATE:date,CRE_USER:string,UPD_DATE:timestamp,UPD_USER:string>"
> {quote}
> I suggest keeping the full, unabbreviated values in simpleString in 
> DataSourceScanExec to fix this. Complete information in the event log is very 
> useful for offline job analysis.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
