[ https://issues.apache.org/jira/browse/SPARK-34344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arpan Bhandari updated SPARK-34344:
-----------------------------------
    Description:
We need the application ID from the resource manager to be mapped to the specific Spark SQL query that was executed under that application ID, so that the query can be traced back.

For example, if I run a query from the Spark shell:

spark.sql("select dt.d_year,item.i_brand_id brand_id,item.i_brand brand,sum(ss_ext_sales_price) sum_agg from date_dim dt,store_sales,item where dt.d_date_sk = store_sales.ss_sold_date_sk and store_sales.ss_item_sk = item.i_item_sk and item.i_manufact_id = 436 and dt.d_moy=12 group by dt.d_year,item.i_brand,item.i_brand_id order by dt.d_year,sum_agg desc,brand_id limit 100").show();

When I look at the event logs or the History Server, I don't see the query text anywhere, only the query plan, so it is difficult to trace back which query was actually submitted (for instance, when it has to be mapped to a specific application ID on YARN).

  was:
The same description, without the closing parenthetical about mapping the query to a specific application ID on YARN.
> Have functionality to trace back Spark SQL queries from the application ID that got submitted on YARN
> -----------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-34344
>                 URL: https://issues.apache.org/jira/browse/SPARK-34344
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Shell, Spark Submit
>    Affects Versions: 1.6.3, 2.3.0, 2.4.5
>            Reporter: Arpan Bhandari
>            Priority: Major

--
This message was sent by Atlassian Jira
(v8.3.4#803005)