[ 
https://issues.apache.org/jira/browse/SPARK-34344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpan Bhandari updated SPARK-34344:
-----------------------------------
    Description: 
We need the application ID from the resource manager to be mapped to the
specific Spark SQL query that was executed under that application ID, so that
back-tracing is possible.

For example, if I run a query using the Spark shell:

spark.sql("""
  select dt.d_year, item.i_brand_id brand_id, item.i_brand brand,
         sum(ss_ext_sales_price) sum_agg
  from date_dim dt, store_sales, item
  where dt.d_date_sk = store_sales.ss_sold_date_sk
    and store_sales.ss_item_sk = item.i_item_sk
    and item.i_manufact_id = 436 and dt.d_moy = 12
  group by dt.d_year, item.i_brand, item.i_brand_id
  order by dt.d_year, sum_agg desc, brand_id
  limit 100
""").show()

When I look at the event logs or the History Server, I don't see the query
text anywhere, although the query plan is there, so it becomes difficult to
trace back which query was actually submitted (when it has to be mapped to the
specific application ID on YARN).

  was:
We need the application ID from the resource manager to be mapped to the
specific Spark SQL query that was executed under that application ID, so that
back-tracing is possible.

For example, if I run a query using the Spark shell:

spark.sql("""
  select dt.d_year, item.i_brand_id brand_id, item.i_brand brand,
         sum(ss_ext_sales_price) sum_agg
  from date_dim dt, store_sales, item
  where dt.d_date_sk = store_sales.ss_sold_date_sk
    and store_sales.ss_item_sk = item.i_item_sk
    and item.i_manufact_id = 436 and dt.d_moy = 12
  group by dt.d_year, item.i_brand, item.i_brand_id
  order by dt.d_year, sum_agg desc, brand_id
  limit 100
""").show()

When I look at the event logs or the History Server, I don't see the query
text anywhere, although the query plan is there, so it becomes difficult to
trace back which query was actually submitted.


> Have functionality to trace back Spark SQL queries from the application ID 
> that got submitted on YARN
> -----------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-34344
>                 URL: https://issues.apache.org/jira/browse/SPARK-34344
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Shell, Spark Submit
>    Affects Versions: 1.6.3, 2.3.0, 2.4.5
>            Reporter: Arpan Bhandari
>            Priority: Major
>
> We need the application ID from the resource manager to be mapped to the
> specific Spark SQL query that was executed under that application ID, so that
> back-tracing is possible.
> For example, if I run a query using the Spark shell:
> spark.sql("""
>   select dt.d_year, item.i_brand_id brand_id, item.i_brand brand,
>          sum(ss_ext_sales_price) sum_agg
>   from date_dim dt, store_sales, item
>   where dt.d_date_sk = store_sales.ss_sold_date_sk
>     and store_sales.ss_item_sk = item.i_item_sk
>     and item.i_manufact_id = 436 and dt.d_moy = 12
>   group by dt.d_year, item.i_brand, item.i_brand_id
>   order by dt.d_year, sum_agg desc, brand_id
>   limit 100
> """).show()
> When I look at the event logs or the History Server, I don't see the query
> text anywhere, although the query plan is there, so it becomes difficult to
> trace back which query was actually submitted (when it has to be mapped to the
> specific application ID on YARN).


