Weiqing Yang created SPARK-15857:
------------------------------------

             Summary: Add Caller Context in Spark
                 Key: SPARK-15857
                 URL: https://issues.apache.org/jira/browse/SPARK-15857
             Project: Spark
          Issue Type: New Feature
            Reporter: Weiqing Yang


Hadoop has implemented a log-tracing feature called caller context (Jira: 
HDFS-9184 and YARN-4349). The motivation is to better diagnose and understand 
how specific applications impact parts of the Hadoop system and what problems 
they may be creating (e.g. overloading the NameNode). As noted in HDFS-9184, 
for a given HDFS operation it is very helpful to track which upper-level job 
issued it. The upper-level callers may be specific Oozie tasks, MR jobs, Hive 
queries, or Spark jobs. 

Hadoop ecosystem projects such as MapReduce, Tez (TEZ-2851), Hive (HIVE-12249, 
HIVE-12254) and Pig (PIG-4714) have implemented their own caller contexts. 
Those systems invoke the HDFS client API and the YARN client API to set up a 
caller context, and also expose an API through which their own callers can 
pass a caller context in.

Many Spark applications run on YARN/HDFS. Spark can likewise implement its 
caller context by invoking the HDFS/YARN APIs, and can expose an API that lets 
upstream applications set their own caller contexts. As a result, the Spark 
caller context written to the YARN and HDFS logs can be associated with the 
task id, stage id, job id and app id. This would also help Spark users 
identify tasks, especially if Spark supports multi-tenant environments in the 
future.
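
To make the last point concrete, here is a minimal sketch of how the ids could 
be folded into a single caller-context string before it is handed to the 
HDFS/YARN client. Everything in it is an assumption for illustration only: the 
class name SparkCallerContext, the "SPARK|app=...|job=..." layout, the "|" 
separator and the 128-character cap are hypothetical and are not part of any 
existing Spark or Hadoop API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SparkCallerContext:
    """Hypothetical holder for the ids named in this proposal."""

    app_id: str
    job_id: Optional[int] = None
    stage_id: Optional[int] = None
    task_id: Optional[int] = None

    def render(self, max_len: int = 128) -> str:
        """Build one log-friendly string; the format and cap are assumed."""
        parts = ["SPARK", f"app={self.app_id}"]
        for label, value in (
            ("job", self.job_id),
            ("stage", self.stage_id),
            ("task", self.task_id),
        ):
            # Ids that are not known yet (e.g. on the driver) are left out.
            if value is not None:
                parts.append(f"{label}={value}")
        # Truncate so an over-long context cannot bloat the audit log.
        return "|".join(parts)[:max_len]


if __name__ == "__main__":
    ctx = SparkCallerContext("application_1_0001", job_id=3, stage_id=7, task_id=42)
    print(ctx.render())
```

A string of this kind is what would appear next to each HDFS operation or 
YARN request in the logs, so an operator could map a burst of NameNode calls 
back to one task of one stage of one Spark job.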



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
