[ https://issues.apache.org/jira/browse/SPARK-15857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Weiqing Yang resolved SPARK-15857.
----------------------------------
Resolution: Fixed
> Add Caller Context in Spark
> ---------------------------
>
> Key: SPARK-15857
> URL: https://issues.apache.org/jira/browse/SPARK-15857
> Project: Spark
> Issue Type: New Feature
> Reporter: Weiqing Yang
>
> Hadoop has implemented a log-tracing feature called caller context (see
> HDFS-9184 and YARN-4349). The motivation is to better diagnose and
> understand how specific applications impact parts of the Hadoop system and
> what problems they may be creating (e.g. overloading the NameNode). As
> noted in HDFS-9184, for a given HDFS operation it is very helpful to track
> which upper-level job issued it. The upper-level callers may be specific
> Oozie tasks, MR jobs, Hive queries, or Spark jobs.
> Hadoop ecosystem projects such as MapReduce, Tez (TEZ-2851), Hive
> (HIVE-12249, HIVE-12254) and Pig (PIG-4714) have implemented their own
> caller contexts. Those systems invoke the HDFS and YARN client APIs to set
> up the caller context, and also expose an API so their upstream
> applications can pass a caller context in.
> Many Spark applications run on YARN/HDFS. Spark can likewise implement its
> caller context by invoking the HDFS/YARN APIs, and can expose an API for
> its upstream applications to set their own caller contexts. The Spark
> caller context written into the YARN and HDFS logs can then be associated
> with the task ID, stage ID, job ID, and application ID. That is also very
> useful for Spark users to identify tasks, especially if Spark supports
> multi-tenant environments in the future.