[
https://issues.apache.org/jira/browse/SPARK-37831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475071#comment-17475071
]
Apache Spark commented on SPARK-37831:
--------------------------------------
User 'stczwd' has created a pull request for this issue:
https://github.com/apache/spark/pull/35185
> Add task partition id in metrics
> --------------------------------
>
> Key: SPARK-37831
> URL: https://issues.apache.org/jira/browse/SPARK-37831
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.2.1, 3.3.0
> Reporter: Jackey Lee
> Priority: Major
>
> There is no partition id in the current metrics, which makes it difficult to
> trace stage metrics, such as stage shuffle read, especially when there are
> stage retries. It also makes it impossible to compare task metrics across
> different applications.
> {code:java}
> class TaskData private[spark](
>     val taskId: Long,
>     val index: Int,
>     val attempt: Int,
>     val launchTime: Date,
>     val resultFetchStart: Option[Date],
>     @JsonDeserialize(contentAs = classOf[JLong])
>     val duration: Option[Long],
>     val executorId: String,
>     val host: String,
>     val status: String,
>     val taskLocality: String,
>     val speculative: Boolean,
>     val accumulatorUpdates: Seq[AccumulableInfo],
>     val errorMessage: Option[String] = None,
>     val taskMetrics: Option[TaskMetrics] = None,
>     val executorLogs: Map[String, String],
>     val schedulerDelay: Long,
>     val gettingResultTime: Long) {code}
> Adding partitionId to TaskData would not only make it easier to trace task
> metrics, but would also make it possible to collect metrics for the actual
> stage outputs, especially when stages are retried.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]