[ https://issues.apache.org/jira/browse/SPARK-54921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18051942#comment-18051942 ]
Dipanshu Pandey commented on SPARK-54921:
-----------------------------------------
Hi [~warrenzhu25],
I've started working on this issue and have raised a PR:
https://github.com/apache/spark/pull/53803
Implementation approach:
- Added a parentIds: collection.Seq[Int] field to the StageData class
- Updated the protobuf schema with a repeated int64 parent_ids field
- Populated the field from the existing StageInfo.parentIds, which already
tracks parent stage dependencies
- Added serialization/deserialization support for persistence
- Added unit tests, including the edge case of empty parentIds (root stages)
The parent stage information was already available internally in the
scheduler (StageInfo.parentIds), so this change simply exposes it through the
REST API.
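To illustrate how a client might use the new field, here is a minimal Python sketch. It assumes the stage objects returned by the REST API (for example from /api/v1/applications/[app-id]/stages) carry a JSON key named parentIds next to the existing stageId; that key name follows the StageData field in the PR and is an assumption until the change is merged. The helper names and the sample payload are hypothetical.

```python
from collections import defaultdict


def build_child_map(stages):
    """Map each parent stage ID to the IDs of the stages that depend on it.

    `stages` is a list of dicts shaped like REST API stage entries.
    The "parentIds" key is assumed, not confirmed by a released Spark version.
    """
    children = defaultdict(list)
    for stage in stages:
        # A missing or empty "parentIds" list is treated as a root stage.
        for parent_id in stage.get("parentIds", []):
            children[parent_id].append(stage["stageId"])
    return dict(children)


def root_stages(stages):
    """Return the IDs of stages that have no parent stages."""
    return [s["stageId"] for s in stages if not s.get("parentIds")]


# Hypothetical payload: stages 0 and 1 are roots, stage 2 depends on both.
sample = [
    {"stageId": 0, "parentIds": []},
    {"stageId": 1, "parentIds": []},
    {"stageId": 2, "parentIds": [0, 1]},
]

child_map = build_child_map(sample)
roots = root_stages(sample)
```

With a map like this, a status tool could walk the stage DAG from the root stages without re-deriving dependencies from RDD lineage.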
Could you please assign this issue to me?
> Add parentIds in StageData
> --------------------------
>
> Key: SPARK-54921
> URL: https://issues.apache.org/jira/browse/SPARK-54921
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 4.1.0
> Reporter: Zhongwei Zhu
> Priority: Major
> Labels: pull-request-available
>
> The `parentIds` field is necessary for tracking stage dependencies in the
> Spark UI and other status-related tools.