[ https://issues.apache.org/jira/browse/SPARK-47794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
pengfei zhao updated SPARK-47794:
---------------------------------
Description:

While running Spark Streaming applications on YARN in cluster mode, a reboot or shutdown of the node hosting the AM causes the application to stop the SparkContext and finish with a final status of SUCCEEDED.

The reboot/shutdown command sends a graceful stop signal, "kill -15" (SIGTERM), to every process on the machine. In the Spark Streaming code path, this signal makes the application believe it has ended normally. The log is as follows:

!image-2024-04-10-16-03-29-393.png!

In most cases, however, a reboot/shutdown happens because of operator error, because another service on the host must be restarted, or because the operating system itself must be restarted. Is it appropriate for Spark to report a SUCCEEDED status in these cases?

Many scheduling systems decide whether to restart a Spark job based on its final status, for example restarting only on FAILED. When Spark Streaming reports SUCCEEDED instead, the scheduler has no reliable way to handle the situation. Moreover, a Spark Streaming job is meant to run indefinitely, so a SUCCEEDED final status is ambiguous for it in any case.

was: (identical description, but referencing attachment image-2024-04-10-16-00-22-450.png)
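To make the mechanism concrete, here is a minimal sketch (illustrative only, not Spark's internal code; the app name and socket source are placeholders) of how SIGTERM turns into a clean stop. JVM shutdown hooks run when the process receives "kill -15", and StreamingContext registers a comparable hook internally (gated by spark.streaming.stopGracefullyOnShutdown), so a host reboot walks the same shutdown path as a deliberate stop:

{code:scala}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SigtermDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("sigterm-demo") // placeholder name
    val ssc = new StreamingContext(conf, Seconds(10))

    // Placeholder pipeline; a real job would define its own sources and sinks.
    ssc.socketTextStream("localhost", 9999).print()

    // JVM shutdown hooks run on SIGTERM (kill -15). Because the hook stops
    // the contexts without raising an error, the ApplicationMaster
    // unregisters from YARN with a SUCCEEDED final status even though the
    // stop was caused by a node reboot.
    sys.addShutdownHook {
      ssc.stop(stopSparkContext = true, stopGracefully = true)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
{code}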
> After executing the reboot command on the host where the Driver node is
> located, the Spark Streaming application ends in a SUCCEEDED state
> ------------------------------------------------------------------------
>
>                 Key: SPARK-47794
>                 URL: https://issues.apache.org/jira/browse/SPARK-47794
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.8, 3.3.4
>            Reporter: pengfei zhao
>            Priority: Major
>         Attachments: image-2024-04-10-16-03-29-393.png
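The scheduling-system concern can also be made concrete. The hypothetical check below (RestartChecker and shouldRestart are illustrative names, not an existing Spark or YARN API) resubmits a job only when YARN reports FAILED, which is exactly the pattern that a reboot-induced SUCCEEDED status defeats:

{code:scala}
import org.apache.hadoop.yarn.api.records.{ApplicationId, FinalApplicationStatus}
import org.apache.hadoop.yarn.client.api.YarnClient
import org.apache.hadoop.yarn.conf.YarnConfiguration

object RestartChecker {
  // Decide whether to resubmit an application based on its YARN final status.
  def shouldRestart(appIdStr: String): Boolean = {
    val yarn = YarnClient.createYarnClient()
    yarn.init(new YarnConfiguration())
    yarn.start()
    try {
      val report = yarn.getApplicationReport(ApplicationId.fromString(appIdStr))
      // After a reboot of the AM host the final status is SUCCEEDED, so this
      // condition never fires and the streaming job is silently left down.
      report.getFinalApplicationStatus == FinalApplicationStatus.FAILED
    } finally {
      yarn.stop()
    }
  }
}
{code}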