[ https://issues.apache.org/jira/browse/SPARK-44438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anish Shrigondekar updated SPARK-44438:
---------------------------------------
    Description:
Shutdown maintenance task thread on error

Currently if we get an error on maintenance, the future is set to false, but the thread is actually never reaped. If this happens often, we end up accumulating the state store threads that don't have any reference. Eventually this could lead to thread exhaustion.

Eg logs -
{code:java}
log4j-2023-07-13-18.log:6968:23/07/13 18:34:25 INFO StateStore: State Store maintenance task started
log4j-2023-07-13-18.log:282671:23/07/13 18:43:28 INFO StateStore: State Store maintenance task started
log4j-2023-07-13-18.log:322145:23/07/13 18:44:32 INFO StateStore: State Store maintenance task started
log4j-2023-07-13-18.log:361192:23/07/13 18:45:37 INFO StateStore: State Store maintenance task started
log4j-2023-07-13-18.log:398145:23/07/13 18:46:37 INFO StateStore: State Store maintenance task started
log4j-2023-07-13-18.log:436591:23/07/13 18:47:42 INFO StateStore: State Store maintenance task started
log4j-2023-07-13-18.log:478531:23/07/13 18:48:47 INFO StateStore: State Store maintenance task started
{code}

  was:
Shutdown maintenance task thread on error

Currently if we get an error on maintenance, the future is set to false, but the thread is actually never reaped. If this happens often, we end up accumulating the state store threads that don't have any reference. Eventually this could lead to thread exhaustion.

Eg logs -
```
log4j-2023-07-13-18.log:6968:23/07/13 18:34:25 INFO StateStore: State Store maintenance task started
log4j-2023-07-13-18.log:282671:23/07/13 18:43:28 INFO StateStore: State Store maintenance task started
log4j-2023-07-13-18.log:322145:23/07/13 18:44:32 INFO StateStore: State Store maintenance task started
log4j-2023-07-13-18.log:361192:23/07/13 18:45:37 INFO StateStore: State Store maintenance task started
log4j-2023-07-13-18.log:398145:23/07/13 18:46:37 INFO StateStore: State Store maintenance task started
log4j-2023-07-13-18.log:436591:23/07/13 18:47:42 INFO StateStore: State Store maintenance task started
log4j-2023-07-13-18.log:478531:23/07/13 18:48:47 INFO StateStore: State Store maintenance task started
```

> Shutdown maintenance task thread on error
> -----------------------------------------
>
>                 Key: SPARK-44438
>                 URL: https://issues.apache.org/jira/browse/SPARK-44438
>             Project: Spark
>          Issue Type: Task
>          Components: Structured Streaming
>    Affects Versions: 3.5.0
>            Reporter: Anish Shrigondekar
>            Priority: Major
>
> Shutdown maintenance task thread on error
>
> Currently if we get an error on maintenance, the future is set to false, but
> the thread is actually never reaped. If this happens often, we end up
> accumulating the state store threads that don't have any reference.
> Eventually this could lead to thread exhaustion.
>
> Eg logs -
> {code:java}
> log4j-2023-07-13-18.log:6968:23/07/13 18:34:25 INFO StateStore: State Store
> maintenance task started
> log4j-2023-07-13-18.log:282671:23/07/13 18:43:28 INFO StateStore: State Store
> maintenance task started
> log4j-2023-07-13-18.log:322145:23/07/13 18:44:32 INFO StateStore: State Store
> maintenance task started
> log4j-2023-07-13-18.log:361192:23/07/13 18:45:37 INFO StateStore: State Store
> maintenance task started
> log4j-2023-07-13-18.log:398145:23/07/13 18:46:37 INFO StateStore: State Store
> maintenance task started
> log4j-2023-07-13-18.log:436591:23/07/13 18:47:42 INFO StateStore: State Store
> maintenance task started
> log4j-2023-07-13-18.log:478531:23/07/13 18:48:47 INFO StateStore: State Store
> maintenance task started
> {code}
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
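
For illustration only, the behaviour the issue asks for (release the worker thread when maintenance fails, rather than only cancelling the future) might look roughly like the sketch below. It is not taken from the Spark code base: the class name `MaintenanceTaskSketch`, the thread name, and the methods `stop`, `isRunning` and `awaitThreadExit` are hypothetical, and it assumes the maintenance task runs periodically on a single-threaded `ScheduledExecutorService`.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not Spark's implementation.
final class MaintenanceTaskSketch {
    // One daemon worker thread dedicated to the periodic maintenance task.
    private final ScheduledExecutorService executor =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "maintenance-task-sketch");
            t.setDaemon(true);
            return t;
        });

    // Volatile because stop() may run on the worker thread.
    private volatile ScheduledFuture<?> future;

    MaintenanceTaskSketch(long periodMs, Runnable task) {
        Runnable guarded = () -> {
            try {
                task.run();
            } catch (Exception e) {
                // On error, do more than cancel the future: shut the executor
                // down so that its worker thread can exit.
                stop();
            }
        };
        future = executor.scheduleAtFixedRate(
            guarded, periodMs, periodMs, TimeUnit.MILLISECONDS);
    }

    // Cancels future runs and lets the worker thread terminate.
    void stop() {
        ScheduledFuture<?> f = future;
        if (f != null) {
            f.cancel(false); // stops rescheduling, but does not end the thread
        }
        executor.shutdown(); // allows the worker thread to exit
    }

    boolean isRunning() {
        return !executor.isShutdown();
    }

    // Waits until the worker thread has exited, or the timeout elapses.
    boolean awaitThreadExit(long timeoutMs) {
        try {
            return executor.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In this sketch, calling only `f.cancel(false)` would leave the executor's worker thread parked and waiting for work, which matches the accumulation of unreferenced threads described in the issue; the additional `executor.shutdown()` is what lets the thread go away.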