[
https://issues.apache.org/jira/browse/SPARK-47253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
TakawaAkirayo updated SPARK-47253:
----------------------------------
Summary: Allow LiveEventBus to stop without the completely draining of
event queue (was: Allow LiveEventBus to stop without the completly draining of
event queue)
> Allow LiveEventBus to stop without the completely draining of event queue
> -------------------------------------------------------------------------
>
> Key: SPARK-47253
> URL: https://issues.apache.org/jira/browse/SPARK-47253
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.5.0
> Reporter: TakawaAkirayo
> Priority: Minor
>
> #Problem statement:
> SparkContext.stop() hangs for a long time in LiveEventBus.stop() when
> listeners are slow.
> #User scenarios:
> We have a centralized service with multiple instances that regularly execute
> users' scheduled tasks.
> For each user task within one service instance, the process is as follows:
> 1. Create a Spark session directly within the service process with an account
> defined in the task.
> 2. Instantiate listeners by class names and register them with the
> SparkContext. The JARs containing the listener classes are uploaded to the
> service by the user.
> 3. Prepare resources.
> 4. Run user logic (Spark SQL).
> 5. Stop the Spark session by invoking SparkSession.stop().
> In step 5, SparkSession.stop() waits for the LiveEventBus to stop, which
> requires every listener to completely drain the remaining events.
> The listeners are implemented by users, so we cannot prevent them from doing
> heavy work on each event. In some cases a single heavy job has over 30,000
> tasks, and the listener can take 30 minutes to process all the remaining
> events, because the listener takes a coarse-grained global lock and updates
> its internal status in a remote database.
> This kind of delay affects other user tasks in the queue. Therefore, from the
> server-side perspective, we need a guarantee that the stop operation
> finishes quickly.
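
To illustrate the requested behavior, here is a minimal sketch of an event bus whose stop call waits only a bounded time for the queue to drain and then drops whatever is left. It is not Spark's implementation: the class BoundedStopEventBus, its methods, and the timeout parameter are all hypothetical names chosen for this example, and events are plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical single-listener event bus; NOT Spark's actual code. */
final class BoundedStopEventBus {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final AtomicInteger processed = new AtomicInteger();
    private final Thread dispatcher;
    private volatile boolean stopped = false;   // no new events accepted
    private volatile boolean abandoned = false; // dispatcher must exit now

    BoundedStopEventBus(Consumer<String> listener) {
        dispatcher = new Thread(() -> {
            try {
                while (!abandoned) {
                    String event = queue.poll(10, TimeUnit.MILLISECONDS);
                    if (event != null) {
                        listener.accept(event);
                        processed.incrementAndGet();
                    } else if (stopped) {
                        return; // queue fully drained after stop was requested
                    }
                }
            } catch (InterruptedException e) {
                // stop() gave up waiting; exit without draining
            }
        }, "event-dispatcher");
        dispatcher.setDaemon(true);
        dispatcher.start();
    }

    void post(String event) {
        if (!stopped) {
            queue.add(event);
        }
    }

    int processedCount() {
        return processed.get();
    }

    /**
     * Waits at most timeoutMs (must be positive) for the queue to drain.
     * Returns the number of events dropped because the wait timed out.
     */
    int stop(long timeoutMs) {
        stopped = true;
        try {
            if (timeoutMs > 0) {
                dispatcher.join(timeoutMs);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        if (!dispatcher.isAlive()) {
            return 0; // everything was delivered in time
        }
        abandoned = true;
        dispatcher.interrupt(); // an in-flight event may still complete
        List<String> leftover = new ArrayList<>();
        return queue.drainTo(leftover);
    }

    public static void main(String[] args) {
        // A deliberately slow listener, standing in for a user-supplied one.
        BoundedStopEventBus bus = new BoundedStopEventBus(e -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        });
        for (int i = 0; i < 100; i++) {
            bus.post("event-" + i);
        }
        int dropped = bus.stop(200);
        System.out.println("processed=" + bus.processedCount() + " dropped=" + dropped);
    }
}
```

The trade-off in this sketch is explicit: a fast listener still sees every event, while a slow one loses the tail of the queue in exchange for a stop call that returns within roughly the timeout. Whether dropping events is acceptable depends on what the listeners are used for, so a real change would presumably make the behavior opt-in.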
--
This message was sent by Atlassian Jira
(v8.20.10#820010)