[
https://issues.apache.org/jira/browse/SPARK-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Patrick Wendell updated SPARK-4679:
-----------------------------------
Component/s: Web UI
> Race condition in querying the Spark UI JSON endpoint when Jetty context
> handlers are added and removed
> -------------------------------------------------------------------------------------------------------
>
> Key: SPARK-4679
> URL: https://issues.apache.org/jira/browse/SPARK-4679
> Project: Spark
> Issue Type: Bug
> Components: Web UI
> Affects Versions: 1.0.2
> Reporter: Matt Cheah
>
> We started seeing some strange behavior when we were querying the Spark UI
> JSON endpoint for job metadata.
> When the Spark cluster was under heavy load from a large number of
> short-lived Spark contexts being created and stopped, querying the JSON
> endpoint (e.g. http://localhost:8080/json) returned the HTML webpage instead.
> Our own server relies on this JSON data for information about running jobs,
> so the HTML response caused a JSON parse exception.
> I dug into the code and realized that this is caused by a race condition
> between how we add and remove Jetty context handlers on the Akka message
> queue thread and how the context handler is looked up on a different thread
> when the HTTP request is fired. Whenever an application starts or
> completes, we invoke ContextHandlerCollection.setHandlers() to add a Jetty
> handler to the collection or remove one from it. However, setHandlers() first
> sets its internal collection to null before configuring the new passed-in
> collection. If an HTTP request is made and the Jetty context handler is
> looked up AFTER the collection's internal map is set to null, but BEFORE it
> has configured the new collection, the default handler is selected and
> returns HTML.
> tl;dr we're using Jetty's ContextHandlerCollection in a way that is not
> thread-safe. The issue we found is only one possible ramification of this;
> I'm not sure what other consequences a non-thread-safe usage of Jetty may
> have. I could only reproduce this by manually stepping through Spark's code
> with a debugger to force the race condition described above. However, it
> caused some pain in production, where it manifested itself repeatedly and
> reliably.
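
The window described above can be illustrated with a small, self-contained
sketch. This is not Jetty's or Spark's code, and it is not necessarily the fix
that was adopted for this issue. RacyRoutes, AtomicRoutes, the betweenSteps
hook and the "/app-1" path are all hypothetical names made up for the example.
RacyRoutes mimics the "set to null, then configure" sequence; AtomicRoutes
shows one common alternative, assuming readers only need a consistent
snapshot: build the new table first, then publish it with a single atomic
write. The hook plays the role of the debugger, forcing a lookup to land
inside the update.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-ins for illustration; none of these types exist in Jetty or Spark.
final class HandlerSwapSketch {

    /** Mimics the racy pattern: drop the routing table, then rebuild it. */
    static final class RacyRoutes {
        private volatile Map<String, String> routes = new HashMap<>();

        // betweenSteps stands in for an HTTP request arriving mid-update.
        void setRoutes(Map<String, String> updated, Runnable betweenSteps) {
            routes = null;                    // step 1: the old table is gone
            betweenSteps.run();               // window: lookups find no table
            routes = new HashMap<>(updated);  // step 2: the new table is published
        }

        String lookup(String path) {
            Map<String, String> current = routes;
            if (current == null) {
                return "html";                // fall through to the default handler
            }
            return current.getOrDefault(path, "html");
        }
    }

    /** Builds the new table first, then publishes it with one atomic write. */
    static final class AtomicRoutes {
        private final AtomicReference<Map<String, String>> routes =
                new AtomicReference<>(Collections.<String, String>emptyMap());

        void setRoutes(Map<String, String> updated, Runnable betweenSteps) {
            Map<String, String> next =
                    Collections.unmodifiableMap(new HashMap<>(updated));
            betweenSteps.run();               // the old table is still visible here
            routes.set(next);                 // readers see the old or the new table
        }

        String lookup(String path) {
            return routes.get().getOrDefault(path, "html");
        }
    }

    private static Map<String, String> initialRoutes() {
        Map<String, String> m = new HashMap<>();
        m.put("/json", "json");
        return m;
    }

    private static Map<String, String> routesWithNewApp() {
        Map<String, String> m = initialRoutes();
        m.put("/app-1", "html");              // an application UI being attached
        return m;
    }

    /** Looks up /json while the racy collection is mid-update. */
    static String racyLookupDuringUpdate() {
        RacyRoutes r = new RacyRoutes();
        r.setRoutes(initialRoutes(), () -> { });
        AtomicReference<String> seen = new AtomicReference<>();
        r.setRoutes(routesWithNewApp(), () -> seen.set(r.lookup("/json")));
        return seen.get();
    }

    /** Looks up /json while the atomic collection is mid-update. */
    static String atomicLookupDuringUpdate() {
        AtomicRoutes r = new AtomicRoutes();
        r.setRoutes(initialRoutes(), () -> { });
        AtomicReference<String> seen = new AtomicReference<>();
        r.setRoutes(routesWithNewApp(), () -> seen.set(r.lookup("/json")));
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println("racy: " + racyLookupDuringUpdate());
        System.out.println("atomic: " + atomicLookupDuringUpdate());
    }
}
```

Running main prints "racy: html" and then "atomic: json": the racy variant
serves the default HTML response for /json during the update, while the
atomic variant keeps serving JSON from the old table until the new one is
published.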