[ https://issues.apache.org/jira/browse/SPARK-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-4679.
---------------------------------
    Resolution: Incomplete

> Race condition in querying the Spark UI JSON endpoint when Jetty context
> handlers are added and removed
> -------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-4679
>                 URL: https://issues.apache.org/jira/browse/SPARK-4679
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 1.0.2
>            Reporter: Matt Cheah
>            Priority: Major
>              Labels: bulk-closed
>
> We started seeing some strange behavior when we were querying the Spark UI
> JSON endpoint for job metadata.
> When the Spark cluster was under heavy load from a large number of
> short-lived spark contexts being created and stopped, querying the JSON
> endpoint (e.g. http://localhost:8080/json) returned the HTML webpage instead.
> We were relying on this JSON data to get information about running jobs on
> our own server and the result was a JSON Parse Exception.
> I dug into the code and realized that this is caused by a race condition
> between how we add and remove Jetty context handlers on the Akka message
> queue thread and how the context handler is looked up on a different thread
> when the HTTP request is fired. Whenever an application is started or
> completes, we invoke ContextHandlerCollection.setHandlers() adding or
> removing a new Jetty handler to the collection. However, setHandlers() first
> sets its internal collection to null before configuring the new passed-in
> collection. If an HTTP request is made and the Jetty context handler is
> looked up AFTER the collection's internal map is set to null, but BEFORE it
> has configured the new collection, the default handler is selected to return
> HTML.
> tl;dr we're using Jetty's ContextHandlerCollection in a way that is not
> thread-safe. The issue we found is only one possible ramification of this;
> I'm not sure what other consequences a non-thread-safe usage of Jetty may
> have.
> I could only reproduce this by manually stepping through Spark's code
> with a debugger to force the race condition described above. However, it
> caused some pain in production, where it manifested itself repeatedly and
> reliably.
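
The race described in the report can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: it does not use Jetty, and `HandlerTable`, `UnsafeTable`, `SafeTable`, and `requestDuringUpdate` are hypothetical names invented for the illustration. `UnsafeTable` mimics the behavior the reporter describes (the internal map is null while it is being rebuilt), and the `Runnable` hook stands in for the debugger stepping used to force a request into that window. `SafeTable` shows one possible mitigation, building the new map first and publishing it in a single atomic swap; the ticket does not say that this is how the issue was or should be fixed.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

final class HandlerRaceSketch {
    // Placeholder bodies; the real endpoint's output is not reproduced here.
    static final String DEFAULT_HTML = "<html>default page</html>";
    static final String JSON_BODY = "{\"placeholder\":true}";

    /** Hypothetical stand-in for a handler collection (path -> response body). */
    interface HandlerTable {
        // duringUpdate runs in the middle of the update, simulating a
        // request that arrives on another thread at that exact moment.
        void setHandlers(Map<String, String> next, Runnable duringUpdate);
        String handle(String path);
    }

    /** Models the reported behavior: the map is null while it is rebuilt. */
    static final class UnsafeTable implements HandlerTable {
        private volatile Map<String, String> handlers = new HashMap<>();

        public void setHandlers(Map<String, String> next, Runnable duringUpdate) {
            handlers = null;                    // window opens
            duringUpdate.run();                 // a request sneaks in here
            handlers = new HashMap<>(next);     // window closes
        }

        public String handle(String path) {
            Map<String, String> snapshot = handlers;
            if (snapshot == null || !snapshot.containsKey(path)) {
                return DEFAULT_HTML;            // falls through to the default handler
            }
            return snapshot.get(path);
        }
    }

    /** One possible mitigation: build the new map first, then swap it in atomically. */
    static final class SafeTable implements HandlerTable {
        private final AtomicReference<Map<String, String>> handlers =
            new AtomicReference<>(Collections.<String, String>emptyMap());

        public void setHandlers(Map<String, String> next, Runnable duringUpdate) {
            Map<String, String> built =
                Collections.unmodifiableMap(new HashMap<>(next));
            duringUpdate.run();                 // readers still see the old, complete map
            handlers.set(built);                // single atomic publication
        }

        public String handle(String path) {
            Map<String, String> snapshot = handlers.get();
            return snapshot.containsKey(path) ? snapshot.get(path) : DEFAULT_HTML;
        }
    }

    /** Registers /json, then requests /json while a second handler is being added. */
    static String requestDuringUpdate(HandlerTable table) {
        Map<String, String> initial = new HashMap<>();
        initial.put("/json", JSON_BODY);
        table.setHandlers(initial, () -> { });

        Map<String, String> withApp = new HashMap<>(initial);
        withApp.put("/app-1", "app page");
        String[] seen = new String[1];
        table.setHandlers(withApp, () -> seen[0] = table.handle("/json"));
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("unsafe: " + requestDuringUpdate(new UnsafeTable()));
        System.out.println("safe:   " + requestDuringUpdate(new SafeTable()));
    }
}
```

In this model the unsafe table answers the mid-update `/json` request with the default HTML page, which matches the symptom in the report, while the swap-based table keeps serving the JSON body because readers only ever see a fully built map.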