Hi,

thanks for the follow-up. You are right that observation #2 is invalid. I
later realized the Worker UI page directly displays the entries in the
executors map, and I can see in our production UI that it is in a proper
state.

As for Killed vs. Exited, it's less relevant now that the theory about the
executors map is invalid. However, to answer your question: the SparkContext
lifecycle encapsulates exactly one application. That is, we create a single
context per submitted application and stop it when the application
completes, whether it succeeds or fails.
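
For concreteness, a minimal sketch of that lifecycle (the object name and
the job inside the try block are illustrative placeholders, not our actual
application code):

    import org.apache.spark.{SparkConf, SparkContext}

    object SingleContextApp {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("single-context-app")
        val sc = new SparkContext(conf)
        try {
          // One submitted application == one SparkContext.
          sc.parallelize(1 to 100).count()  // placeholder for the real jobs
        } finally {
          sc.stop()  // stopped on both success and failure
        }
      }
    }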

Thanks,

On Mon, Jul 20, 2015 at 3:20 PM, Josh Rosen <joshro...@databricks.com>
wrote:

> Hi Richard,
>
> Thanks for your detailed investigation of this issue.  I agree with your
> observation that the finishedExecutors hashmap is a source of memory leaks
> for very-long-lived clusters.  It looks like the finishedExecutors map is
> only read when rendering the Worker Web UI and in constructing REST API
> responses.  I think that we could address this leak by adding a
> configuration to cap the maximum number of retained executors,
> applications, etc.  We already have similar caps in the driver UI.  If we
> add this configuration, I think that we should pick some sensible default
> value rather than an unlimited one.  This is technically a user-facing
> behavior change but I think it's okay since the current behavior is to
> crash / OOM.
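>
> As a rough sketch of the kind of bounded retention I'm imagining (the cap
> value and the simplified map below are illustrative only, not an existing
> setting or the actual Worker code):
>
>     import scala.collection.mutable
>
>     object RetentionSketch {
>       // Illustrative cap; in a real patch this would come from SparkConf.
>       val retainedExecutors = 1000
>       // Simplified stand-in for the Worker's finishedExecutors map.
>       val finishedExecutors = mutable.LinkedHashMap[String, String]()
>
>       def recordFinished(fullId: String, state: String): Unit = {
>         finishedExecutors(fullId) = state
>         if (finishedExecutors.size > retainedExecutors) {
>           // Drop the oldest entries so the map stays bounded on a
>           // long-lived Worker.
>           val excess = finishedExecutors.size - retainedExecutors
>           finishedExecutors.keys.take(excess).toList
>             .foreach(finishedExecutors.remove)
>         }
>       }
>     }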
>
> Regarding `KillExecutor`, I think that there might be some asynchrony and
> indirection masking the cleanup here.  Based on a quick glance through the
> code, it looks like ExecutorRunner's thread will send an
> ExecutorStateChanged RPC back to the Worker after the executor is killed,
> so I think that the cleanup will be triggered by that RPC.  Since this
> isn't clear from reading the code, though, it would be great to add some
> comments to the code to explain this, plus a unit test to make sure that
> this indirect cleanup mechanism isn't broken in the future.
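>
> To make the indirection concrete, here is a simplified toy model of the
> flow I'm describing (not the actual Worker / ExecutorRunner code; in
> reality the state change arrives later, asynchronously, as an RPC from
> the runner's thread):
>
>     import scala.collection.mutable
>
>     object CleanupFlowSketch {
>       val executors = mutable.HashMap[String, String]()          // fullId -> state
>       val finishedExecutors = mutable.HashMap[String, String]()  // fullId -> final state
>
>       // KillExecutor handling only asks the runner to kill the process;
>       // it does not touch the executors map directly.
>       def onKillExecutor(fullId: String): Unit = {
>         // ExecutorRunner.kill() -> process exits -> the runner's thread
>         // reports the change. Called directly here only to keep the toy
>         // model self-contained.
>         onExecutorStateChanged(fullId, "KILLED")
>       }
>
>       // The map cleanup actually happens in the ExecutorStateChanged handler.
>       def onExecutorStateChanged(fullId: String, state: String): Unit = {
>         executors.remove(fullId).foreach { _ =>
>           finishedExecutors(fullId) = state
>         }
>       }
>     }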
>
> I'm not sure what's causing the Killed vs Exited issue, but I have one
> theory: does the behavior vary based on whether your application cleanly
> shuts down the SparkContext via SparkContext.stop()? It's possible that
> omitting the stop() could lead to a "Killed" exit status, but I don't know
> for sure.  (This could probably also be clarified with a unit test).
>
> To my knowledge, the spark-perf suite does not contain the sort of
> scale-testing workload that would expose these types of memory leaks; we
> have some tests for very long-lived individual applications, but not tests
> for long-lived clusters that run thousands of applications between
> restarts.  I'm going to create some tickets to add such tests.
>
> I've filed https://issues.apache.org/jira/browse/SPARK-9202 to follow up
> on the finishedExecutors leak.
>
> - Josh
>
> On Mon, Jul 20, 2015 at 9:56 AM, Richard Marscher <
> rmarsc...@localytics.com> wrote:
>
>> Hi,
>>
>> we have been experiencing issues in production over the past couple of
>> weeks with what appear to be memory leaks in the Spark Standalone Worker
>> JVMs. Old Gen accumulates until it hits the maximum heap size, at which
>> point the Worker enters a failed state that causes critical failures in
>> some applications running against the cluster.
>>
>> I've done some exploration of the Spark codebase related to the Worker in
>> search of potential sources of this problem and am looking for some
>> commentary on a couple of theories I have:
>>
>> Observation 1: The `finishedExecutors` HashMap seems to accumulate new
>> entries over time without bound. Entries only appear to be appended and
>> are never periodically purged of older executors by something like the
>> worker cleanup scheduler.
>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala#L473
>>
>> I feel somewhat confident that over time this will exhibit a "leak". I
>> put it in quotes only because holding these references may be intentional
>> to support functionality, as opposed to a true leak where memory is
>> retained purely by accident.
>>
>> Observation 2: I am much less certain about this one, but it looks like
>> when the Worker receives a `KillExecutor` message it only kills the
>> `ExecutorRunner` and does not remove it from the executors map.
>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala#L492
>>
>> I haven't been able to work out whether I'm missing some indirect path
>> that removes that executor from the map before or after the kill. If
>> there is no such path, then this map may also be leaking references.
>>
>> One final observation, related to our production metrics rather than the
>> codebase itself: we used to see only occasionally that a completed
>> application had a status of "Killed" instead of "Exited" for all of its
>> executors. Now, every completed application finishes with a final state
>> of "Killed" for all of its executors. Speculatively, I might correlate
>> this with Observation 2 as a potential reason we have started seeing this
>> issue more recently.
>>
>> We also have a larger and increasing workload over the past few weeks,
>> and possibly code changes to the application description, that could be
>> exacerbating these potential underlying issues. We run a lot of smaller
>> applications per day, somewhere in the range of hundreds up to maybe 1000
>> applications per day, with 16 executors per application; at that rate, on
>> the order of up to ~16,000 finishedExecutors entries could accumulate per
>> day.
>>
>> Thanks
>> --
>> *Richard Marscher*
>> Software Engineer
>> Localytics
>> Localytics.com <http://localytics.com/> | Our Blog
>> <http://localytics.com/blog> | Twitter <http://twitter.com/localytics> |
>> Facebook <http://facebook.com/localytics> | LinkedIn
>> <http://www.linkedin.com/company/1148792?trk=tyah>
>>
>
>


-- 
*Richard Marscher*
Software Engineer
Localytics
Localytics.com <http://localytics.com/> | Our Blog
<http://localytics.com/blog> | Twitter <http://twitter.com/localytics> |
Facebook <http://facebook.com/localytics> | LinkedIn
<http://www.linkedin.com/company/1148792?trk=tyah>
