[ https://issues.apache.org/jira/browse/SPARK-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aaron Davidson resolved SPARK-1104.
-----------------------------------
    Resolution: Fixed
    Fix Version/s: 1.0.0

> Worker should not block while killing executors
> -----------------------------------------------
>
>                 Key: SPARK-1104
>                 URL: https://issues.apache.org/jira/browse/SPARK-1104
>             Project: Spark
>          Issue Type: Bug
>          Components: Deploy
>    Affects Versions: 0.9.0, 1.0.0
>            Reporter: Patrick Cogan
>            Assignee: Nan Zhu
>             Fix For: 1.0.0
>
>
> Sometimes due to large shuffles executors will take a long time shutting
> down. In particular this can happen if large numbers of shuffle files are
> around (this will be alleviated by SPARK-1103, but nonetheless...).
>
> The symptom is you have DEAD workers sitting around in the UI and the
> existing workers keep trying to re-register but can't because they've been
> assumed dead.
>
> If killing the executor happens in its own thread, or if the ExecutorRunner
> were an actor, this would not be a problem. For 0.9 I'd prefer the former
> approach since it minimizes code changes.
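
The "own thread" approach suggested in the description can be illustrated with a minimal sketch. This is not the Spark patch: the `Worker` and `ExecutorRunner` classes below are hypothetical stand-ins written in Python, and the slow shutdown is simulated with a sleep. The only point is that the kill request returns immediately, so the worker can keep sending heartbeats while the executor is still shutting down.

```python
import threading
import time


class ExecutorRunner:
    """Hypothetical stand-in for an executor whose shutdown is slow."""

    def __init__(self, shutdown_seconds):
        self.shutdown_seconds = shutdown_seconds
        self.stopped = threading.Event()

    def kill(self):
        # Simulates slow cleanup, e.g. removing many shuffle files.
        time.sleep(self.shutdown_seconds)
        self.stopped.set()


class Worker:
    """Hypothetical worker whose message handling must stay responsive."""

    def __init__(self):
        self.heartbeats_sent = 0

    def handle_kill_executor(self, runner):
        # Run the slow kill on its own thread and return right away,
        # so the caller (the worker's message loop) is not blocked.
        thread = threading.Thread(target=runner.kill, daemon=True)
        thread.start()
        return thread

    def send_heartbeat(self):
        self.heartbeats_sent += 1


worker = Worker()
runner = ExecutorRunner(shutdown_seconds=0.5)

start = time.monotonic()
kill_thread = worker.handle_kill_executor(runner)
dispatch_time = time.monotonic() - start

# The worker is free to heartbeat while the executor is still shutting down.
worker.send_heartbeat()

# Wait for the background kill to finish before inspecting the result.
kill_thread.join()
```

If `handle_kill_executor` called `runner.kill()` directly instead, the heartbeat would be delayed by the full shutdown time, which would match the symptom described above: a master that stops hearing from the worker and marks it DEAD.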