[ https://issues.apache.org/jira/browse/SPARK-874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Patrick Wendell updated SPARK-874:
----------------------------------
Description:
When running benchmarking jobs, the cluster sometimes takes a long time to shut
down. We should add a feature where the stop script sshes into all the workers
every few seconds, checks whether the worker processes are dead, and does not
return until they are all dead. This would help a lot with automating
benchmarking scripts.
There is some equivalent logic written in Python here; we just need to add it
to the shell script:
https://github.com/pwendell/spark-perf/blob/master/bin/run#L117
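For illustration only, the polling behavior described above could look roughly
like the sketch below. This is not the code at the spark-perf link and not the
eventual stop-all.sh change; the function names, the default interval and
timeout, and the pgrep pattern used to recognize a Worker process are all
assumptions made for the example.

```python
import shlex
import subprocess
import time

# Assumed pattern for the standalone Worker JVM. The "[o]" keeps pgrep from
# matching the remote shell whose own command line contains this pattern.
WORKER_PATTERN = "[o]rg.apache.spark.deploy.worker.Worker"


def worker_is_dead(host):
    """Return True only when pgrep on `host` reports no matching process."""
    remote_cmd = "pgrep -f " + shlex.quote(WORKER_PATTERN)
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, remote_cmd],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # pgrep exits 1 when nothing matched. Any other code (0 = still running,
    # 255 = ssh failure, ...) is treated as "not confirmed dead".
    return result.returncode == 1


def wait_for_workers(hosts, is_dead=worker_is_dead, interval=5.0,
                     timeout=300.0, sleep=time.sleep):
    """Poll every `interval` seconds until all hosts report a dead Worker.

    Returns True once every Worker is gone, False if `timeout` seconds pass
    first. `is_dead` and `sleep` are injectable so the loop can be tested
    without a real cluster.
    """
    deadline = time.monotonic() + timeout
    pending = set(hosts)
    while True:
        # Only re-check hosts whose Worker has not yet been seen dead.
        pending = {h for h in pending if not is_dead(h)}
        if not pending:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

A benchmarking script could call wait_for_workers(hosts) right after stopping
the cluster and only start the next run once it returns True. The timeout is
an addition that the description does not ask for; it is there so an
unreachable host cannot block the script forever.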
was: When running benchmarking jobs, the cluster sometimes takes a long time
to shut down. We should add a feature where the stop script sshes into all the
workers every few seconds, checks whether the worker processes are dead, and
does not return until they are all dead. This would help a lot with automating
benchmarking scripts.
> Have a --wait flag in ./sbin/stop-all.sh that polls until Workers are
> finished
> -------------------------------------------------------------------------------
>
> Key: SPARK-874
> URL: https://issues.apache.org/jira/browse/SPARK-874
> Project: Spark
> Issue Type: New Feature
> Components: Deploy
> Reporter: Patrick Wendell
> Priority: Minor
> Labels: starter
> Fix For: 1.1.0