[
https://issues.apache.org/jira/browse/SPARK-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Owen updated SPARK-6188:
-----------------------------
Shepherd: (was: Josh Rosen)
Assignee: Theodore Vasiloudis
> Instance types can be mislabeled when re-starting cluster with default
> arguments
> --------------------------------------------------------------------------------
>
> Key: SPARK-6188
> URL: https://issues.apache.org/jira/browse/SPARK-6188
> Project: Spark
> Issue Type: Bug
> Components: EC2
> Affects Versions: 1.0.2, 1.1.0, 1.1.1, 1.2.0, 1.2.1
> Reporter: Theodore Vasiloudis
> Assignee: Theodore Vasiloudis
> Priority: Minor
> Fix For: 1.4.0
>
>
> This was discovered when investigating
> https://issues.apache.org/jira/browse/SPARK-5838.
> In short, when restarting a cluster that you launched with an alternative
> instance type, you have to provide the instance type(s) again in the
> "./spark-ec2 -i <key-file> --region=<ec2-region> start <cluster-name>"
> command. Otherwise it gets set to the default m1.large.
> This then affects the setup of the machines.
> I'll submit a pull request that takes care of this, without the user needing
> to provide the instance type(s) again.
> EDIT:
> Example case where this becomes a problem:
> 1. The user launches a cluster with instances that have 1 disk, e.g. m3.large.
> 2. The user stops the cluster.
> 3. When the user restarts the cluster with the start command without
> providing the instance type, the setup is performed using the default
> instance type, m1.large, which assumes 2 disks present in the machine.
> 4. SPARK_LOCAL_DIRS is then set to "/mnt/spark,/mnt2/spark". /mnt2
> corresponds to the snapshot partition in an m3.large instance, which is only
> 8GB in size. When the user runs jobs that shuffle data, this partition fills
> up quickly, resulting in failed jobs due to "No space left on device" errors.
> Apart from this example one could come up with other examples where the setup
> of the machines is wrong, due to assuming that they are of type m1.large.
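
To make the failure mode above concrete, here is a minimal sketch in Python. It is not the actual spark-ec2 code: the function names, the disk table, and the fallback order are assumptions made for illustration, and the table lists only the two instance types mentioned in the report.

```python
# Hypothetical sketch; names and structure are illustrative, not spark-ec2's code.
DEFAULT_INSTANCE_TYPE = "m1.large"

# Instance-store disk counts for the two types discussed in the report.
DISKS_BY_INSTANCE_TYPE = {
    "m1.large": 2,
    "m3.large": 1,
}


def local_dirs(instance_type):
    """Build a SPARK_LOCAL_DIRS value with one directory per disk."""
    num_disks = DISKS_BY_INSTANCE_TYPE[instance_type]
    dirs = ["/mnt/spark"]
    # Additional disks are assumed to be mounted at /mnt2, /mnt3, ...
    dirs += ["/mnt%d/spark" % i for i in range(2, num_disks + 1)]
    return ",".join(dirs)


def resolve_instance_type(cli_instance_type, existing_instance_types):
    """Prefer an explicit value, then the type of the existing instances,
    and only then fall back to the default."""
    if cli_instance_type:
        return cli_instance_type
    if existing_instance_types:
        return existing_instance_types[0]
    return DEFAULT_INSTANCE_TYPE


# Reported behaviour: a restart without an instance type uses the default.
print(local_dirs(DEFAULT_INSTANCE_TYPE))

# Intended behaviour: reuse the type the cluster was launched with.
print(local_dirs(resolve_instance_type(None, ["m3.large"])))
```

With the default type, the first call includes /mnt2/spark even though an m3.large machine has no second instance-store disk; resolving the type from the already-running instances avoids that.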