[
https://issues.apache.org/jira/browse/SPARK-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Theodore Vasiloudis updated SPARK-6188:
---------------------------------------
Description:
This was discovered when investigating
https://issues.apache.org/jira/browse/SPARK-5838.
In short, when restarting a cluster that was launched with a non-default
instance type, you have to provide the instance type(s) again in the
"./spark-ec2 -i <key-file> --region=<ec2-region> start <cluster-name>" command.
Otherwise the instance type is reset to the default, m1.large.
This then affects the setup of the machines.
I'll submit a pull request that takes care of this, so that the user does not
need to provide the instance type(s) again.
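Until such a fix lands, the workaround is to re-supply the type(s) explicitly on restart, using the existing spark-ec2 options (m3.large here is just the example type from this issue; substitute your cluster's actual type):

```shell
# Workaround: pass the original instance type again when restarting,
# so the setup scripts do not fall back to the default m1.large.
./spark-ec2 -i <key-file> --region=<ec2-region> \
    --instance-type=m3.large --master-instance-type=m3.large \
    start <cluster-name>
```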
EDIT:
Example case where this becomes a problem:
1. The user launches a cluster whose instances have a single disk, e.g. m3.large.
2. The user stops the cluster.
3. When the user restarts the cluster with the start command without providing
the instance type, the setup is performed using the default instance type,
m1.large.
4. SPARK_LOCAL_DIRS is then set to "/mnt/spark,/mnt2/spark". /mnt2
corresponds to the snapshot partition on an m3.large instance, which is only 8GB
in size. When the user runs jobs that shuffle data, this partition fills up
quickly, and jobs fail with "No space left on device" errors.
Apart from this example, there are other cases where the machines are set up
incorrectly because they are assumed to be of type m1.large.
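A minimal sketch of the intended fix's logic (hypothetical helper names; the real spark_ec2.py queries EC2 via boto, and the disk counts here are an illustrative subset): on restart, reuse the type reported by the cluster's existing instances rather than the default, and derive SPARK_LOCAL_DIRS from that type's ephemeral-disk count.

```python
# Sketch only: hypothetical helpers modeling the proposed behavior.
# The actual spark-ec2 script discovers instances through boto.

DEFAULT_INSTANCE_TYPE = "m1.large"

# Ephemeral-disk counts for the two types discussed in this issue.
DISKS_BY_TYPE = {
    "m1.large": 2,
    "m3.large": 1,
}

def resolve_instance_type(existing_instances, requested_type=None):
    """Prefer an explicitly requested type; otherwise reuse the type of
    the cluster's existing (stopped) instances; fall back to the default
    only for a brand-new cluster."""
    if requested_type:
        return requested_type
    if existing_instances:
        return existing_instances[0]["instance_type"]
    return DEFAULT_INSTANCE_TYPE

def spark_local_dirs(instance_type):
    """Build SPARK_LOCAL_DIRS from the type's ephemeral-disk count:
    /mnt/spark for the first disk, /mnt2/spark for the second, etc."""
    n = DISKS_BY_TYPE.get(instance_type, 1)
    dirs = ["/mnt/spark"] + ["/mnt%d/spark" % (i + 1) for i in range(1, n)]
    return ",".join(dirs)

# A stopped m3.large cluster restarted without --instance-type:
cluster = [{"instance_type": "m3.large"}]
t = resolve_instance_type(cluster)   # reuses "m3.large", not m1.large
print(t, spark_local_dirs(t))        # → m3.large /mnt/spark
```

With this logic the restarted m3.large cluster gets only "/mnt/spark", instead of the m1.large layout "/mnt/spark,/mnt2/spark" that drags in the small /mnt2 partition.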
was:
This was discovered when investigating
https://issues.apache.org/jira/browse/SPARK-5838.
In short, when restarting a cluster that was launched with a non-default
instance type, you have to provide the instance type(s) again in the
"./spark-ec2 -i <key-file> --region=<ec2-region> start <cluster-name>" command.
Otherwise the instance type is reset to the default, m1.large.
This then affects the setup of the machines.
I'll submit a pull request that takes care of this, so that the user does not
need to provide the instance type(s) again.
> Instance types can be mislabeled when re-starting cluster with default
> arguments
> --------------------------------------------------------------------------------
>
> Key: SPARK-6188
> URL: https://issues.apache.org/jira/browse/SPARK-6188
> Project: Spark
> Issue Type: Bug
> Components: EC2
> Affects Versions: 1.0.2, 1.1.0, 1.1.1, 1.2.0, 1.2.1
> Reporter: Theodore Vasiloudis
> Priority: Minor
>
> This was discovered when investigating
> https://issues.apache.org/jira/browse/SPARK-5838.
> In short, when restarting a cluster that was launched with a non-default
> instance type, you have to provide the instance type(s) again in the
> "./spark-ec2 -i <key-file> --region=<ec2-region> start <cluster-name>"
> command. Otherwise the instance type is reset to the default, m1.large.
> This then affects the setup of the machines.
> I'll submit a pull request that takes care of this, so that the user does not
> need to provide the instance type(s) again.
> EDIT:
> Example case where this becomes a problem:
> 1. The user launches a cluster whose instances have a single disk, e.g. m3.large.
> 2. The user stops the cluster.
> 3. When the user restarts the cluster with the start command without
> providing the instance type, the setup is performed using the default
> instance type, m1.large.
> 4. SPARK_LOCAL_DIRS is then set to "/mnt/spark,/mnt2/spark". /mnt2
> corresponds to the snapshot partition on an m3.large instance, which is only
> 8GB in size. When the user runs jobs that shuffle data, this partition fills
> up quickly, and jobs fail with "No space left on device" errors.
> Apart from this example, there are other cases where the machines are set up
> incorrectly because they are assumed to be of type m1.large.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]