Nicholas Chammas created SPARK-6246:
---------------------------------------
Summary: spark-ec2 can't handle clusters with > 100 nodes
Key: SPARK-6246
URL: https://issues.apache.org/jira/browse/SPARK-6246
Project: Spark
Issue Type: Bug
Components: EC2
Affects Versions: 1.3.0
Reporter: Nicholas Chammas
Priority: Minor
This appears to be a new restriction, possibly introduced by our upgrade of
boto, or possibly a new limit imposed by EC2 itself. I'm not sure yet.
As far as I can remember, we didn't have this issue around the Spark 1.1.0
time frame. I'll track down where the issue is and when it started.
Attempting to launch a cluster with 100 slaves (101 instances, counting the
master) yields the following:
{code}
Spark AMI: ami-35b1885c
Launching instances...
Launched 100 slaves in us-east-1c, regid = r-9c408776
Launched master in us-east-1c, regid = r-92408778
Waiting for AWS to propagate instance metadata...
Waiting for cluster to enter 'ssh-ready' state.ERROR:boto:400 Bad Request
ERROR:boto:<?xml version="1.0" encoding="UTF-8"?>
<Response><Errors><Error><Code>InvalidRequest</Code><Message>101 exceeds the
maximum number of instance IDs that can be specificied (100). Please specify
fewer than 100 instance
IDs.</Message></Error></Errors><RequestID>217fd6ff-9afa-4e91-86bc-ab16fcc442d8</RequestID></Response>
Traceback (most recent call last):
  File "./ec2/spark_ec2.py", line 1338, in <module>
    main()
  File "./ec2/spark_ec2.py", line 1330, in main
    real_main()
  File "./ec2/spark_ec2.py", line 1170, in real_main
    cluster_state='ssh-ready'
  File "./ec2/spark_ec2.py", line 795, in wait_for_cluster_state
    statuses = conn.get_all_instance_status(instance_ids=[i.id for i in cluster_instances])
  File "/path/apache/spark/ec2/lib/boto-2.34.0/boto/ec2/connection.py", line 737, in get_all_instance_status
    InstanceStatusSet, verb='POST')
  File "/path/apache/spark/ec2/lib/boto-2.34.0/boto/connection.py", line 1204, in get_object
    raise self.ResponseError(response.status, response.reason, body)
boto.exception.EC2ResponseError: EC2ResponseError: 400 Bad Request
<?xml version="1.0" encoding="UTF-8"?>
<Response><Errors><Error><Code>InvalidRequest</Code><Message>101 exceeds the
maximum number of instance IDs that can be specificied (100). Please specify
fewer than 100 instance
IDs.</Message></Error></Errors><RequestID>217fd6ff-9afa-4e91-86bc-ab16fcc442d8</RequestID></Response>
{code}
This problem seems to be with {{get_all_instance_status()}}, though I am not
sure if other methods are affected too.
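One possible workaround, sketched here only as an illustration and not as a confirmed fix, would be to query instance statuses in batches so that no single request carries more than 100 instance IDs. The helper names {{chunked}} and {{get_instance_statuses}} and the {{batch_size}} parameter are hypothetical; the limit of 100 is taken from the error message above, and {{conn}} is assumed to be the same boto EC2 connection that {{wait_for_cluster_state}} already uses.

```python
def chunked(items, size):
    """Yield successive slices of `items`, each holding at most `size` entries."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def get_instance_statuses(conn, instance_ids, batch_size=100):
    """Collect instance statuses using several smaller requests.

    Hypothetical helper: `conn` is assumed to expose
    get_all_instance_status(instance_ids=...), as in the traceback above.
    batch_size=100 is an assumption based on the limit quoted in the
    EC2 error message.
    """
    statuses = []
    for batch in chunked(list(instance_ids), batch_size):
        # Each call now stays at or below the per-request ID limit.
        statuses.extend(conn.get_all_instance_status(instance_ids=batch))
    return statuses
```

With something like this in place, the call in {{wait_for_cluster_state}} could pass the full list of IDs to {{get_instance_statuses}} instead of calling {{conn.get_all_instance_status}} directly. Whether other boto calls need the same treatment is still an open question.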