[ https://issues.apache.org/jira/browse/SPARK-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355403#comment-14355403 ]

Shivaram Venkataraman commented on SPARK-6246:
----------------------------------------------

Hmm, this seems like a bad problem. It looks like an AWS-side change rather
than a boto change, I guess.
[~nchammas] Similar to the EC2Box issue above, can we also batch calls to
`get_instances` 100 instances at a time?
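
A rough sketch of what that batching could look like. The 100-ID cap comes from
the EC2 error in the log below; `chunks`, `call_in_batches` and
`MAX_IDS_PER_REQUEST` are hypothetical names, not existing spark_ec2.py helpers.
The fetch function is passed in, so the same helper could wrap boto 2's
`get_all_instance_status(instance_ids=...)` (the call that fails in the
traceback) or any other per-ID call that hits the same limit.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")

# EC2 rejected the request below once it carried more than 100 instance IDs.
MAX_IDS_PER_REQUEST = 100


def chunks(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def call_in_batches(fetch: Callable[[List[T]], List[R]],
                    ids: Sequence[T],
                    batch_size: int = MAX_IDS_PER_REQUEST) -> List[R]:
    """Call `fetch` once per batch of IDs and concatenate all results in order."""
    results: List[R] = []
    for batch in chunks(ids, batch_size):
        results.extend(fetch(list(batch)))
    return results
```

In `wait_for_cluster_state` this might be used as
`call_in_batches(lambda batch: conn.get_all_instance_status(instance_ids=batch), [i.id for i in cluster_instances])`.
boto 2's `ResultSet` is a list subclass, so the per-batch results concatenate
cleanly. Untested against a real cluster with more than 100 nodes.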

> spark-ec2 can't handle clusters with > 100 nodes
> ------------------------------------------------
>
>                 Key: SPARK-6246
>                 URL: https://issues.apache.org/jira/browse/SPARK-6246
>             Project: Spark
>          Issue Type: Bug
>          Components: EC2
>    Affects Versions: 1.3.0
>            Reporter: Nicholas Chammas
>            Priority: Minor
>
> This appears to be a new restriction, perhaps resulting from our upgrade of 
> boto. Maybe it's a new restriction from EC2. Not sure yet.
> We didn't have this issue around the Spark 1.1.0 time frame from what I can 
> remember. I'll track down where the issue is and when it started.
> Attempting to launch a cluster with 100 slaves yields the following:
> {code}
> Spark AMI: ami-35b1885c
> Launching instances...
> Launched 100 slaves in us-east-1c, regid = r-9c408776
> Launched master in us-east-1c, regid = r-92408778
> Waiting for AWS to propagate instance metadata...
> Waiting for cluster to enter 'ssh-ready' state.ERROR:boto:400 Bad Request
> ERROR:boto:<?xml version="1.0" encoding="UTF-8"?>
> <Response><Errors><Error><Code>InvalidRequest</Code><Message>101 exceeds the maximum number of instance IDs that can be specificied (100). Please specify fewer than 100 instance IDs.</Message></Error></Errors><RequestID>217fd6ff-9afa-4e91-86bc-ab16fcc442d8</RequestID></Response>
> Traceback (most recent call last):
>   File "./ec2/spark_ec2.py", line 1338, in <module>
>     main()
>   File "./ec2/spark_ec2.py", line 1330, in main
>     real_main()
>   File "./ec2/spark_ec2.py", line 1170, in real_main
>     cluster_state='ssh-ready'
>   File "./ec2/spark_ec2.py", line 795, in wait_for_cluster_state
>     statuses = conn.get_all_instance_status(instance_ids=[i.id for i in cluster_instances])
>   File "/path/apache/spark/ec2/lib/boto-2.34.0/boto/ec2/connection.py", line 737, in get_all_instance_status
>     InstanceStatusSet, verb='POST')
>   File "/path/apache/spark/ec2/lib/boto-2.34.0/boto/connection.py", line 1204, in get_object
>     raise self.ResponseError(response.status, response.reason, body)
> boto.exception.EC2ResponseError: EC2ResponseError: 400 Bad Request
> <?xml version="1.0" encoding="UTF-8"?>
> <Response><Errors><Error><Code>InvalidRequest</Code><Message>101 exceeds the maximum number of instance IDs that can be specificied (100). Please specify fewer than 100 instance IDs.</Message></Error></Errors><RequestID>217fd6ff-9afa-4e91-86bc-ab16fcc442d8</RequestID></Response>
> {code}
> This problem seems to be with {{get_all_instance_status()}}, though I am not 
> sure if other methods are affected too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
