Hi Richard,

Do you have boto (the EC2 library for Python) installed on your machine through 
easy_install, by any chance? It sounds like your Python is finding a different 
version of boto than the one we ship with Mesos: I run these scripts very often 
and have never hit the group.name vs. group.id problem.
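One quick way to check is to ask Python which copy of boto it actually imports. The helper below is a hypothetical sketch (`describe_module` is an illustrative name, not part of Mesos or boto). It imports a module by name and reports the file it was loaded from and its `__version__`, if it defines one. It uses `__import__` rather than `importlib` because `importlib` is not available on Python 2.6.

```python
import sys


def describe_module(name):
    """Import `name` and report where it was loaded from and its version.

    Returns a (file_path, version) tuple. Either entry is None when the
    module has no __file__ (e.g. built-in modules) or no __version__.
    """
    __import__(name)            # works on Python 2.6, which lacks importlib
    module = sys.modules[name]  # __import__ returns the top-level package for dotted names
    path = getattr(module, "__file__", None)
    version = getattr(module, "__version__", None)
    return path, version
```

For example, `describe_module("boto")` on the affected machine should show whether the path points at an easy_install copy in site-packages or at the boto bundled with Mesos. Keep in mind that the mesos-ec2 script may adjust `sys.path` itself, so a standalone check is only an approximation of what the script sees.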

For the initial timeout, I agree that we should make it longer. By the way, if 
launch fails for this reason, you can run launch again with --resume to resume 
installation on the same cluster.
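As an alternative to only raising the fixed wait, the launch step could poll until each instance accepts TCP connections on the ssh port. The sketch below is a hypothetical approach, not what mesos-ec2 currently does; `wait_for_port`, its `timeout` and `interval` parameters, and the assumption that sshd listens on port 22 are all illustrative. An open port only means sshd is accepting connections, so a short extra grace period may still be useful.

```python
import socket
import time


def wait_for_port(host, port=22, timeout=300.0, interval=5.0):
    """Poll until host:port accepts a TCP connection or `timeout` seconds pass.

    Returns True as soon as a connection succeeds, False on timeout.
    """
    deadline = time.time() + timeout
    while True:
        try:
            # A successful connect means something (e.g. sshd) is listening.
            conn = socket.create_connection((host, port), timeout=interval)
            conn.close()
            return True
        except (socket.error, socket.timeout):
            # Refused, unreachable, or timed out: the instance is not ready yet.
            if time.time() >= deadline:
                return False
            time.sleep(interval)
```

With this in place, a slow-booting m1.large instance would simply take a few more polling rounds instead of failing with "connection refused".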

Matei

On Apr 16, 2012, at 11:48 AM, Richard Xia wrote:

> Hi,
> 
> I'm trying to go through the guide here 
> (https://github.com/mesos/mesos/wiki/EC2-Scripts) and I'm running into a 
> couple of problems. I'm running the latest version of trunk (r1310658) on 
> Mac OS X 10.6 with the default Python (2.6).
> 
> The first problem is with the launch script. The default wait time of 60 
> seconds doesn't seem to be enough; I consistently got an error saying the 
> ssh connection was refused. When I set the wait time to 120 seconds (just to 
> be safe; I'm sure a smaller value would work as well), the script ran to 
> completion. I was using the default settings suggested by the guide (1 slave, 
> m1.large instance), and it took me a while to realize that the script just 
> wasn't waiting long enough for the instances to start up. Is this the 
> expected behavior? If it is, I think the guide should mention that the 
> default wait time may not be long enough.
> 
> The second problem is with any of the scripts that target an existing 
> cluster. For example, if I run ./mesos-ec2 stop <cluster-name>, I get the 
> error message "ERROR: Could not find any existing cluster". While debugging 
> the script, I found that get_existing_cluster() wasn't working properly. On 
> line 309, where it sets the variable group_names, it reads g.id, where g is 
> a security group. The following lines seem to check whether the security 
> group name matches "<cluster-name>-master", "-slaves", or "-zoo". However, 
> in a debugger I see that the security group's id is actually of the form 
> "sg-6561c10d", not "<cluster-name>-slaves". It seems to me that line 309 
> should instead be group_names = [g.name for g in res.groups]. When I make 
> this change myself, it seems to work.
> 
> Thanks,
> Richard Xia
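To illustrate the change Richard proposes for line 309: the cluster lookup has to compare security-group names against "<cluster-name>-master", "-slaves", and "-zoo", because with his boto the id field holds the "sg-..." identifier. The sketch below uses hypothetical namedtuple stand-ins (`Group`, `Reservation`) for boto's objects, and `classify_instances` is an illustrative helper, not the actual get_existing_cluster() code.

```python
from collections import namedtuple

# Hypothetical stand-ins for boto objects: a security group with a
# human-readable name and an "sg-..." id, and a reservation holding
# groups and instances.
Group = namedtuple("Group", ["name", "id"])
Reservation = namedtuple("Reservation", ["groups", "instances"])


def classify_instances(reservations, cluster_name):
    """Split instances into (master, slaves, zoo) lists by security group name."""
    master, slaves, zoo = [], [], []
    for res in reservations:
        # Compare names, not ids: the id looks like "sg-6561c10d".
        group_names = [g.name for g in res.groups]
        if cluster_name + "-master" in group_names:
            master.extend(res.instances)
        elif cluster_name + "-slaves" in group_names:
            slaves.extend(res.instances)
        elif cluster_name + "-zoo" in group_names:
            zoo.extend(res.instances)
    return master, slaves, zoo
```

Reservations whose groups belong to a different cluster fall through all three checks and are ignored.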
