Hi Richard,

Do you have boto (the EC2 library for Python) installed on your machine through easy_install by any chance? It sounds like your Python is finding a different version of it than the one we ship with Mesos, because I run these scripts very often and I certainly never get the group.name vs group.id thing.
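One quick way to check which copy of boto your Python picks up is to print where the module was loaded from. This is only a rough sketch: describe_module is a helper name made up for this example and is not part of the Mesos scripts, and it only reports what the interpreter you run it with would import.

```python
def describe_module(name):
    """Return (path, version) for an importable top-level module.

    Returns None if the module cannot be imported at all.
    """
    try:
        module = __import__(name)
    except ImportError:
        return None
    # Built-in modules have no __file__; not every package sets __version__.
    path = getattr(module, "__file__", "<built-in>")
    version = getattr(module, "__version__", "<unknown>")
    return path, version


if __name__ == "__main__":
    info = describe_module("boto")
    if info is None:
        print("boto is not importable from this interpreter")
    else:
        print("boto loaded from %s (version %s)" % info)
```

If the path it prints points into site-packages rather than into the Mesos checkout, that would suggest the easy_install copy is the one being used.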
For the initial timeout, I agree that we should make it longer. You can also use launch --resume to resume installation on a cluster where launch failed for this reason by the way.

Matei

On Apr 16, 2012, at 11:48 AM, Richard Xia wrote:

> Hi,
>
> I'm trying to go through the guide here
> (https://github.com/mesos/mesos/wiki/EC2-Scripts) and I'm running into a
> couple problems. I'm running the latest version of the trunk (r1310658) on
> Mac OS X 10.6 with the default Python (2.6).
>
> The first problem that I run into is with the launch script. The default wait
> time of 60 seconds doesn't seem to be enough; I would consistently run into
> the error of the ssh connection being refused. When I set the wait time to
> 120 seconds (just to be safe, I'm sure a smaller value would work as well),
> it worked and would run to completion. I was just using the default settings
> suggested by the guide (1 slave, m1.large instance) and it took me a while to
> realize that the script just wasn't waiting long enough for the instances to
> start up. Is this the expected behavior? If it is, I think the guide needs to
> be updated to mention that the default wait time may not be long enough.
>
> The second problem I am having is with any of the scripts that target an
> existing cluster. For example, if I try running ./mesos-ec2 stop
> <cluster-name>, I get the error message "ERROR: Could not find any existing
> cluster". When debugging the script, I found that get_existing_cluster()
> wasn't working properly. On line 309, when it sets the variable group_names,
> it calls g.id where g is a security group. The following lines seem to check
> whether the security group name matches "<cluster-name>-master", "-slaves",
> or "-zoo". However, when running a debugger, I find that the security group's
> id is actually in the form "sg-6561c10d", not "<cluster-name>-slaves".
> Instead, it seems to me that line 309 should be group_names = [g.name for g
> in res.groups]. When I make this change myself, it seems to work.
>
> Thanks,
> Richard Xia
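
To make the g.id vs g.name difference described above concrete, here is a small sketch of matching a cluster's security groups by name. It is not the actual get_existing_cluster() code: Group is a stand-in for a boto security group carrying only the two attributes under discussion, matching_roles is a hypothetical helper, and the cluster name "test" is made up.

```python
from collections import namedtuple

# Stand-in for a boto security group (assumption: only id and name matter here).
Group = namedtuple("Group", ["id", "name"])


def matching_roles(groups, cluster_name):
    """Return the roles ('master', 'slaves', 'zoo') whose group is present.

    Groups are matched on their name, e.g. "<cluster-name>-slaves".
    """
    group_names = [g.name for g in groups]
    roles = []
    for suffix in ("master", "slaves", "zoo"):
        if cluster_name + "-" + suffix in group_names:
            roles.append(suffix)
    return roles


groups = [Group(id="sg-6561c10d", name="test-slaves")]
print(matching_roles(groups, "test"))  # prints ['slaves']

# Comparing against ids instead finds nothing, since ids look like "sg-...".
group_ids = [g.id for g in groups]
print("test-slaves" in group_ids)  # prints False
```

With a boto version where the attribute holding "<cluster-name>-slaves" is exposed differently, the same comparison would fail in the way the report describes.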
