Hi,

I'm trying to go through the guide here (https://github.com/mesos/mesos/wiki/EC2-Scripts) and I'm running into a couple of problems. I'm running the latest version of trunk (r1310658) on Mac OS X 10.6 with the default Python (2.6).

The first problem is with the launch script. The default wait time of 60 seconds doesn't seem to be enough: I would consistently get an error saying the ssh connection was refused. When I set the wait time to 120 seconds (just to be safe; I'm sure a smaller value would work as well), the script ran to completion. I was using the default settings suggested by the guide (1 slave, m1.large instance), and it took me a while to realize that the script simply wasn't waiting long enough for the instances to start up. Is this the expected behavior? If so, I think the guide should mention that the default wait time may not be long enough.
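One possible alternative to a fixed wait would be to poll the ssh port until it accepts connections. The following is only a rough sketch of that idea, not what the launch script currently does; the function name wait_for_ssh and its parameters are made up for illustration, and I'm assuming a plain TCP connect to port 22 is a good enough readiness check.

```python
import socket
import time

def wait_for_ssh(host, port=22, timeout=120, interval=5):
    """Return True once a TCP connection to host:port succeeds,
    or False if `timeout` seconds pass without a successful connect.

    Hypothetical helper: not part of the mesos-ec2 script.
    """
    deadline = time.time() + timeout
    while True:
        try:
            # Try to open (and immediately close) a TCP connection.
            conn = socket.create_connection((host, port), timeout=interval)
            conn.close()
            return True
        except (socket.error, socket.timeout):
            # Refused or timed out: the instance is probably still booting.
            if time.time() >= deadline:
                return False
            time.sleep(interval)
```

With something like this, the script could proceed as soon as the master answers on port 22 instead of always sleeping for the full wait time.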

The second problem is with any of the scripts that target an existing cluster. For example, if I run ./mesos-ec2 stop <cluster-name>, I get the error message "ERROR: Could not find any existing cluster". When debugging the script, I found that get_existing_cluster() wasn't working properly. On line 309, where it sets the variable group_names, it reads g.id, where g is a security group. The following lines seem to check whether the security group name matches "<cluster-name>-master", "-slaves", or "-zoo". However, when running a debugger, I find that the security group's id is actually of the form "sg-6561c10d", not "<cluster-name>-slaves", so none of the checks can match. It seems to me that line 309 should instead be group_names = [g.name for g in res.groups]. With this change, the script seems to work for me.
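To illustrate the mismatch, here is a small self-contained sketch. It does not use boto: Group is a stand-in with just the id and name attributes, roles_for is a made-up helper that mimics the suffix checks as I understand them, and the cluster name "test" is only an example.

```python
from collections import namedtuple

# Stand-in for a security group object; only the two attributes
# discussed above. This is NOT the real boto class.
Group = namedtuple("Group", ["id", "name"])

def roles_for(groups, cluster_name, attr="name"):
    """Return which of master/slaves/zoo match the given groups.

    attr="id" mimics the current line 309; attr="name" is the proposed fix.
    Hypothetical helper, written only to show the difference.
    """
    group_names = [getattr(g, attr) for g in groups]
    roles = []
    for suffix in ("master", "slaves", "zoo"):
        if cluster_name + "-" + suffix in group_names:
            roles.append(suffix)
    return roles

groups = [Group(id="sg-6561c10d", name="test-slaves")]
by_id = roles_for(groups, "test", attr="id")      # compares "sg-6561c10d"
by_name = roles_for(groups, "test", attr="name")  # compares "test-slaves"
```

Comparing against the id never matches a "<cluster-name>-slaves" style string, which would explain the "Could not find any existing cluster" error.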

Thanks,
Richard Xia
