Hi,
I'm trying to go through the guide here
(https://github.com/mesos/mesos/wiki/EC2-Scripts) and I'm running into a
couple of problems. I'm running the latest version of trunk (r1310658)
on Mac OS X 10.6 with the default Python (2.6).

The first problem is with the launch script. The default wait time of 60
seconds doesn't seem to be enough; I consistently got an SSH "connection
refused" error. When I set the wait time to 120 seconds (just to be safe;
I'm sure a smaller value would work as well), the script ran to
completion. I was using the default settings suggested by the guide
(1 slave, m1.large instance), and it took me a while to realize that the
script just wasn't waiting long enough for the instances to start up. Is
this the expected behavior? If it is, I think the guide should be updated
to mention that the default wait time may not be long enough.
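
For illustration, one alternative to a fixed wait would be to poll the SSH
port until it accepts connections. This is only a sketch under assumptions:
wait_for_ssh and its parameters are hypothetical names, not part of the
mesos-ec2 script, and an open port does not guarantee that sshd is ready to
accept logins.

```python
import socket
import time


def wait_for_ssh(host, port=22, timeout=120.0, interval=5.0):
    """Poll host:port until a TCP connection succeeds.

    Hypothetical helper, not taken from the mesos-ec2 script.
    Returns True once a connection is accepted, or False if the
    deadline passes first.
    """
    deadline = time.time() + timeout
    while True:
        try:
            # A refused connection or a connect timeout raises here.
            conn = socket.create_connection((host, port), timeout=interval)
            conn.close()
            return True
        except (socket.error, socket.timeout):
            if time.time() >= deadline:
                return False
            time.sleep(interval)
```

With something like this, the script would return as soon as the instance
is reachable instead of always sleeping for the full wait time.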

The second problem is with any of the scripts that target an existing
cluster. For example, if I run ./mesos-ec2 stop <cluster-name>, I get the
error message "ERROR: Could not find any existing cluster". While
debugging the script, I found that get_existing_cluster() wasn't working
properly. On line 309, when it sets the variable group_names, it reads
g.id, where g is a security group. The following lines seem to check
whether the security group name matches "<cluster-name>-master",
"<cluster-name>-slaves", or "<cluster-name>-zoo". However, in a debugger
I see that the security group's id is actually of the form "sg-6561c10d",
not "<cluster-name>-slaves". It seems to me that line 309 should instead
be group_names = [g.name for g in res.groups]. When I make this change
myself, it seems to work.
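
To make the difference concrete, here is a minimal sketch that does not
talk to EC2 at all. SecurityGroup is a hypothetical stand-in that models
only the id and name attributes discussed above, find_cluster_groups is an
illustrative helper rather than the script's actual code, and "mycluster"
is a made-up cluster name.

```python
from collections import namedtuple

# Hypothetical stand-in for a security group object; only the two
# attributes relevant here are modelled.
SecurityGroup = namedtuple("SecurityGroup", ["id", "name"])


def find_cluster_groups(groups, cluster_name, attr="name"):
    """Return the values of `attr` that match the cluster's group names."""
    wanted = [cluster_name + suffix for suffix in ("-master", "-slaves", "-zoo")]
    values = [getattr(g, attr) for g in groups]
    return [v for v in values if v in wanted]


groups = [SecurityGroup(id="sg-6561c10d", name="mycluster-slaves")]

# Matching on g.id finds nothing, because ids look like "sg-...".
by_id = find_cluster_groups(groups, "mycluster", attr="id")

# Matching on g.name finds the slaves group.
by_name = find_cluster_groups(groups, "mycluster", attr="name")
```

Here by_id comes back empty while by_name contains "mycluster-slaves",
which is consistent with the "Could not find any existing cluster" error.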
Thanks,
Richard Xia