Re: Having problems with the EC2 Python scripts

Richard Xia Tue, 17 Apr 2012 13:24:16 -0700

Hi Matei,

You're right, I do have boto installed and the Mesos scripts are picking that version up instead of the packaged ones. Apparently the Python convention is to search for modules in PYTHONPATH *after* searching in site-packages, so even though the Mesos-packaged boto is included in PYTHONPATH, my boto installation is loaded first. I'm not that experienced with Python load paths, so is there an easy way to fix this without uninstalling boto?
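
One possible workaround, sketched under assumptions rather than taken from the Mesos scripts: move the directory that holds the bundled boto to the front of sys.path before boto is first imported, then check which file was actually loaded. The helper names prefer_path and module_location are hypothetical, and the bundled directory would have to be filled in for the local checkout.

```python
import importlib
import os
import sys


def prefer_path(directory):
    """Move `directory` to the front of sys.path so its modules are found first.

    This only affects imports that happen afterwards; a module that is
    already in sys.modules is not reloaded.
    """
    directory = os.path.abspath(directory)
    # Drop any existing copy so the entry appears exactly once, at index 0.
    sys.path[:] = [p for p in sys.path if p != directory]
    sys.path.insert(0, directory)
    return directory


def module_location(name):
    """Import `name` and return the file it was loaded from (or None)."""
    module = importlib.import_module(name)
    return getattr(module, "__file__", None)
```

Calling prefer_path() on the directory that contains the bundled boto package and then printing module_location("boto") should show whether the bundled copy or the site-packages copy is being picked up.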


Thanks,
Richard

On 4/17/12 8:17 AM, Matei Zaharia wrote:

Hi Richard,

Do you have boto (the EC2 library for Python) installed on your machine through 
easy_install by any chance? It sounds like your Python is finding a different 
version of it than the one we ship with Mesos, because I run these scripts very 
often and I certainly never get the group.name vs group.id thing.

For the initial timeout, I agree that we should make it longer. You can also 
use launch --resume to resume installation on a cluster where launch failed for 
this reason by the way.

Matei

On Apr 16, 2012, at 11:48 AM, Richard Xia wrote:

Hi,

I'm trying to go through the guide here
(https://github.com/mesos/mesos/wiki/EC2-Scripts) and I'm running into a couple
problems. I'm running the latest version of the trunk (r1310658) on Mac OS X
10.6 with the default Python (2.6).

The first problem that I run into is with the launch script. The default wait
time of 60 seconds doesn't seem to be enough; I would consistently run into the
error of the ssh connection being refused. When I set the wait time to 120
seconds (just to be safe, I'm sure a smaller value would work as well), it
worked and would run to completion. I was just using the default settings
suggested by the guide (1 slave, m1.large instance) and it took me a while to
realize that the script just wasn't waiting long enough for the instances to
start up. Is this the expected behavior? If it is, I think the guide needs to
be updated to mention that the default wait time may not be long enough.
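
As an illustration of the alternative to a fixed wait, here is a minimal sketch that polls until the ssh port accepts connections or a deadline passes. This is not how the launch script behaves; wait_for_port and its parameters are hypothetical names, and the defaults are only the values mentioned above.

```python
import socket
import time


def wait_for_port(host, port=22, timeout=120.0, interval=5.0):
    """Return True once host:port accepts a TCP connection, else False.

    Retries every `interval` seconds until `timeout` seconds have passed.
    """
    deadline = time.time() + timeout
    while True:
        try:
            conn = socket.create_connection((host, port), timeout=interval)
            conn.close()
            return True
        except socket.error:
            # Connection refused or timed out: the instance is not ready yet.
            remaining = deadline - time.time()
            if remaining <= 0:
                return False
            time.sleep(min(interval, remaining))
```

A caller would pass the instance's public hostname and only start the ssh-based setup once this returns True.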

The second problem I am having is with any of the scripts that target an
existing cluster. For example, if I try running ./mesos-ec2 stop <cluster-name>,
I get the error message "ERROR: Could not find any existing cluster". When
debugging the script, I found that get_existing_cluster() wasn't working
properly. On line 309, when it sets the variable group_names, it calls g.id
where g is a security group. The following lines seem to check whether the
security group name matches "<cluster-name>-master", "-slaves", or "-zoo".
However, when running a debugger, I find that the security group's id is
actually in the form "sg-6561c10d", not "<cluster-name>-slaves". Instead, it
seems to me that line 309 should be group_names = [g.name for g in res.groups].
When I make this change myself, it seems to work.
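
To make the difference concrete, here is a small sketch that uses stand-in objects instead of boto. Group, Reservation, and cluster_role are hypothetical names for illustration; only the g.id / g.name distinction and the "-master", "-slaves", "-zoo" suffixes come from the description above.

```python
from collections import namedtuple

# Stand-ins for the boto objects; real security groups carry both fields.
Group = namedtuple("Group", ["id", "name"])
Reservation = namedtuple("Reservation", ["groups"])


def cluster_role(res, cluster_name, attr="name"):
    """Return "master", "slaves" or "zoo" if the reservation belongs to
    `cluster_name`, else None. `attr` selects which field is compared."""
    group_names = [getattr(g, attr) for g in res.groups]
    for role in ("master", "slaves", "zoo"):
        if cluster_name + "-" + role in group_names:
            return role
    return None


res = Reservation(groups=[Group(id="sg-6561c10d", name="test-slaves")])
print(cluster_role(res, "test", attr="name"))  # matches on the group name
print(cluster_role(res, "test", attr="id"))    # never matches an sg-... id
```

Comparing on the id can never match a "<cluster-name>-slaves" style string, which is consistent with the "Could not find any existing cluster" error.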

Thanks,
Richard Xia
