Ugh, that's pretty evil. I guess we'll have to look into this but I don't know off the top of my head.
Matei

On Apr 17, 2012, at 9:23 PM, Richard Xia wrote:

> Hi Matei,
>
> You're right, I do have boto installed and the Mesos scripts are picking that
> version up instead of the packaged ones. Apparently the Python convention is
> to search for modules in PYTHONPATH *after* searching in site-packages, so
> even though the Mesos-packaged boto is included in PYTHONPATH, my boto
> installation is loaded first. I'm not that experienced with Python load
> paths, so is there an easy way to fix this without uninstalling boto?
>
> Thanks,
> Richard
>
> On 4/17/12 8:17 AM, Matei Zaharia wrote:
>> Hi Richard,
>>
>> Do you have boto (the EC2 library for Python) installed on your machine
>> through easy_install by any chance? It sounds like your Python is finding a
>> different version of it than the one we ship with Mesos, because I run these
>> scripts very often and I certainly never get the group.name vs group.id
>> thing.
>>
>> For the initial timeout, I agree that we should make it longer. You can also
>> use launch --resume to resume installation on a cluster where launch failed
>> for this reason by the way.
>>
>> Matei
>>
>> On Apr 16, 2012, at 11:48 AM, Richard Xia wrote:
>>
>>> Hi,
>>>
>>> I'm trying to go through the guide here
>>> (https://github.com/mesos/mesos/wiki/EC2-Scripts) and I'm running into a
>>> couple problems. I'm running the latest version of the trunk (r1310658) on
>>> Mac OS X 10.6 with the default Python (2.6).
>>>
>>> The first problem that I run into is with the launch script. The default
>>> wait time of 60 seconds doesn't seem to be enough; I would consistently run
>>> into the error of the ssh connection being refused. When I set the wait
>>> time to 120 seconds (just to be safe, I'm sure a smaller value would work
>>> as well), it worked and would run to completion. I was just using the
>>> default settings suggested by the guide (1 slave, m1.large instance) and it
>>> took me a while to realize that the script just wasn't waiting long enough
>>> for the instances to start up. Is this the expected behavior? If it is, I
>>> think the guide needs to be updated to mention that the default wait time
>>> may not be long enough.
>>>
>>> The second problem I am having is with any of the scripts that target an
>>> existing cluster. For example, if I try running ./mesos-ec2
>>> stop <cluster-name>, I get the error message "ERROR: Could not find any
>>> existing cluster". When debugging the script, I found that
>>> get_existing_cluster() wasn't working properly. On line 309, when it sets
>>> the variable group_names, it calls g.id where g is a security group. The
>>> following lines seem to check whether the security group name matches
>>> "<cluster-name>-master", "-slaves", or "-zoo". However, when running a
>>> debugger, I find that the security group's id is actually in the form
>>> "sg-6561c10d", not "<cluster-name>-slaves". Instead, it seems to me that
>>> line 309 should be group_names = [g.name for g in res.groups]. When I make
>>> this change myself, it seems to work.
>>>
>>> Thanks,
>>> Richard Xia
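
On the load-order question in the quoted message, one possible workaround that avoids uninstalling boto is to have the script put its bundled copy at the front of sys.path before the first import, since Python searches sys.path in order. This is only a sketch, not what the Mesos scripts do: the helper name prefer_bundled and the directory name bundled_libs are hypothetical, and it assumes the clash comes from an installed copy that ends up earlier in sys.path than the PYTHONPATH entry.

```python
import os
import sys


def prefer_bundled(lib_dir):
    """Move lib_dir to the front of sys.path so that modules found there
    are picked before copies installed elsewhere (e.g. site-packages)."""
    lib_dir = os.path.abspath(lib_dir)
    # Drop any existing entry for the same directory so it is listed once.
    sys.path[:] = [p for p in sys.path
                   if not p or os.path.abspath(p) != lib_dir]
    sys.path.insert(0, lib_dir)
    return lib_dir


if __name__ == "__main__":
    # Hypothetical location of the bundled library, next to this script.
    here = os.path.dirname(os.path.abspath(__file__))
    prefer_bundled(os.path.join(here, "bundled_libs"))
    # An "import boto" placed after this point searches bundled_libs first.
    print(sys.path[0])
```

This has to run before anything imports boto; a module that is already in sys.modules is not looked up again, so changing sys.path afterwards has no effect on it.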
