Ugh, that's pretty evil. I guess we'll have to look into this but I don't know 
off the top of my head.

Matei

On Apr 17, 2012, at 9:23 PM, Richard Xia wrote:

> Hi Matei,
> 
> You're right, I do have boto installed and the Mesos scripts are picking that 
> version up instead of the packaged ones. Apparently the Python convention is 
> to search for modules in PYTHONPATH *after* searching in site-packages, so 
> even though the Mesos-packaged boto is included in PYTHONPATH, my boto 
> installation is loaded first. I'm not that experienced with Python load 
> paths, so is there an easy way to fix this without uninstalling boto?
> 
> Thanks,
> Richard
> 
> On 4/17/12 8:17 AM, Matei Zaharia wrote:
>> Hi Richard,
>> 
>> Do you have boto (the EC2 library for Python) installed on your machine 
>> through easy_install by any chance? It sounds like your Python is finding a 
>> different version of it than the one we ship with Mesos, because I run these 
>> scripts very often and I certainly never get the group.name vs group.id 
>> thing.
>> 
>> For the initial timeout, I agree that we should make it longer. You can also 
>> use launch --resume to resume installation on a cluster where launch failed 
>> for this reason by the way.
>> 
>> Matei
>> 
>> On Apr 16, 2012, at 11:48 AM, Richard Xia wrote:
>> 
>>> Hi,
>>> 
>>> I'm trying to go through the guide here 
>>> (https://github.com/mesos/mesos/wiki/EC2-Scripts) and I'm running into a 
>>> couple problems. I'm running the latest version of the trunk (r1310658) on 
>>> Mac OS X 10.6 with the default Python (2.6).
>>> 
>>> The first problem that I run into is with the launch script. The default 
>>> wait time of 60 seconds doesn't seem to be enough; I would consistently run 
>>> into the error of the ssh connection being refused. When I set the wait 
>>> time to 120 seconds (just to be safe, I'm sure a smaller value would work 
>>> as well), it worked and would run to completion. I was just using the 
>>> default settings suggested by the guide (1 slave, m1.large instance) and it 
>>> took me a while to realize that the script just wasn't waiting long enough 
>>> for the instances to start up. Is this the expected behavior? If it is, I 
>>> think the guide needs to be updated to mention that the default wait time 
>>> may not be long enough.
>>> 
>>> The second problem I am having is with any of the scripts that target an 
>>> existing cluster. For example, if I try running
>>> ./mesos-ec2 stop <cluster-name>, I get the error message "ERROR: Could not
>>> find any existing cluster". When debugging the script, I found that
>>> get_existing_cluster() wasn't working properly. On line 309, when it sets 
>>> the variable group_names, it calls g.id where g is a security group. The 
>>> following lines seem to check whether the security group name matches 
>>> "<cluster-name>-master", "-slaves", or "-zoo". However, when running a 
>>> debugger, I find that the security group's id is actually in the form
>>> "sg-6561c10d", not "<cluster-name>-slaves". Instead, it seems to me that
>>> line 309 should be group_names = [g.name for g in res.groups]. When I make 
>>> this change myself, it seems to work.
>>> 
>>> Thanks,
>>> Richard Xia
> 
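A note on the second problem in Richard's original message. The sketch below
is not the mesos-ec2 code; it only illustrates why comparing against g.id
finds nothing while comparing against g.name works. Group and find_roles are
hypothetical stand-ins. The only details taken from the thread are the id and
name attributes, the "sg-..." id format, and the "-master", "-slaves", and
"-zoo" suffixes.

```python
from collections import namedtuple

# Hypothetical stand-in for a security group; this is not a boto class.
Group = namedtuple("Group", ["id", "name"])

ROLES = ("master", "slaves", "zoo")


def find_roles(groups, cluster_name, attr="name"):
    """Return the roles whose "<cluster_name>-<role>" label appears in groups.

    attr selects which attribute is compared: "name" (the fix proposed in
    the thread) or "id" (the behavior reported as failing).
    """
    labels = [getattr(g, attr) for g in groups]
    return [r for r in ROLES if "%s-%s" % (cluster_name, r) in labels]
```

For a group with id "sg-6561c10d" and name "test-slaves" (the cluster name
"test" is made up), find_roles(groups, "test") returns ["slaves"], while
find_roles(groups, "test", attr="id") returns an empty list.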

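On the load-path question in Richard's April 17 message: here is a minimal
sketch, not something proposed in the thread, of one way to make a script
prefer a bundled copy of a module without uninstalling the system one. My
understanding is that easy_install registers its eggs through a .pth file
that places them ahead of the PYTHONPATH entries on sys.path, which would
explain the behavior described, but treat that as an assumption. The helper
names prefer_bundled and locate are hypothetical, and the sketch targets
Python 3 (importlib.util is not available on the Python 2.6 mentioned in the
thread).

```python
import importlib.util
import os
import sys


def prefer_bundled(bundled_dir):
    """Move bundled_dir to the front of sys.path so its modules are found first."""
    target = os.path.abspath(bundled_dir)
    # Remove any existing entry so the directory is not listed twice.
    sys.path[:] = [p for p in sys.path if p != target]
    sys.path.insert(0, target)


def locate(module_name):
    """Return the file Python resolves module_name to, or None if not found.

    If the module has already been imported, this reports the loaded copy,
    so call prefer_bundled() before the first import.
    """
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec else None
```

In a script this would mean calling prefer_bundled() with the directory that
holds the packaged boto before the first "import boto". On any Python
version, printing boto.__file__ after the import shows which copy was
actually loaded.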