Good point. libprocess (our communication layer) may be using the wrong 
address. I remember seeing this on Ubuntu: if you call gethostbyname with 
the local hostname, you get back 127.0.1.1 instead of the external IP. Try 
setting the LIBPROCESS_IP environment variable on the slave to the "right" 
IP before you run mesos-slave.

Matei
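
A quick way to check for the behaviour described above is to resolve the
local hostname on the slave and see whether the result is a loopback
address. This is only a sketch: the helper names is_loopback and
resolved_local_address are made up for illustration, and Python's
socket.gethostbyname may not resolve exactly the way libprocess does.

```python
import ipaddress
import socket


def is_loopback(address: str) -> bool:
    """Return True if address falls in a loopback range such as 127.0.0.0/8."""
    return ipaddress.ip_address(address).is_loopback


def resolved_local_address() -> str:
    """Resolve the local hostname, similar to calling gethostbyname on it."""
    return socket.gethostbyname(socket.gethostname())


if __name__ == "__main__":
    try:
        address = resolved_local_address()
    except OSError as err:
        print(f"could not resolve the local hostname: {err}")
    else:
        if is_loopback(address):
            # Assumption: this matches the symptom described in the thread.
            print(f"{address} is a loopback address; "
                  "set LIBPROCESS_IP before starting mesos-slave")
        else:
            print(f"{address} is not a loopback address")
```

If the script reports a loopback address, export LIBPROCESS_IP with the
machine's reachable IP in the shell that launches mesos-slave.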

On Apr 19, 2012, at 9:47 PM, Scott Smith wrote:

> Well the logs say this:
> 
> I0420 04:40:30.870983  8193 master.cpp:814] Attempting to register
> slave 201204200437-0-162 at [email protected]:51851
> I0420 04:40:30.871330  8193 master.cpp:1057] Master now considering a
> slave at ip-10-252-94-24.us-west-2.compute.internal:51851 as active
> I0420 04:40:30.871415  8193 master.cpp:1588] Adding slave
> 201204200437-0-162 at ip-10-252-94-24.us-west-2.compute.internal with
> cpus=1; mem=1024
> I0420 04:40:30.871599  8193 simple_allocator.cpp:71] Added slave
> 201204200437-0-162 with cpus=1; mem=1024
> I0420 04:40:30.871680  8193 master.cpp:1143] Slave 201204200437-0-162
> disconnected
> I0420 04:40:30.871819  8193 simple_allocator.cpp:83] Removed slave
> 201204200437-0-162
> 
> tcpdump says this:
> 
> POST /master/mesos.internal.RegisterSlaveMessage HTTP/1.0
> User-Agent: libprocess/[email protected]:51851
> Connection: Keep-Alive
> Transfer-Encoding: chunked
> 
> 87
> 
> ..
> *ip-10-252-94-24.us-west-2.compute.internal.*ip-10-252-94-24.us-west-2.compute.internal..
> .cpus...              .......?..
> .mem...               .......@ .?
> 0
> 
> 
> 
> So it looks like it's reporting both a valid hostname and a loopback
> address.  Which will the master use?
> 
> btw I have both machines in the same security group, and opened all
> tcp inbound for the group to the group.
> 
> 
> On Thu, Apr 19, 2012 at 9:42 PM, Matei Zaharia <[email protected]> 
> wrote:
>> What hostname and port does the slave report for itself (i.e. when the 
>> master sees it connect, what message does it print)? It could be that the 
>> master cannot connect back to that address. Maybe you need to open up 
>> communication among machines in your EC2 security groups.
>> 
>> Matei
>> 
>> On Apr 19, 2012, at 9:10 PM, Scott Smith wrote:
>> 
>>> Direct IP/port.  No zookeeper.
>>> On Apr 19, 2012 7:35 PM, "John Sirois" <[email protected]> wrote:
>>> 
>>>> How are your slaves connecting to the master?  Via zookeeper or via known
>>>> hostname/ip ?
>>>> 
>>>> On Thursday, April 19, 2012, Scott Smith wrote:
>>>> 
>>>>> I'm trying to set up a cluster on ec2, but not using the canned
>>>>> scripts/image.  I built the latest svn on Ubuntu 11.10 amd64, and copied
>>>>> the build to a second node.  Both are c1.medium instances (not that it
>>>>> should matter).  No other software is running (no hdfs, no hadoop, etc).
>>>>> 
>>>>> The problem I have is the slave repeatedly (approx once per second)
>>>>> connects, advertises its resources, gets added, and then disconnects.  No
>>>>> reason is given for disconnecting.  There are no messages on the slave,
>>>>> only 5 or 6 messages on the master.
>>>>> 
>>>>> I'm not sure what the next diagnostic step should be; I was hoping
>>>>> someone else ran into the same problem and could point out what I did
>>>>> wrong.  Any advice?
>>>>> 
>>>>> Thanks!
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> John Sirois
>>>> 303-512-3301
>>>> 
>> 
> 
> 
> 
> -- 
>         Scott
