Well, the master logs say this:

I0420 04:40:30.870983  8193 master.cpp:814] Attempting to register
slave 201204200437-0-162 at [email protected]:51851
I0420 04:40:30.871330  8193 master.cpp:1057] Master now considering a
slave at ip-10-252-94-24.us-west-2.compute.internal:51851 as active
I0420 04:40:30.871415  8193 master.cpp:1588] Adding slave
201204200437-0-162 at ip-10-252-94-24.us-west-2.compute.internal with
cpus=1; mem=1024
I0420 04:40:30.871599  8193 simple_allocator.cpp:71] Added slave
201204200437-0-162 with cpus=1; mem=1024
I0420 04:40:30.871680  8193 master.cpp:1143] Slave 201204200437-0-162
disconnected
I0420 04:40:30.871819  8193 simple_allocator.cpp:83] Removed slave
201204200437-0-162

tcpdump says this:

POST /master/mesos.internal.RegisterSlaveMessage HTTP/1.0
User-Agent: libprocess/[email protected]:51851
Connection: Keep-Alive
Transfer-Encoding: chunked

87

..
*ip-10-252-94-24.us-west-2.compute.internal.*ip-10-252-94-24.us-west-2.compute.internal..
.cpus...                .......?..
.mem...         .......@ .?
0



So it looks like it's reporting both a valid hostname and a loopback
address.  Which one will the master use?
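
One way to see where the loopback address might come from is to check
what the slave's own hostname resolves to locally. The snippet below is
only a rough sketch; resolves_to_loopback is a name I made up for
illustration, and I am assuming (not asserting) that the address in the
User-Agent line comes from this kind of local lookup.

```python
import ipaddress
import socket


def resolves_to_loopback(hostname):
    """Return (is_loopback, addresses) for the IPv4 addresses of hostname.

    Hypothetical helper for illustration; not part of Mesos or libprocess.
    """
    # Ask the local resolver (including /etc/hosts) for IPv4 addresses.
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    addresses = sorted({info[4][0] for info in infos})
    # Anything in 127.0.0.0/8 counts as loopback.
    is_loopback = any(ipaddress.ip_address(a).is_loopback for a in addresses)
    return is_loopback, addresses
```

Calling resolves_to_loopback(socket.gethostname()) on the slave would
show whether its hostname maps to a 127.x.x.x entry, for example via a
line in /etc/hosts.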

BTW, I have both machines in the same security group, and I opened all
inbound TCP from the group to itself.
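
To double-check the connect-back path Matei asked about, a plain TCP
connect from the master to the address the slave advertises would be a
quick test. This is a minimal sketch; can_connect is a hypothetical
helper name, and it only tests that the port is reachable, not that
Mesos is happy with what it finds there.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Try a plain TCP connect and return (ok, error_message).

    Hypothetical helper for illustration; it checks reachability only.
    """
    try:
        # create_connection resolves the host and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True, ""
    except OSError as err:
        # Covers refused connections, timeouts and resolution failures.
        return False, str(err)
```

Run on the master, can_connect("ip-10-252-94-24.us-west-2.compute.internal", 51851)
uses the host and port from the log lines above. A failure there would
point at the network or security group rather than at Mesos itself.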


On Thu, Apr 19, 2012 at 9:42 PM, Matei Zaharia <[email protected]> wrote:
> What hostname and port does the slave report for itself (i.e. when the master 
> sees it connect, what message does it print)? It could be that the master 
> cannot connect back to that address. Maybe you need to open up communication 
> among machines in your EC2 security groups.
>
> Matei
>
> On Apr 19, 2012, at 9:10 PM, Scott Smith wrote:
>
>> Direct IP/port.  No zookeeper.
>> On Apr 19, 2012 7:35 PM, "John Sirois" <[email protected]> wrote:
>>
>>> How are your slaves connecting to the master?  Via zookeeper or via known
>>> hostname/ip ?
>>>
>>> On Thursday, April 19, 2012, Scott Smith wrote:
>>>
>>>> I'm trying to set up a cluster on ec2, but not using the canned
>>>> scripts/image.  I built the latest svn on Ubuntu 11.10 amd64, and copied
>>>> the build to a second node.  Both are c1.medium instances (not that it
>>>> should matter).  No other software is running (no hdfs, no hadoop, etc).
>>>>
>>>> The problem I have is the slave repeatedly (approx once per second)
>>>> connects, advertises its resources, gets added, and then disconnects.  No
>>>> reason is given for disconnecting.  There are no messages on the slave,
>>>> only 5 or 6 messages on the master.
>>>>
>>>> I'm not sure what the next diagnostic step should be; I was hoping
>>>> someone else ran into the same problem and could point out what I
>>>> did wrong.  Any advice?
>>>>
>>>> Thanks!
>>>>
>>>
>>>
>>> --
>>> John Sirois
>>> 303-512-3301
>>>
>



-- 
        Scott
