Yes, I did start the slave with the --ip parameter set properly. Let me try having the master use the public IP address of the slave (rather than the AWS internal one) and see what happens. So if I have only one slave and that slave gets removed, would the master log remain empty after the removal even though the master was still ready to handle requests? If so, then I think we know what happened.
Jim

-----Original Message-----
From: Vinod Kone [mailto:[email protected]]
Sent: Friday, November 02, 2012 12:54 PM
To: [email protected]
Subject: Re: WebUI problems

From the logs, the slave never got the 'registered' message from the master. The master removes/disconnects a slave when the slave doesn't respond to its health checks within a timeout.

Did you try to start the slave with --ip=<public ip> as suggested earlier? I'm not familiar with AWS networking semantics, but I suspect you cannot connect from 107.22.185.93 --> 10.96.130.119?

@vinodkone

On Fri, Nov 2, 2012 at 12:36 PM, Jim Donahue <[email protected]> wrote:

> Ben,
>
> Complete logs are attached. Note that the master log ends long before the
> slave -- seems like the master has decided to go autistic.
>
> The master is using an AWS elastic IP address, which the slave uses to
> connect. The master has a "slaves" file in its deploy directory with an
> entry giving the AWS internal IP address of the slave (and the address in
> the file matches the internal IP address in the AWS management console).
> And it looks like they did rendezvous for a moment -- when I (briefly) got
> the webUI up everything looked right.
>
> Thanks,
>
> Jim
>
> -----Original Message-----
> From: Benjamin Mahler [mailto:[email protected]]
> Sent: Friday, November 02, 2012 11:59 AM
> To: [email protected]
> Subject: Re: WebUI problems
>
> "But I can't connect to the webUI on the slave." -- right, slaves do not
> have their own webuis anymore, the master collects slave information and
> displays it in its webui.
>
> Do you run in an environment where you have public and private IPs? It
> looks like the slave cannot receive messages from the master. It looks like
> you may want to try --ip=<public_slave_ip> when you start your slave.
>
> Can you provide the full master / slave logs for this?
> Can you also provide the commands you're using to start the master / slave?
>
> On Fri, Nov 2, 2012 at 11:35 AM, Jim Donahue <[email protected]> wrote:
>
> > Now I'm seeing the master and slave go autistic.
> >
> > Using port 5050, I was able to get the webUI up exactly once and then
> > everything looks like it dies. The log on the master shows a bunch of
> > "slave already registered, resending ack" messages, followed by the slave
> > disconnecting and reconnecting on the same port. Finally, the INFO log
> > ends with an "adding slave" message and then just stops.
> >
> > As far as I can tell, the master is still running. But I can't connect to
> > it again through the webUI.
> >
> > Looking at the slave log, the slave detected the master and then shows
> > periodic reporting of its current disk usage and "allowed age" -- there's
> > no indication of any disconnect in the slave log. But I can't connect to
> > the webUI on the slave.
> >
> > Thanks,
> >
> > Jim
> >
> > -----Original Message-----
> > From: Benjamin Mahler [mailto:[email protected]]
> > Sent: Friday, November 02, 2012 10:29 AM
> > To: [email protected]
> > Subject: Re: WebUI problems
> >
> > We've recently killed the old webui: https://reviews.apache.org/r/7708/
> >
> > In the process, the --webui_port flag was removed as it was no longer
> > applicable. I was under the assumption our flag system would not allow
> > extraneous flags to be provided, but perhaps that's not the case.
> >
> > The new webui runs on 5050 as Erich indicated. Please report any issues
> > you find!
> >
> > On Fri, Nov 2, 2012 at 10:21 AM, Erich Nachbar <[email protected]> wrote:
> >
> > > Had the same problem. Try using port 5050 instead of the old 8080. The
> > > webui_port option was ignored when I tried it.
> > >
> > > On Fri, Nov 2, 2012 at 10:17 AM, Jim Donahue <[email protected]> wrote:
> > >
> > > > Yesterday I built a new AMI using the latest Mesos and now I can't
> > > > connect to the web UI (which used to work).
> > > > Logging into the instances (a master and a slave), all looks well --
> > > > the master sees the slave and the slave sees the master. Both master
> > > > and slave were started with the option
> > > >
> > > > --webui_port=5051
> > > >
> > > > But no luck connecting to them with a browser. Has something changed
> > > > recently that I missed? I noticed that I did have to change the build
> > > > recipe for my AMI to install some new libraries, but I didn't see any
> > > > errors in the build and the tests all ran, except for the cgroup ones.
> > > >
> > > > The other thing I noticed is that the logs on both master and slave
> > > > have names of the form:
> > > >
> > > > ...invalid-user.log.INFO....
> > > >
> > > > Is this something I should worry about?
> > > >
> > > > Thanks,
> > > >
> > > > Jim Donahue
> > > > Adobe Systems
> > >
> > > --
> > > Erich Nachbar
> > > CTO | Quantifind <http://quantifind.com/>| 650-430-5500
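
For anyone hitting the same symptom: the suspicion in this thread is that the master's host (107.22.185.93) cannot open a connection to the slave's internal address (10.96.130.119). A rough way to test that, independent of Mesos, is a plain TCP connect attempt run from the master's machine. The sketch below is only an illustration: the helper name can_connect is made up here, and the port to probe is an assumption that has to be replaced with whatever port the slave actually listens on.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds.

    This only shows that the address is reachable at the TCP level. It says
    nothing about whether Mesos messages are being delivered correctly.
    """
    try:
        # create_connection resolves the host and attempts the connect;
        # the `with` block closes the socket again right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False
```

Run on the master's machine, something like can_connect("10.96.130.119", 5051) would show whether the internal address is reachable from there (5051 is a placeholder for the slave's port, not a value confirmed in this thread). Repeating the check against the slave's public address gives the comparison that the --ip suggestion is about.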
