I got things to work, sort of, using the zk:// URL type. I am now using the
0.12.X branch from the GitHub mirror. When I try to bring up the masters,
multiple machines often decide to be the master. Similarly, when I try to
bring up slaves, they rarely detect the masters (maybe 5-10% of the time).

I triaged the issue and determined that the correct zk:// URL to use is this:

zk://myserver1.com:2181/mesos,myserver2.com:2181/mesos,myserver3.com:2181/mesos

Note that you must specify the same hierarchy path for each server. If you
don't do this, things will work, but unreliably.
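
For anyone scripting this, here is a minimal Python sketch that assembles a
URL in the per-server form shown above. The helper name build_zk_url and its
parameters are hypothetical, made up for illustration, and are not part of
Mesos. It only builds the string; the per-server path layout is an assumption
based on what worked in this setup, not a confirmed statement of the format
Mesos documents.

```python
def build_zk_url(hosts, path="/mesos", port=2181):
    """Build a zk:// URL that repeats the same path after every host:port.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    if not hosts:
        raise ValueError("at least one ZooKeeper host is required")
    # detector.cpp (quoted below) rejects "/" as the path, so refuse it here too.
    if not path.startswith("/") or path == "/":
        raise ValueError("path must be a non-root path such as /mesos")
    # Same hierarchy path for every server, joined with commas.
    servers = ",".join(f"{host}:{port}{path}" for host in hosts)
    return "zk://" + servers


if __name__ == "__main__":
    print(build_zk_url(["myserver1.com", "myserver2.com", "myserver3.com"]))
```

The resulting string would be passed to the masters with --zk and to the
slaves with --master, as discussed in the thread below.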


On Tue, Apr 16, 2013 at 4:50 PM, Benjamin Mahler <[email protected]> wrote:

> I believe it needs to be prefixed with "zk://" rather than zoo.
>
> The relevant code is in detector.cpp:
>
> *  } else if (master.find("zk://") == 0) {*
>     Try<zookeeper::URL> url = zookeeper::URL::parse(master);
>     if (url.isError()) {
>       return Error(url.error());
>     }
>     if (url.get().path == "/") {
>       return Error(
>           "Expecting a (chroot) path for ZooKeeper ('/' is not supported)");
>     }
>     return new ZooKeeperMasterDetector(url.get(), pid, contend, quiet);
>   }
>
>
> On Tue, Apr 16, 2013 at 1:01 PM, David Greenberg <[email protected]> wrote:
>
> > Hi Vinod,
> > That's correct. I tried starting the masters with --zk instead of --url.
> > I am running mesos from the git mirror at commit 3fa8389. Should I try
> > updating to head, or is there a particular more stable version I should
> > use?
> >
> > [email protected]:~/mesos/bin$ ./mesos-master.sh --zk=zoo://myserver1.com:2181,myserver2.com:2181,myserver3.com:2181/mesos
> > I0416 19:59:45.205003 48438 main.cpp:116] Build: 2013-04-08 19:16:35 by dgrnbrg
> > I0416 19:59:45.205140 48438 main.cpp:117] Starting Mesos master
> > I0416 19:59:45.205313 48466 master.cpp:309] Master started on 172.21.97.196:5050
> > I0416 19:59:45.205397 48466 master.cpp:324] Master ID: 201304161959-3294696876-5050-48438
> > W0416 19:59:45.205567 48484 master.cpp:81] No whitelist given. Advertising offers for all slaves
> > F0416 19:59:45.205613 48438 main.cpp:129] CHECK_SOME(detector) failed: Failed to create a master detector: Cannot parse '@0.0.0.0:0'
> > *** Check failure stack trace: ***
> >     @     0x7f230ef49f1d  google::LogMessage::Fail()
> >     @     0x7f230ef4e5cf  google::LogMessage::SendToLog()
> >     @     0x7f230ef4db07  google::LogMessage::Flush()
> >     @     0x7f230ef4f25d  google::LogMessageFatal::~LogMessageFatal()
> >     @           0x41c079  main
> >     @     0x7f230cf74abd  (unknown)
> >     @           0x418979  (unknown)
> > Aborted
> >
> >
> >
> > On Tue, Apr 16, 2013 at 2:38 PM, Vinod Kone <[email protected]> wrote:
> >
> > > Hi David,
> > >
> > > I'm assuming the myserver[1-2-3].com above are your zk servers?
> > >
> > > Also, masters take "--zk" instead of "--url" for zookeeper address.
> > > "--url" might have been our old flag, which is deprecated (which
> > > version of mesos are you running?).
> > >
> > > For slaves, "--master" should be the same set of zk servers that you
> > > started your masters with.
> > >
> > > So, --master="zoo://myserver1.com:2181,myserver2.com:2181,myserver3.com:2181/mesos"
> > >
> > > Let me know if that works. If not, please paste the master and slave
> > > logs.
> > >
> > >
> > >
> > > On Tue, Apr 16, 2013 at 10:58 AM, David Greenberg <[email protected]> wrote:
> > >
> > > > I am trying to use the automatic master failover feature of
> > > > zookeeper, but I'm seeing several issues:
> > > >
> > > > When I launch multiple masters with
> > > > ./mesos-master.sh --url=zoo://myserver1.com:2181,myserver2.com:2181,myserver3.com:2181/mesos ,
> > > > all 3 servers elect themselves as master and I don't see anything in
> > > > the logs about zookeeper.
> > > >
> > > > Similarly, when I launch slaves, they require a --master setting,
> > > > which, if I provide the zoo:// URL, causes them to fault (and I don't
> > > > see why I should provide a hostname, given that a host could be down).
> > > >
> > > > I assume that I'm making some silly mistake in how I'm launching
> > > > these processes.
> > > >
> > > > Thanks,
> > > > David
> > > >
> > >
> >
>
