I will be happy to! I'm just finishing up the process with my employer to be able to start submitting patches (I have them all ready and waiting).
By the way, I have discovered a bug, I think (unless it's already been found): after master failover, new frameworks I launch don't get resource offers.

On Wed, Apr 17, 2013 at 4:47 PM, Vinod Kone <[email protected]> wrote:

> Great to hear you were able to debug this David. Sounds like we should
> either fix our help message or make the code work with the format that the
> 'help' claims. I would think the former is easiest. Would you mind sending
> us a patch?
>
> On Wed, Apr 17, 2013 at 12:53 PM, David Greenberg <[email protected]> wrote:
>
> > I got things to work, sort of, using the zk:// url type. I am now using the
> > 0.12.X branch from the Github mirror. When I try to bring up the masters,
> > often multiple machines decide to be the master. Similarly, when I try to
> > bring up slaves, they rarely detect the masters (maybe 5-10% of the time).
> >
> > I triaged the issue and determined that the correct zk url to use is this:
> >
> > zk://myserver1.com:2181/mesos,myserver2.com:2181/mesos,myserver3.com:2181/mesos
> >
> > Note that you must specify the same hierarchy path for each server. If you
> > don't do this, things will work, but unreliably.
> >
> > On Tue, Apr 16, 2013 at 4:50 PM, Benjamin Mahler <[email protected]> wrote:
> >
> > > I believe it needs to be prefixed with "zk://" rather than zoo.
> > >
> > > The relevant code is in detector.cpp:
> > >
> > >   *} else if (master.find("zk://") == 0) {*
> > >     Try<zookeeper::URL> url = zookeeper::URL::parse(master);
> > >     if (url.isError()) {
> > >       return Error(url.error());
> > >     }
> > >     if (url.get().path == "/") {
> > >       return Error(
> > >           "Expecting a (chroot) path for ZooKeeper ('/' is not supported)");
> > >     }
> > >     return new ZooKeeperMasterDetector(url.get(), pid, contend, quiet);
> > >   }
> > >
> > > On Tue, Apr 16, 2013 at 1:01 PM, David Greenberg <[email protected]> wrote:
> > >
> > > > Hi Vinod,
> > > > That's correct. I tried starting the masters with --zk instead of --url. I
> > > > am running mesos from the git mirror at commit 3fa8389. Should I try
> > > > updating to head, or is there a particular more stable version I should
> > > > use?
> > > >
> > > > [email protected]:~/mesos/bin$ ./mesos-master.sh --zk=zoo://myserver1.com:2181,myserver2.com:2181,myserver3.com:2181/mesos
> > > > I0416 19:59:45.205003 48438 main.cpp:116] Build: 2013-04-08 19:16:35 by dgrnbrg
> > > > I0416 19:59:45.205140 48438 main.cpp:117] Starting Mesos master
> > > > I0416 19:59:45.205313 48466 master.cpp:309] Master started on 172.21.97.196:5050
> > > > I0416 19:59:45.205397 48466 master.cpp:324] Master ID: 201304161959-3294696876-5050-48438
> > > > W0416 19:59:45.205567 48484 master.cpp:81] No whitelist given. Advertising offers for all slaves
> > > > F0416 19:59:45.205613 48438 main.cpp:129] CHECK_SOME(detector) failed: Failed to create a master detector: Cannot parse '@0.0.0.0:0'
> > > > *** Check failure stack trace: ***
> > > >     @ 0x7f230ef49f1d google::LogMessage::Fail()
> > > >     @ 0x7f230ef4e5cf google::LogMessage::SendToLog()
> > > >     @ 0x7f230ef4db07 google::LogMessage::Flush()
> > > >     @ 0x7f230ef4f25d google::LogMessageFatal::~LogMessageFatal()
> > > >     @ 0x41c079 main
> > > >     @ 0x7f230cf74abd (unknown)
> > > >     @ 0x418979 (unknown)
> > > > Aborted
> > > >
> > > > On Tue, Apr 16, 2013 at 2:38 PM, Vinod Kone <[email protected]> wrote:
> > > >
> > > > > Hi David,
> > > > >
> > > > > I'm assuming the myserver[1-2-3].com above are your zk servers?
> > > > >
> > > > > Also, masters take "--zk" instead of "--url" for zookeeper address. "--url"
> > > > > might have been our old flag, which is deprecated (which version of mesos
> > > > > are you running?).
> > > > >
> > > > > For slaves, "--master" should be the same set of zk servers that you
> > > > > started your masters with.
> > > > >
> > > > > So, "--master="zoo://myserver1.com:2181,myserver2.com:2181,myserver3.com:2181/mesos"
> > > > >
> > > > > Let me know if that works. If not, please paste the master and slave
> > > > > logs.
> > > > >
> > > > > On Tue, Apr 16, 2013 at 10:58 AM, David Greenberg <[email protected]> wrote:
> > > > >
> > > > > > I am trying to use the automatic master failover feature of zookeeper, but
> > > > > > I'm seeing several issues:
> > > > > >
> > > > > > When I launch multiple masters with ./mesos-master.sh --url=zoo://myserver1.com:2181,myserver2.com:2181,myserver3.com:2181/mesos ,
> > > > > > all 3 servers elect themselves as master and I don't see anything in the
> > > > > > logs about zookeeper.
> > > > > >
> > > > > > Similarly, when I launch slaves, they require a --master setting, which, if
> > > > > > I provide the zoo:// URL, causes them to fault (and I don't see why I
> > > > > > should provide a hostname, given that a host could be down.
> > > > > >
> > > > > > I assume that I'm making some silly mistake in how I'm launching these
> > > > > > processes.
> > > > > >
> > > > > > Thanks,
> > > > > > David
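
As a footnote to the thread: the two checks quoted from detector.cpp (the "zk://" prefix test and the rejection of a bare "/" path) can be sketched as a small standalone function. This is only an illustration under stated assumptions. checkZkMasterUrl is a hypothetical name, the function is not Mesos code and does not call zookeeper::URL::parse, and treating a missing path or an empty server list as an error is my assumption rather than something the thread confirms. It needs C++17 for std::optional.

```cpp
#include <optional>
#include <string>

// Hypothetical helper for illustration only; this is NOT Mesos code.
// Returns an error message if `master` fails the checks discussed in the
// thread, or std::nullopt if it passes them.
std::optional<std::string> checkZkMasterUrl(const std::string& master) {
  const std::string scheme = "zk://";

  // Same prefix test as the quoted detector.cpp line; "zoo://" fails here.
  if (master.find(scheme) != 0) {
    return std::string("expected a URL starting with 'zk://'");
  }

  // Split what follows the scheme at the first '/' into servers and path.
  const std::string rest = master.substr(scheme.size());
  const std::string::size_type slash = rest.find('/');

  // Assumption: an empty server list is an error.
  if (rest.empty() || slash == 0) {
    return std::string("expected at least one host:port before the path");
  }

  // Assumption: a URL with no path at all is rejected as well.
  if (slash == std::string::npos) {
    return std::string("expected a (chroot) path after the servers");
  }

  // Mirrors the quoted check that a bare '/' path is not supported.
  if (rest.substr(slash) == "/") {
    return std::string(
        "Expecting a (chroot) path for ZooKeeper ('/' is not supported)");
  }

  return std::nullopt;
}
```

Called on the zoo:// string from the original report, this returns the prefix error. Both zk:// forms mentioned in the thread pass, because the sketch only looks at the prefix and the first path and does not decide which server-list layout is the right one.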
