Ah yes I forgot, the other piece is network membership for the replicated log, through our zookeeper::Group related code. Is that what you're referring to?
We could put that behind a module interface as well. On Fri, Jun 12, 2020 at 9:10 PM Benjamin Mahler <bmah...@apache.org> wrote: > > Apache ZooKeeper is used for a number of different things in Mesos, with > > only leader election being customisable with modules. Your existing > modular > > functionality is insufficient for decoupling from Apache ZooKeeper. > > Can you clarify which other functionality you're referring to? Mesos only > relies on ZK for leader election and detection. We do have some libraries > available in the code for storing the registry in ZK but we do not support > that currently. > > On Thu, Jun 11, 2020 at 11:02 PM Samuel Marks <sam...@offscale.io> wrote: > >> Apache ZooKeeper is used for a number of different things in Mesos, with >> only leader election being customisable with modules. Your existing >> modular >> functionality is insufficient for decoupling from Apache ZooKeeper. >> >> We are ready and waiting to develop here. >> >> As mentioned over our off-mailing-list communiqué: >> >> The main advantages—and reasoning—for my investment into Mesos has been >> [the prospect of]: >> >> - Making it performant and low-resource utilising on a very small >> number >> of nodes… potentially even down to 1 node so that it can 'compete' with >> Docker Compose. >> - Reducing the number of distributed systems that all do the same thing >> in a datacentre environment. >> - Postgres has its own consensus, Docker—e.g, via Kubernetes or >> Compose—has its own consensus, ZooKeeper has its own consensus, >> other >> things like distributed filesystems… they too; have their own >> consensus. >> - The big sell from that first point is actually showing people how to >> run Mesos and use it for their regular day-to-day development, e.g.: >> 1. Context switching when the one engineer is on multiple projects >> 2. …then use the same technology at scale. 
>> - The big sell from that second point is to reduce the network traffic,
>>   speed up each system's consensus (through all using the one system), and
>>   simplify analytics.
>>
>>   This would be a big deal for your bigger clients, who can easily
>>   quantify what this network traffic costs, and what a reduction in network
>>   traffic with a corresponding increase in speed would mean.
>>
>>   Eventually this will mean that Ops people can trade off guarantees for
>>   speed (and vice versa).
>> - Supporting ZooKeeper, Consul, and etcd is just the start.
>> - Supporting Mesos is just the start.
>> - We plan on adding more consensus-guaranteeing systems (maybe even our
>>   own Paxos and Raft) and adding this to systems in the Mesos ecosystem
>>   like Chronos, Marathon, and Aurora.
>>
>> It is my understanding that a big part of Mesosphere's rebranding is
>> Kubernetes related.
>>
>> Recently (well, just before COVID19!) I spoke at the Sydney Kubernetes
>> Meetup at Google. They too, including Google, were excited by the prospect
>> of removing etcd as a hard dependency, and supporting all the different
>> ones liboffkv supports.
>>
>> I have the budget, team, and expertise at the ready to invest and
>> contribute these changes. If there are certain design patterns and
>> refactors you want us to commit to along the way, just say the word.
>> >> Excitedly yours, >> >> Samuel Marks >> Charity <https://sydneyscientific.org> | consultancy <https://offscale.io >> > >> | open-source <https://github.com/offscale> | LinkedIn >> <https://linkedin.com/in/samuelmarks> >> >> >> On Wed, Jun 10, 2020 at 1:42 AM Benjamin Mahler <bmah...@apache.org> >> wrote: >> >> > AndreiS just reminded me that we have module interfaces for the master >> > detector and contender: >> > >> > >> > >> https://github.com/apache/mesos/blob/1.9.0/include/mesos/module/detector.hpp >> > >> > >> https://github.com/apache/mesos/blob/1.9.0/include/mesos/module/contender.hpp >> > >> > >> > >> https://github.com/apache/mesos/blob/1.9.0/include/mesos/master/detector.hpp >> > >> > >> https://github.com/apache/mesos/blob/1.9.0/include/mesos/master/contender.hpp >> > >> > These should allow you to implement the integration with your library, >> we >> > may need to adjust the interfaces a little, but this will let you get >> what >> > you need done without the burden on us to shepherd the work. >> > >> > On Fri, May 22, 2020 at 8:38 PM Samuel Marks <sam...@offscale.io> >> wrote: >> > >> > > Following on from the discussion on GitHub and here on the >> mailing-list, >> > > here is the proposal from me and my team: >> > > ------------------------------ >> > > >> > > Choice of approach >> > > >> > > The “mediator” of every interaction with ZooKeeper in Mesos is the >> > > ZooKeeper >> > > class, declared in include/mesos/zookeeper/zookeeper.hpp. >> > > >> > > Of note are the following two differences in the *styles* of API >> provided >> > > by ZooKeeper class and liboffkv: >> > > >> > > - >> > > >> > > Push-style mechanism of notifications on changes in “watched” data, >> > > versus pull-style one in liboffkv. In Mesos, the notifications are >> > > delivered via the Watcher interface, defined in the same file as >> > > ZooKeeper. 
This interface has the process method, which is invoked >> by >> > an >> > > instance of ZooKeeper at most once for each watch. There is also a >> > > special event which informs the watcher that the connection has >> been >> > > dropped. An optional instance of Watcher is passed to the >> constructor >> > of >> > > ZooKeeper. >> > > - >> > > >> > > Asynchronous session establishment process in ZooKeeper versus >> > > synchronous one (if at all — e.g. for Consul there is no concept of >> > > “session” currently defined at all) in liboffkv. >> > > >> > > The two users of the ZooKeeper are: >> > > >> > > 1. >> > > >> > > GroupProcess; >> > > 2. >> > > >> > > ZooKeeperStorageProcess. >> > > >> > > We will thus evaluate the possible approaches of integrating liboffkv >> > into >> > > Mesos through the prism of details of their usage. >> > > >> > > The two possible approaches are: >> > > >> > > 1. >> > > >> > > Replace all usages of ZooKeeper with liboffkv-specific code under >> > #ifdef >> > > guards. >> > > >> > > This approach would scale badly, as alternative liboffkv-specific >> > > implementations will be needed for both of the users. >> > > >> > > Moreover, we think that conditional compilation results in >> maintenance >> > > nightmare; see, e.g.: >> > > - >> > > >> > > RealWaitForChar() in vim <https://geoff.greer.fm/vim/>; >> > > - >> > > >> > > “#ifdef Considered Harmful, or Portability Experience With C >> News” >> > > paper by Henry Spencer and Geoff Collyer >> > > < >> http://doc.cat-v.org/henry_spencer/ifdef_considered_harmful.pdf>. >> > > >> > > The creators of the C programming language, which introduced the >> > concept >> > > in the first place, have also spoken against conditional >> compilation: >> > > - >> > > >> > > In “The Practice of Programming” by Brian W. Kernighan and Rob >> > Pike, >> > > the following advice is given: “Avoid conditional compilation. 
>> > > Conditional >> > > compilation with #ifdef and similar preprocessor directives is >> hard >> > > to manage, because information tends to get sprinkled throughout >> > the >> > > source.” >> > > - >> > > >> > > In “Plan 9 from Bell Labs” paper by Rob Pike, Ken Thompson et >> al. >> > > <https://pdos.csail.mit.edu/archive/6.824-2012/papers/plan9.pdf >> >, >> > > the >> > > following is said: “Conditional compilation, even with #ifdef, >> is >> > > used sparingly in Plan 9. The only architecture-dependent >> #ifdefs >> > in >> > > the system are in low-level routines in the graphics library. >> > > Instead, we >> > > avoid such dependencies or, when necessary, isolate them in >> > > separate source >> > > files or libraries. Besides making code hard to read, #ifdefs >> make >> > it >> > > impossible to know what source is compiled into the binary or >> > whether >> > > source protected by them will compile or work properly. They >> > > make it harder >> > > to maintain software.” >> > > 2. >> > > >> > > Modify the *implementation* of the ZooKeeper class to use liboffkv, >> > > possibly renaming the class to something akin to KvClient to >> reflect >> > the >> > > fact that would no longer be ZooKeeper-specific (this also includes >> > the >> > > renaming of error codes and other similar nomenclature). The old >> > > version of >> > > the implementation would be put under an #ifdef guard, thus >> minimising >> > > the number — and maintenance impact — of #ifdefs. >> > > >> > > Naturally there are some advantages to taking the ifdef approach, >> namely >> > > that we can guarantee no difference in builds between before >> offscale's >> > > contribution and after, unless a compiler flag is provided. >> > > >> > > However to avoid polluting the code, we are recommending the second >> > > approach. >> > > Incompatibilities >> > > >> > > The following is the list of incompatibilities between the interfaces >> of >> > > ZooKeeper class and liboffkv. 
Some of those features should be >> > implemented >> > > in liboffkv; others should be emulated inside the ZooKeeper/KvClient >> > class; >> > > and for others still, the change of the interface of >> ZooKeeper/KvClient >> > is >> > > the preferred solution. >> > > >> > > - >> > > >> > > Asynchronous session establishment. We propose to emulate this >> through >> > > spawning a new thread in the constructor of ZooKeeper/KvClient. >> > > - >> > > >> > > Push-style watch notification API. We propose to emulate this >> through >> > > spawning a new thread for each watch; such a thread would then do >> the >> > > wait >> > > and then invoke watcher->process() under a mutex. The number of >> > threads >> > > should not be a concern here, as the only user that uses watches at >> > all >> > > ( >> > > GroupProcess) only registers at most one watch. >> > > - >> > > >> > > Multiple servers in URL string. We propose to implement this in >> > > liboffkv. >> > > - >> > > >> > > Authentication. We propose to implement this in liboffkv. >> > > - >> > > >> > > ACLs (access control lists). The following ACLs are in fact used >> for >> > > everything: >> > > >> > > _auth.isSome() >> > > ? zookeeper::EVERYONE_READ_CREATOR_ALL >> > > : ZOO_OPEN_ACL_UNSAFE >> > > >> > > We thus propose to: >> > > 1. >> > > >> > > implement rudimentary support for ACLs in liboffkv in the form >> of >> > an >> > > optional parameter to create(), >> > > >> > > bool protect_modify = false >> > > >> > > 2. >> > > >> > > change the interface of ZooKeeper/KvClient so that >> protect_modify >> > > flag is used instead of ACLs. >> > > - >> > > >> > > Configurable session timeout. We propose to implement this in >> > liboffkv. >> > > - >> > > >> > > Getting the actual session timeout, which might differ from the >> > > user-provided as a result of timeout negotiation with server. We >> > > propose to >> > > implement this in liboffkv. >> > > - >> > > >> > > Getting the session ID. 
We propose to implement this in liboffkv, >> with >> > > session ID being std::string; and to modify the interface >> accordingly. >> > > It is possible to hash a string into a 64-bit number, but in the >> > > circumstances given, we think it is just not worth it. >> > > - >> > > >> > > Getting the status of the connection to the server. We propose to >> > > implement this in liboffkv. >> > > - >> > > >> > > Sequenced nodes. We propose to emulate this in the class. Here is >> the >> > > pseudo-code of our solution: >> > > >> > > while (true) { >> > > [counter, version] = get("/counter") >> > > seqnum = counter + 1 >> > > name = "label" + seqnum >> > > try { >> > > commit { >> > > check "/counter" version, >> > > set "/counter" seqnum, >> > > create name value >> > > } >> > > break >> > > } catch (TxnAborted) {} >> > > } >> > > >> > > - >> > > >> > > “Recursive” creation of each parent in create(), akin to mkdir -p. >> > This >> > > is already emulated in the class, as ZooKeeper does not natively >> > support >> > > it; we propose to extend this emulation to work with liboffkv. >> > > - >> > > >> > > The semantics of the “set” operation if the entry does not exist: >> > > ZooKeeper fails with ZNONODE in this case, while liboffkv creates a >> > new >> > > node. We propose to emulate this in-class with a transaction. >> > > - >> > > >> > > The semantics of the “erase” operation: ZooKeeper fails with >> ZNOTEMPTY >> > > if node has children, while liboffkv removes the subtree >> recursively. >> > As >> > > neither of users ever attempts to remove node with children, we >> > propose >> > > to >> > > change the interface so that it declares (and actually implements) >> the >> > > liboffkv-compatible semantics. >> > > - >> > > >> > > Return of ZooKeeper-specific Stat structures instead of just >> versions. >> > > As both users only use the version field of this structure, we >> propose >> > > to >> > > simply alter the interface so that only the version is returned. 
>> > > - >> > > >> > > Explicit “session drop” operation that also immediately erases all >> the >> > > “leased” nodes. We propose to implement this in liboffkv. >> > > - >> > > >> > > Check if the node being created has leased parent. Currently, >> liboffkv >> > > declares this to be unspecified behavior: it may either throw (if >> > > ZooKeeper >> > > is used as the back-end) or successfully create the node >> (otherwise). >> > As >> > > neither of users ever attempts to create such a node, we propose to >> > > leave >> > > this as is. >> > > >> > > Estimates >> > > We estimate that—including tests—this will be ready by the end of next >> > > month. >> > > ------------------------------ >> > > >> > > Open to alternative suggestions, otherwise we'll begin. >> > > Samuel Marks >> > > Charity <https://sydneyscientific.org> | consultancy < >> > https://offscale.io> >> > > | open-source <https://github.com/offscale> | LinkedIn >> > > <https://linkedin.com/in/samuelmarks> >> > > >> > > >> > > On Sat, May 2, 2020 at 4:04 AM Benjamin Mahler <bmah...@apache.org> >> > wrote: >> > > >> > > > So it sounds like: >> > > > >> > > > Zookeeper: Official C library has an async API. Are we gaining a lot >> > with >> > > > the third party C++ wrapper you pointed to? Maybe it "just works", >> but >> > it >> > > > looks very inactive and it's hard to tell how maintained it is. >> > > > >> > > > Consul: No official C or C++ library. Only some third party C++ ones >> > that >> > > > look pretty inactive. The ppconsul one you linked to does have an >> issue >> > > > about an async API, I commented on it: >> > > > https://github.com/oliora/ppconsul/issues/26. >> > > > >> > > > etcd: Can use gRPC c++ client async API. >> > > > >> > > > Since 2 of 3 provide an async API already, I would lean more >> towards an >> > > > async API so that we don't have to change anything with the mesos >> code >> > > when >> > > > the last one gets an async implementation. 
However, we currently >> use >> > the >> > > > synchronous ZK API so I realize this would be more work to first >> adjust >> > > the >> > > > mesos code to use the async zookeeper API. I agree that a >> synchronous >> > > > interface is simpler to start with since that will be an easier >> > > integration >> > > > and we currently do not perform many concurrent operations (and >> > probably >> > > > won't anytime soon). >> > > > >> > > > On Sun, Apr 26, 2020 at 11:17 PM Samuel Marks <sam...@offscale.io> >> > > wrote: >> > > > >> > > > > In terms of asynchronous vs synchronous interfacing, when we >> started >> > > > > liboffkv, it had an asynchronous interface. Then we decided to >> drop >> > it >> > > > and >> > > > > implemented a synchronous one, due to the dependent libraries >> which >> > > > > liboffkv uses under the hood. >> > > > > >> > > > > Our ZooKeeper implementation uses the zookeeper-cpp library >> > > > > <https://github.com/tgockel/zookeeper-cpp>—a well-maintained C++ >> > > wrapper >> > > > > around common Zookeeper C bindings [which we contributed to vcpkg >> > > > > <https://github.com/microsoft/vcpkg/pull/7001>]. It has an >> > > asynchronous >> > > > > interface based on std::future >> > > > > <https://en.cppreference.com/w/cpp/thread/future>. Since >> std::future >> > > > does >> > > > > not provide chaining or any callbacks, a Zookeeper-specific result >> > > cannot >> > > > > be asynchronously mapped to liboffkv result. In early versions of >> > > > liboffkv >> > > > > we used thread pool to do the mapping. >> > > > > >> > > > > Consul implementation is based on the ppconsul >> > > > > <https://github.com/oliora/ppconsul> library [which we >> contributed >> > to >> > > > > vcpkg >> > > > > < >> > > > > >> > > > >> > > >> > >> https://github.com/microsoft/vcpkg/pulls?q=is%3Apr+author%3ASamuelMarks+ppconsul >> > > > > >], >> > > > > which in turn utilizes libcurl <https://curl.haxx.se/libcurl>. 
>> > > > > Unfortunately, ppconsul uses libcurl's easy interface, and >> > consequently >> > > > it >> > > > > is synchronous by design. Again, in the early version of the >> library >> > we >> > > > > used a thread pool to overcome this limitation. >> > > > > >> > > > > As for etcd, we autogenerated the gRPC C++ client >> > > > > <https://github.com/offscale/etcd-client-cpp> [which we >> contributed >> > to >> > > > > vcpkg >> > > > > <https://github.com/microsoft/vcpkg/pull/6999>]. gRPC provides an >> > > > > asynchronous interface, so a "fair" async client can be >> implemented >> > on >> > > > top >> > > > > of it. >> > > > > >> > > > > To sum up, the chosen toolkit provided two of three >> implementations >> > > > require >> > > > > thread pool. After careful consideration, we have preferred to >> give >> > the >> > > > > user control over threading and opted out of the asynchrony. >> > > > > >> > > > > Nevertheless, there are some options. zookeeper-cpp allows >> building >> > > with >> > > > > custom futures/promises, so we can create a custom build to use in >> > > > > liboffkv/Mesos. Another variant is to use plain C ZK bindings >> > > > > < >> > > > > >> > > > >> > > >> > >> https://gitbox.apache.org/repos/asf?p=zookeeper.git;a=tree;f=zookeeper-client/zookeeper-client-c;h=c72b57355c977366edfe11304067ff35f5cf215d;hb=HEAD >> > > > > > >> > > > > instead of the C++ library. >> > > > > As for the Consul client, the only meaningful option is to opt >> out of >> > > > using >> > > > > ppconsul and operate through libcurl's multi interface. >> > > > > >> > > > > At this point implementing asynchronous interfaces will require >> > > rewriting >> > > > > liboffkv from the ground up. I can allocate the budget for doing >> > this, >> > > > as I >> > > > > have done to date. However, it would be good to have some more >> > > > > back-and-forth before reengaging. 
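The std::future limitation described above can be sketched as follows, assuming C++14. Because std::future has no continuation hook, converting a backend-specific result into a backend-neutral one needs a thread that blocks on get(). The types BackendResult and GenericResult and the functions convert and map_result are hypothetical stand-ins for illustration only; they are not part of zookeeper-cpp or liboffkv.

```cpp
#include <future>
#include <string>
#include <utility>

// Hypothetical backend-specific result, standing in for a ZooKeeper reply.
struct BackendResult {
  int version;
  std::string data;
};

// Hypothetical backend-neutral result, standing in for a liboffkv reply.
struct GenericResult {
  long version;
  std::string value;
};

// Pure conversion between the two representations.
GenericResult convert(const BackendResult& r) {
  return GenericResult{static_cast<long>(r.version), r.data};
}

// std::future offers no way to attach a continuation, so the mapping has to
// run on a thread that blocks in get(). std::async with launch::async
// supplies that thread here; a thread pool would play the same role.
std::future<GenericResult> map_result(std::future<BackendResult> f) {
  return std::async(std::launch::async,
                    [f = std::move(f)]() mutable { return convert(f.get()); });
}
```

Each pending operation ties up one thread until the backend answers, which is the cost that motivated the synchronous interface.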
>> > > > >
>> > > > > Design Doc:
>> > > > > https://docs.google.com/document/d/1NOfyt7NzpMxxatdFs3f9ixKUS81DHHDVEKBbtVfVi_0
>> > > > > [feel free to add it to
>> > > > > http://mesos.apache.org/documentation/latest/design-docs/]
>> > > > >
>> > > > > Thanks,
>> > > > >
>> > > > > *SAMUEL MARKS*
>> > > > > Sydney Medical School | Westmead Institute for Medical Research |
>> > > > > https://linkedin.com/in/samuelmarks
>> > > > > Director | Sydney Scientific Foundation Ltd <https://sydneyscientific.org>
>> > > > > | Offscale.io of Sydney Scientific Pty Ltd <https://offscale.io>
>> > > > >
>> > > > > PS: Damien - not against contributing to FoundationDB, but priorities
>> > > > > are Mesos and the Mesos ecosystem, followed by Kubernetes and its
>> > > > > ecosystem.
>> > > > >
>> > > > > On Tue, Apr 21, 2020 at 3:19 AM Benjamin Mahler <bmah...@apache.org>
>> > > > > wrote:
>> > > > >
>> > > > > > Samuel: One more thing I forgot to mention, we would prefer to use
>> > > > > > an asynchronous client interface rather than a synchronous one. Is
>> > > > > > that something you have considered?
>> > > > > >
>> > > > > > On Fri, Apr 17, 2020 at 6:11 PM Vinod Kone <vinodk...@apache.org>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > Hi Samuel,
>> > > > > > >
>> > > > > > > Thanks for showing interest in contributing to the project. Having
>> > > > > > > optionality between ZooKeeper and Etcd would be great for the
>> > > > > > > project and something that has been brought up a few times before,
>> > > > > > > as you noted.
>> > > > > > >
>> > > > > > > I echo everything that BenM said. As part of the design it would
>> > > > > > > be great to see the migration path for users currently using Mesos
>> > > > > > > with ZooKeeper to Etcd. Ideally, the migration can happen without
>> > > > > > > much user intervention.
>> > > > > > >
>> > > > > > > Additionally, from our past experience, efforts like these are more
>> > > > > > > successful if the people writing the code have experience with how
>> > > > > > > things work in the Mesos code base. So I would recommend starting
>> > > > > > > small, maybe have a few engineers work on a couple of "newbie"
>> > > > > > > tickets and do some small projects and have those committed to the
>> > > > > > > project. That gives the committers some level of confidence about
>> > > > > > > the quality of the code and makes them more open to bigger changes
>> > > > > > > like etcd integration. It would also help contributors get a better
>> > > > > > > feeling for the lay of the land and see if they are truly
>> > > > > > > interested in maintaining this piece of integration for the long
>> > > > > > > haul. This is a bit of a longer path but I think it would be a more
>> > > > > > > fruitful one.
>> > > > > > >
>> > > > > > > Looking forward to seeing new contributions to Mesos including the
>> > > > > > > above design!
>> > > > > > >
>> > > > > > > Thanks,
>> > > > > > >
>> > > > > > > On Fri, Apr 17, 2020 at 4:52 PM Samuel Marks <sam...@offscale.io>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > Happy to build a design doc,
>> > > > > > > >
>> > > > > > > > To answer your question on what Offscale.io is, it's my software
>> > > > > > > > and biomedical engineering consultancy. Currently it's still
>> > > > > > > > rather small, with only 8 engineers, but I'm expecting &
>> > > > > > > > preparing to grow rapidly.
>> > > > > > > > >> > > > > > > > My philosophy is always open-source and patent-free, so >> that's >> > > what >> > > > > my >> > > > > > > > consultancy—and for that matter, the charitable research >> that I >> > > > fund >> > > > > > > > through it <https://sydneyscientific.org>—follows. >> > > > > > > > >> > > > > > > > The goal of everything we create is: interoperable >> > > (cross-platform, >> > > > > > > > cross-technology, cross-language, multi-cloud); open-source >> > > > > (Apache-2.0 >> > > > > > > OR >> > > > > > > > MIT); with a view towards scaling: >> > > > > > > > >> > > > > > > > - teams; >> > > > > > > > - software-development <https://compilers.com.au>; >> > > > > > > > - infrastructure [this proposed Mesos contribution + our >> > > DevOps >> > > > > > > > tooling]; >> > > > > > > > - [in the charity's case] facilitating very large-scale >> > > medical >> > > > > > > > diagnostic screening. >> > > > > > > > >> > > > > > > > Technologies like Mesos we expect to both optimise resource >> > > > > > > > allocation—reducing costs and increasing data locality—and >> > award >> > > us >> > > > > > > > 'bragging rights' with which we can gain clients that are >> > already >> > > > > using >> > > > > > > > Mesos (which, from my experience, is always big corporates… >> > > though >> > > > > > > > hopefully contributions like these will make it attractive >> to >> > > small >> > > > > > > > companies also). 
>> > > > > > > > >> > > > > > > > So no, we're not going anywhere, and are planning to >> maintain >> > > this >> > > > > > > library >> > > > > > > > into the future >> > > > > > > > >> > > > > > > > PS: Once accepted by Mesos, we'll be making similar >> > contributions >> > > > to >> > > > > > > other >> > > > > > > > Mesos ecosystem projects like Chronos < >> > > > > https://mesos.github.io/chronos >> > > > > > >, >> > > > > > > > Marathon <https://github.com/mesosphere/marathon>, and >> Aurora >> > > > > > > > <https://github.com/aurora-scheduler/aurora> as well as to >> > > > unrelated >> > > > > > > > projects (e.g., removing etcd as a hard-dependency from >> > > Kubernetes >> > > > > > > > <https://kubernetes.io>… enabling them to choose between >> > > > ZooKeeper, >> > > > > > > etcd, >> > > > > > > > and Consul). >> > > > > > > > >> > > > > > > > Thanks for your continual feedback, >> > > > > > > > >> > > > > > > > *SAMUEL MARKS* >> > > > > > > > Sydney Medical School | Westmead Institute for Medical >> > Research | >> > > > > > > > https://linkedin.com/in/samuelmarks >> > > > > > > > Director | Sydney Scientific Foundation Ltd < >> > > > > > > https://sydneyscientific.org> >> > > > > > > > | Offscale.io of Sydney Scientific Pty Ltd < >> > https://offscale.io> >> > > > > > > > >> > > > > > > > >> > > > > > > > On Sat, Apr 18, 2020 at 6:58 AM Benjamin Mahler < >> > > > bmah...@apache.org> >> > > > > > > > wrote: >> > > > > > > > >> > > > > > > > > Oh ok, could you tell us a little more about how you're >> using >> > > > > Mesos? >> > > > > > > And >> > > > > > > > > what offscale.io is? >> > > > > > > > > >> > > > > > > > > Strictly speaking, we don't really need packaging and >> > releases >> > > as >> > > > > we >> > > > > > > can >> > > > > > > > > bundle the dependency in our repo and that's what we do >> for >> > > many >> > > > of >> > > > > > our >> > > > > > > > > dependencies. 
>> > > > > > > > > To me, the most important thing is the commitment to >> maintain >> > > the >> > > > > > > library >> > > > > > > > > and address issues that come up. >> > > > > > > > > I also would lean more towards a run-time flag rather >> than a >> > > > build >> > > > > > > level >> > > > > > > > > flag, if possible. >> > > > > > > > > >> > > > > > > > > I think the best place to start would be to put together a >> > > design >> > > > > > doc. >> > > > > > > > The >> > > > > > > > > act of writing that will force the author to think through >> > the >> > > > > > details >> > > > > > > > (and >> > > > > > > > > there are a lot of them!), and we'll then get a chance to >> > give >> > > > > > > feedback. >> > > > > > > > > You can look through the mailing list for past examples of >> > > design >> > > > > > docs >> > > > > > > > (in >> > > > > > > > > terms of which sections to include, etc). >> > > > > > > > > >> > > > > > > > > How does that sound? >> > > > > > > > > >> > > > > > > > > On Tue, Apr 14, 2020 at 8:44 PM Samuel Marks < >> > > sam...@offscale.io >> > > > > >> > > > > > > wrote: >> > > > > > > > > >> > > > > > > > > > Dear Benjamin Mahler [and *Developers mailing-list for >> > Apache >> > > > > > > Mesos*], >> > > > > > > > > > >> > > > > > > > > > Thanks for responding so quickly. >> > > > > > > > > > >> > > > > > > > > > Actually this entire project I invested—time & money, >> > > > including a >> > > > > > > > > > development team—explicitly in order to contribute this >> to >> > > > Apache >> > > > > > > > Mesos. >> > > > > > > > > So >> > > > > > > > > > no releases yet, because I wanted to ensure it was up to >> > the >> > > > > > > > > specification >> > > > > > > > > > requirements referenced in dev@mesos.apache.org before >> > > > > proceeding >> > > > > > > with >> > > > > > > > > > packaging and releases. 
>> > > > > > > > > >
>> > > > > > > > > > Tests have been set up in Travis CI for Linux (Ubuntu 18.04)
>> > > > > > > > > > and macOS; happy to set them up elsewhere also. There are
>> > > > > > > > > > also some Windows builds that need a bit of tweaking, then
>> > > > > > > > > > they will be pushed into CI also. We are just starting to do
>> > > > > > > > > > some work on reducing build & test times.
>> > > > > > > > > >
>> > > > > > > > > > Would be great to build a checklist of things you want to see
>> > > > > > > > > > before we send the PR, e.g.,
>> > > > > > > > > >
>> > > > > > > > > > - ☐ hosted docs;
>> > > > > > > > > > - ☐ CI/CD (including packaging) for Windows, Linux, and macOS;
>> > > > > > > > > > - ☐ releases on GitHub;
>> > > > > > > > > > - ☐ consistent session and auth interface;
>> > > > > > > > > > - ☐ different tests [can you expand here?]
>> > > > > > > > > >
>> > > > > > > > > > This is just an example checklist; it would be best if you
>> > > > > > > > > > and others can flesh it out, so when we do send the PR it's
>> > > > > > > > > > in an immediately mergeable state.
>> > > > > > > > > >
>> > > > > > > > > > BTW: Originally had a debate with my team about whether to
>> > > > > > > > > > send a PR out of the blue (like Microsoft famously did for
>> > > > > > > > > > Node.js <https://github.com/nodejs/node/pull/4765>) or start
>> > > > > > > > > > an *offer thread* on the developers mailing-list.
>> > > > > > > > > > >> > > > > > > > > > Looking forward to contributing 🦀 >> > > > > > > > > > >> > > > > > > > > > *SAMUEL MARKS* >> > > > > > > > > > Sydney Medical School | Westmead Institute for Medical >> > > > Research | >> > > > > > > > > > https://linkedin.com/in/samuelmarks >> > > > > > > > > > Director | Sydney Scientific Foundation Ltd < >> > > > > > > > > https://sydneyscientific.org> >> > > > > > > > > > | Offscale.io of Sydney Scientific Pty Ltd < >> > > > https://offscale.io> >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > On Wed, Apr 15, 2020 at 2:38 AM Benjamin Mahler < >> > > > > > bmah...@apache.org> >> > > > > > > > > > wrote: >> > > > > > > > > > >> > > > > > > > > > > Thanks for reaching out, a well maintained and well >> > written >> > > > > > wrapper >> > > > > > > > > > > interface to the three backends would certainly make >> this >> > > > > easier >> > > > > > > for >> > > > > > > > us >> > > > > > > > > > vs >> > > > > > > > > > > implementing such an interface ourselves. >> > > > > > > > > > > >> > > > > > > > > > > Is this the client interface? >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> https://github.com/offscale/liboffkv/blob/d31181a1e74c5faa0b7f5d7001879640b4d9f111/liboffkv/client.hpp#L115-L142 >> > > > > > > > > > > >> > > > > > > > > > > At a quick glance, three ZK things that we rely on but >> > seem >> > > > to >> > > > > be >> > > > > > > > > absent >> > > > > > > > > > > from the common interface is the ZK session, >> > > authentication, >> > > > > and >> > > > > > > > > > > authorization. How will these be provided via the >> common >> > > > > > interface? 
>> > > > > > > > > > > >> > > > > > > > > > > Here is our ZK interface wrapper if you want to see >> what >> > > > kinds >> > > > > of >> > > > > > > > > things >> > > > > > > > > > we >> > > > > > > > > > > use: >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> https://github.com/apache/mesos/blob/1.9.0/include/mesos/zookeeper/zookeeper.hpp#L72-L339 >> > > > > > > > > > > >> > > > > > > > > > > The project has 0 releases and 0 issues, what kind of >> > usage >> > > > has >> > > > > > it >> > > > > > > > > seen? >> > > > > > > > > > > Has there been any testing yet? Would Offscale.io be >> > doing >> > > > some >> > > > > > of >> > > > > > > > the >> > > > > > > > > > > testing? >> > > > > > > > > > > >> > > > > > > > > > > On Mon, Apr 13, 2020 at 7:54 PM Samuel Marks < >> > > > > sam...@offscale.io >> > > > > > > >> > > > > > > > > wrote: >> > > > > > > > > > > >> > > > > > > > > > > > Apache ZooKeeper <https://zookeeper.apache.org> is >> a >> > > large >> > > > > > > > > dependency. >> > > > > > > > > > > > Enabling developers and operations to use etcd < >> > > > > > https://etcd.io >> > > > > > > >, >> > > > > > > > > > Consul >> > > > > > > > > > > > <https://consul.io>, or ZooKeeper should reduce >> > resource >> > > > > > > > utilisation >> > > > > > > > > > and >> > > > > > > > > > > > enable new use cases. >> > > > > > > > > > > > >> > > > > > > > > > > > There have already been a number of suggestions to >> get >> > > rid >> > > > of >> > > > > > > hard >> > > > > > > > > > > > dependency on ZooKeeper. 
> > > For example, see: MESOS-1806
> > > <https://issues.apache.org/jira/browse/MESOS-1806>, MESOS-3574
> > > <https://issues.apache.org/jira/browse/MESOS-3574>, MESOS-3797
> > > <https://issues.apache.org/jira/browse/MESOS-3797>, MESOS-5828
> > > <https://issues.apache.org/jira/browse/MESOS-5828>, MESOS-5829
> > > <https://issues.apache.org/jira/browse/MESOS-5829>. However, there are
> > > difficulties in supporting a few implementations for different services
> > > with quite distinct data models.
> > >
> > > A few months ago offscale.io invested in a solution to this problem:
> > > liboffkv <https://github.com/offscale/liboffkv>, a *C++* library which
> > > provides a *uniform interface over ZooKeeper, Consul KV and etcd*. It
> > > abstracts common features of these services into its own data model,
> > > which is very similar to ZooKeeper’s. Careful attention was paid to
> > > keeping methods both efficient and consistent.
> > > It is cross-platform, open-source (*Apache-2.0 OR MIT*), and is written
> > > in C++, with vcpkg packaging, *C library output
> > > <https://github.com/offscale/liboffkv/blob/d3d549e/CMakeLists.txt#L29-L35>*,
> > > and additional interfaces in *Go <https://github.com/offscale?q=goffkv>*,
> > > *Java <https://github.com/offscale/liboffkv-java>*, and *Rust
> > > <https://github.com/offscale/rsoffkv>*.
> > >
> > > Offscale.io proposes to replace all ZooKeeper usages in Mesos with
> > > usages of liboffkv. Since all interactions which require ZooKeeper in
> > > Mesos are conducted through the class Group (and GroupProcess) with a
> > > clear interface the obvious way to introduce changes is to provide
> > > another implementation of the class which uses liboffkv instead of
> > > ZooKeeper. In this case the original implementation may be left
> > > unchanged in the codebase and build flags to select from ZK-only and
> > > liboffkv variants may be introduced.
> > > Once the community is confident, you can decide to remove the ZK-only
> > > option, and instead only support liboffkv [which internally has build
> > > flags for each service].
> > >
> > > Removing the hard dependency on ZooKeeper will simplify local deployment
> > > for testing purposes as well as enable using Mesos in clusters without
> > > ZooKeeper, e.g. where etcd or Consul is used for coordination. We expect
> > > this to greatly reduce resource usage (network, CPU, disk, memory) in a
> > > datacenter environment.
> > >
> > > If the community accepts the initiative, we will integrate liboffkv into
> > > Mesos. We are also ready to develop the library and consider any
> > > suggested improvements.
> > >
> > > *SAMUEL MARKS*
> > > Sydney Medical School | Westmead Institute for Medical Research |
> > > https://linkedin.com/in/samuelmarks
> > > Director | Sydney Scientific Foundation Ltd <https://sydneyscientific.org>
> > > | Offscale.io of Sydney Scientific Pty Ltd <https://offscale.io>
> > > *SYDNEY SCIENTIFIC FOUNDATION and THE UNIVERSITY OF SYDNEY*
> > >
> > > PS: We will be offering similar contributions to Chronos
> > > <https://mesos.github.io/chronos>, Marathon
> > > <https://github.com/mesosphere/marathon>, Aurora
> > > <https://github.com/aurora-scheduler/aurora>, and related projects.
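
For readers following the proposal quoted in this thread (a second implementation of the Group class, selectable through a build flag), the idea might be sketched roughly as below. This is an illustration under stated assumptions, not the Mesos or liboffkv code: `GroupBackend`, `InMemoryGroup`, `createGroup`, and the `USE_LIBOFFKV_GROUP` macro are hypothetical names, and an in-memory map stands in for both backends. It also leaves out the session, authentication, and authorization concerns raised in the reply.

```cpp
// Hypothetical sketch only: none of these names exist in Mesos or liboffkv.
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Minimal group-membership interface that callers would depend on,
// instead of depending on a specific coordination service.
class GroupBackend {
public:
  virtual ~GroupBackend() = default;
  // Registers a member and returns an opaque membership id.
  virtual std::size_t join(const std::string& data) = 0;
  // Removes a member; returns false if the id is not a current member.
  virtual bool cancel(std::size_t id) = 0;
  // Returns the data of all current members, ordered by membership id.
  virtual std::vector<std::string> members() const = 0;
  // Backend label, e.g. for logging.
  virtual std::string name() const = 0;
};

// In-memory stand-in. A real variant would talk to ZooKeeper directly
// or go through a uniform key-value wrapper.
class InMemoryGroup : public GroupBackend {
public:
  explicit InMemoryGroup(std::string name) : name_(std::move(name)) {
    assert(!name_.empty());  // a backend label is required
  }

  std::size_t join(const std::string& data) override {
    const std::size_t id = next_++;
    members_[id] = data;
    return id;
  }

  bool cancel(std::size_t id) override { return members_.erase(id) == 1; }

  std::vector<std::string> members() const override {
    std::vector<std::string> result;
    for (const auto& entry : members_) {
      result.push_back(entry.second);
    }
    return result;
  }

  std::string name() const override { return name_; }

private:
  std::string name_;
  std::size_t next_ = 0;
  std::map<std::size_t, std::string> members_;
};

// The build flag (an assumed macro name) picks the variant at compile time;
// calling code only ever sees GroupBackend.
std::unique_ptr<GroupBackend> createGroup() {
#ifdef USE_LIBOFFKV_GROUP
  return std::unique_ptr<GroupBackend>(new InMemoryGroup("liboffkv"));
#else
  return std::unique_ptr<GroupBackend>(new InMemoryGroup("zookeeper"));
#endif
}
```

Calling code would obtain a backend with `createGroup()` and use only `join`, `cancel`, and `members`, so dropping the ZK-only variant later would mean removing one branch of the factory rather than touching callers.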