Anything we can do to help expedite this?

- We really want to contribute to Mesos and its ecosystem.
- It would be great to have them all decoupled from any particular consensus and key/value store.
- There are some significant new use-cases, IMHO, that will be facilitated by this.

Samuel Marks Charity <https://sydneyscientific.org> | consultancy <https://offscale.io> | open-source <https://github.com/offscale> | LinkedIn <https://linkedin.com/in/samuelmarks> On Fri, Jun 19, 2020 at 12:01 AM Samuel Marks <sam...@offscale.io> wrote: > Hey was just a little confused as to if I'm waiting for your next response > or if you wanted me to respond… > > Besides leader election and network membership, ZooKeeper is also utilized > in some JNI code through ZooKeeperStorage. But I'm not sure if those JNI > libraries are actually used. > > So if we could put all ZooKeeper-dependent functionality behind a module > interface and implement a few liboffkv-based modules, would that suffice? > > What is the sort of timeframe for your end? - And are we waiting on you, > or do you want us to prepare the contributions, send it through, then await > your review? > > PS: Happy to schedule a videoconference between our teams > > Samuel Marks > Charity <https://sydneyscientific.org> | consultancy <https://offscale.io> > | open-source <https://github.com/offscale> | LinkedIn > <https://linkedin.com/in/samuelmarks> > > > On Sat, Jun 13, 2020 at 11:12 AM Benjamin Mahler <bmah...@apache.org> > wrote: > >> Ah yes I forgot, the other piece is network membership for the replicated >> log, through our zookeeper::Group related code. Is that what you're >> referring to? >> >> We could put that behind a module interface as well. >> >> On Fri, Jun 12, 2020 at 9:10 PM Benjamin Mahler <bmah...@apache.org> >> wrote: >> >> > > Apache ZooKeeper is used for a number of different things in Mesos, >> with >> > > only leader election being customisable with modules. Your existing >> > modular >> > > functionality is insufficient for decoupling from Apache ZooKeeper. >> > >> > Can you clarify which other functionality you're referring to? Mesos >> only >> > relies on ZK for leader election and detection. 
We do have some >> libraries >> > available in the code for storing the registry in ZK but we do not >> support >> > that currently. >> > >> > On Thu, Jun 11, 2020 at 11:02 PM Samuel Marks <sam...@offscale.io> >> wrote: >> > >> >> Apache ZooKeeper is used for a number of different things in Mesos, >> with >> >> only leader election being customisable with modules. Your existing >> >> modular >> >> functionality is insufficient for decoupling from Apache ZooKeeper. >> >> >> >> We are ready and waiting to develop here. >> >> >> >> As mentioned over our off-mailing-list communiqué: >> >> >> >> The main advantages—and reasoning—for my investment into Mesos has been >> >> [the prospect of]: >> >> >> >> - Making it performant and low-resource utilising on a very small >> >> number >> >> of nodes… potentially even down to 1 node so that it can 'compete' >> with >> >> Docker Compose. >> >> - Reducing the number of distributed systems that all do the same >> thing >> >> in a datacentre environment. >> >> - Postgres has its own consensus, Docker—e.g, via Kubernetes or >> >> Compose—has its own consensus, ZooKeeper has its own consensus, >> >> other >> >> things like distributed filesystems… they too; have their own >> >> consensus. >> >> - The big sell from that first point is actually showing people how >> to >> >> run Mesos and use it for their regular day-to-day development, e.g.: >> >> 1. Context switching when the one engineer is on multiple projects >> >> 2. …then use the same technology at scale. >> >> - The big sell from that second point is to reduce the network >> traffic, >> >> speed up each systems consensus—through all using the one system—and >> >> simplify analytics. >> >> >> >> This would be a big deal for your bigger clients, who can easily >> >> quantify what this network traffic costs, and what a reduction in >> >> network >> >> traffic with a corresponding increase in speed would mean. 
>> >>
>> >> Eventually this will mean that Ops people can trade off guarantees for
>> >> speed (and vice versa).
>> >> - Supporting ZooKeeper, Consul, and etcd is just the start.
>> >> - Supporting Mesos is just the start.
>> >> - We plan on adding more consensus-guaranteeing systems (maybe even our
>> >>   own Paxos and Raft) and adding this to systems in the Mesos ecosystem
>> >>   like Chronos, Marathon, and Aurora.
>> >>
>> >> It is my understanding that a big part of Mesosphere's rebranding is
>> >> Kubernetes related.
>> >>
>> >> Recently (well, just before COVID-19!) I spoke at the Sydney Kubernetes
>> >> Meetup at Google. They too, including Google, were excited by the
>> >> prospect of removing etcd as a hard dependency and supporting all the
>> >> different key/value stores liboffkv supports.
>> >>
>> >> I have the budget, team, and expertise at the ready to invest and
>> >> contribute these changes. If there are certain design patterns and
>> >> refactors you want us to commit to along the way, just say the word.
>> >> >> >> Excitedly yours, >> >> >> >> Samuel Marks >> >> Charity <https://sydneyscientific.org> | consultancy < >> https://offscale.io >> >> > >> >> | open-source <https://github.com/offscale> | LinkedIn >> >> <https://linkedin.com/in/samuelmarks> >> >> >> >> >> >> On Wed, Jun 10, 2020 at 1:42 AM Benjamin Mahler <bmah...@apache.org> >> >> wrote: >> >> >> >> > AndreiS just reminded me that we have module interfaces for the >> master >> >> > detector and contender: >> >> > >> >> > >> >> > >> >> >> https://github.com/apache/mesos/blob/1.9.0/include/mesos/module/detector.hpp >> >> > >> >> > >> >> >> https://github.com/apache/mesos/blob/1.9.0/include/mesos/module/contender.hpp >> >> > >> >> > >> >> > >> >> >> https://github.com/apache/mesos/blob/1.9.0/include/mesos/master/detector.hpp >> >> > >> >> > >> >> >> https://github.com/apache/mesos/blob/1.9.0/include/mesos/master/contender.hpp >> >> > >> >> > These should allow you to implement the integration with your >> library, >> >> we >> >> > may need to adjust the interfaces a little, but this will let you get >> >> what >> >> > you need done without the burden on us to shepherd the work. >> >> > >> >> > On Fri, May 22, 2020 at 8:38 PM Samuel Marks <sam...@offscale.io> >> >> wrote: >> >> > >> >> > > Following on from the discussion on GitHub and here on the >> >> mailing-list, >> >> > > here is the proposal from me and my team: >> >> > > ------------------------------ >> >> > > >> >> > > Choice of approach >> >> > > >> >> > > The “mediator” of every interaction with ZooKeeper in Mesos is the >> >> > > ZooKeeper >> >> > > class, declared in include/mesos/zookeeper/zookeeper.hpp. >> >> > > >> >> > > Of note are the following two differences in the *styles* of API >> >> provided >> >> > > by ZooKeeper class and liboffkv: >> >> > > >> >> > > - >> >> > > >> >> > > Push-style mechanism of notifications on changes in “watched” >> data, >> >> > > versus pull-style one in liboffkv. 
In Mesos, the notifications >> are >> >> > > delivered via the Watcher interface, defined in the same file as >> >> > > ZooKeeper. This interface has the process method, which is >> invoked >> >> by >> >> > an >> >> > > instance of ZooKeeper at most once for each watch. There is >> also a >> >> > > special event which informs the watcher that the connection has >> >> been >> >> > > dropped. An optional instance of Watcher is passed to the >> >> constructor >> >> > of >> >> > > ZooKeeper. >> >> > > - >> >> > > >> >> > > Asynchronous session establishment process in ZooKeeper versus >> >> > > synchronous one (if at all — e.g. for Consul there is no >> concept of >> >> > > “session” currently defined at all) in liboffkv. >> >> > > >> >> > > The two users of the ZooKeeper are: >> >> > > >> >> > > 1. >> >> > > >> >> > > GroupProcess; >> >> > > 2. >> >> > > >> >> > > ZooKeeperStorageProcess. >> >> > > >> >> > > We will thus evaluate the possible approaches of integrating >> liboffkv >> >> > into >> >> > > Mesos through the prism of details of their usage. >> >> > > >> >> > > The two possible approaches are: >> >> > > >> >> > > 1. >> >> > > >> >> > > Replace all usages of ZooKeeper with liboffkv-specific code >> under >> >> > #ifdef >> >> > > guards. >> >> > > >> >> > > This approach would scale badly, as alternative >> liboffkv-specific >> >> > > implementations will be needed for both of the users. >> >> > > >> >> > > Moreover, we think that conditional compilation results in >> >> maintenance >> >> > > nightmare; see, e.g.: >> >> > > - >> >> > > >> >> > > RealWaitForChar() in vim <https://geoff.greer.fm/vim/>; >> >> > > - >> >> > > >> >> > > “#ifdef Considered Harmful, or Portability Experience With C >> >> News” >> >> > > paper by Henry Spencer and Geoff Collyer >> >> > > < >> >> http://doc.cat-v.org/henry_spencer/ifdef_considered_harmful.pdf>. 
>> >> > > >> >> > > The creators of the C programming language, which introduced the >> >> > concept >> >> > > in the first place, have also spoken against conditional >> >> compilation: >> >> > > - >> >> > > >> >> > > In “The Practice of Programming” by Brian W. Kernighan and >> Rob >> >> > Pike, >> >> > > the following advice is given: “Avoid conditional >> compilation. >> >> > > Conditional >> >> > > compilation with #ifdef and similar preprocessor directives >> is >> >> hard >> >> > > to manage, because information tends to get sprinkled >> throughout >> >> > the >> >> > > source.” >> >> > > - >> >> > > >> >> > > In “Plan 9 from Bell Labs” paper by Rob Pike, Ken Thompson et >> >> al. >> >> > > < >> https://pdos.csail.mit.edu/archive/6.824-2012/papers/plan9.pdf >> >> >, >> >> > > the >> >> > > following is said: “Conditional compilation, even with >> #ifdef, >> >> is >> >> > > used sparingly in Plan 9. The only architecture-dependent >> >> #ifdefs >> >> > in >> >> > > the system are in low-level routines in the graphics library. >> >> > > Instead, we >> >> > > avoid such dependencies or, when necessary, isolate them in >> >> > > separate source >> >> > > files or libraries. Besides making code hard to read, #ifdefs >> >> make >> >> > it >> >> > > impossible to know what source is compiled into the binary or >> >> > whether >> >> > > source protected by them will compile or work properly. They >> >> > > make it harder >> >> > > to maintain software.” >> >> > > 2. >> >> > > >> >> > > Modify the *implementation* of the ZooKeeper class to use >> liboffkv, >> >> > > possibly renaming the class to something akin to KvClient to >> >> reflect >> >> > the >> >> > > fact that would no longer be ZooKeeper-specific (this also >> includes >> >> > the >> >> > > renaming of error codes and other similar nomenclature). 
The old >> >> > > version of >> >> > > the implementation would be put under an #ifdef guard, thus >> >> minimising >> >> > > the number — and maintenance impact — of #ifdefs. >> >> > > >> >> > > Naturally there are some advantages to taking the ifdef approach, >> >> namely >> >> > > that we can guarantee no difference in builds between before >> >> offscale's >> >> > > contribution and after, unless a compiler flag is provided. >> >> > > >> >> > > However to avoid polluting the code, we are recommending the second >> >> > > approach. >> >> > > Incompatibilities >> >> > > >> >> > > The following is the list of incompatibilities between the >> interfaces >> >> of >> >> > > ZooKeeper class and liboffkv. Some of those features should be >> >> > implemented >> >> > > in liboffkv; others should be emulated inside the >> ZooKeeper/KvClient >> >> > class; >> >> > > and for others still, the change of the interface of >> >> ZooKeeper/KvClient >> >> > is >> >> > > the preferred solution. >> >> > > >> >> > > - >> >> > > >> >> > > Asynchronous session establishment. We propose to emulate this >> >> through >> >> > > spawning a new thread in the constructor of ZooKeeper/KvClient. >> >> > > - >> >> > > >> >> > > Push-style watch notification API. We propose to emulate this >> >> through >> >> > > spawning a new thread for each watch; such a thread would then >> do >> >> the >> >> > > wait >> >> > > and then invoke watcher->process() under a mutex. The number of >> >> > threads >> >> > > should not be a concern here, as the only user that uses >> watches at >> >> > all >> >> > > ( >> >> > > GroupProcess) only registers at most one watch. >> >> > > - >> >> > > >> >> > > Multiple servers in URL string. We propose to implement this in >> >> > > liboffkv. >> >> > > - >> >> > > >> >> > > Authentication. We propose to implement this in liboffkv. >> >> > > - >> >> > > >> >> > > ACLs (access control lists). 
>> >> > >   The following ACLs are in fact used for everything:
>> >> > >
>> >> > >       _auth.isSome()
>> >> > >         ? zookeeper::EVERYONE_READ_CREATOR_ALL
>> >> > >         : ZOO_OPEN_ACL_UNSAFE
>> >> > >
>> >> > >   We thus propose to:
>> >> > >   1. implement rudimentary support for ACLs in liboffkv in the form of
>> >> > >      an optional parameter to create(),
>> >> > >
>> >> > >          bool protect_modify = false
>> >> > >
>> >> > >   2. change the interface of ZooKeeper/KvClient so that the
>> >> > >      protect_modify flag is used instead of ACLs.
>> >> > > - Configurable session timeout. We propose to implement this in
>> >> > >   liboffkv.
>> >> > > - Getting the actual session timeout, which might differ from the
>> >> > >   user-provided one as a result of timeout negotiation with the
>> >> > >   server. We propose to implement this in liboffkv.
>> >> > > - Getting the session ID. We propose to implement this in liboffkv,
>> >> > >   with the session ID being a std::string, and to modify the interface
>> >> > >   accordingly. It is possible to hash a string into a 64-bit number,
>> >> > >   but in the circumstances given, we think it is just not worth it.
>> >> > > - Getting the status of the connection to the server. We propose to
>> >> > >   implement this in liboffkv.
>> >> > > - Sequenced nodes. We propose to emulate this in the class.
>> >> > >   Here is the pseudo-code of our solution:
>> >> > >
>> >> > >       while (true) {
>> >> > >           [counter, version] = get("/counter")
>> >> > >           seqnum = counter + 1
>> >> > >           name = "label" + seqnum
>> >> > >           try {
>> >> > >               commit {
>> >> > >                   check "/counter" version,
>> >> > >                   set "/counter" seqnum,
>> >> > >                   create name value
>> >> > >               }
>> >> > >               break
>> >> > >           } catch (TxnAborted) {}
>> >> > >       }
>> >> > >
>> >> > > - “Recursive” creation of each parent in create(), akin to mkdir -p.
>> >> > >   This is already emulated in the class, as ZooKeeper does not
>> >> > >   natively support it; we propose to extend this emulation to work
>> >> > >   with liboffkv.
>> >> > > - The semantics of the “set” operation if the entry does not exist:
>> >> > >   ZooKeeper fails with ZNONODE in this case, while liboffkv creates a
>> >> > >   new node. We propose to emulate this in-class with a transaction.
>> >> > > - The semantics of the “erase” operation: ZooKeeper fails with
>> >> > >   ZNOTEMPTY if the node has children, while liboffkv removes the
>> >> > >   subtree recursively. As neither user ever attempts to remove a node
>> >> > >   with children, we propose to change the interface so that it
>> >> > >   declares (and actually implements) the liboffkv-compatible
>> >> > >   semantics.
>> >> > > - Return of ZooKeeper-specific Stat structures instead of just
>> >> > >   versions. As both users only use the version field of this
>> >> > >   structure, we propose to simply alter the interface so that only the
>> >> > >   version is returned.
>> >> > > - Explicit “session drop” operation that also immediately erases all
>> >> > >   the “leased” nodes. We propose to implement this in liboffkv.
>> >> > > - Check if the node being created has a leased parent.
Currently, >> >> liboffkv >> >> > > declares this to be unspecified behavior: it may either throw >> (if >> >> > > ZooKeeper >> >> > > is used as the back-end) or successfully create the node >> >> (otherwise). >> >> > As >> >> > > neither of users ever attempts to create such a node, we >> propose to >> >> > > leave >> >> > > this as is. >> >> > > >> >> > > Estimates >> >> > > We estimate that—including tests—this will be ready by the end of >> next >> >> > > month. >> >> > > ------------------------------ >> >> > > >> >> > > Open to alternative suggestions, otherwise we'll begin. >> >> > > Samuel Marks >> >> > > Charity <https://sydneyscientific.org> | consultancy < >> >> > https://offscale.io> >> >> > > | open-source <https://github.com/offscale> | LinkedIn >> >> > > <https://linkedin.com/in/samuelmarks> >> >> > > >> >> > > >> >> > > On Sat, May 2, 2020 at 4:04 AM Benjamin Mahler <bmah...@apache.org >> > >> >> > wrote: >> >> > > >> >> > > > So it sounds like: >> >> > > > >> >> > > > Zookeeper: Official C library has an async API. Are we gaining a >> lot >> >> > with >> >> > > > the third party C++ wrapper you pointed to? Maybe it "just >> works", >> >> but >> >> > it >> >> > > > looks very inactive and it's hard to tell how maintained it is. >> >> > > > >> >> > > > Consul: No official C or C++ library. Only some third party C++ >> ones >> >> > that >> >> > > > look pretty inactive. The ppconsul one you linked to does have an >> >> issue >> >> > > > about an async API, I commented on it: >> >> > > > https://github.com/oliora/ppconsul/issues/26. >> >> > > > >> >> > > > etcd: Can use gRPC c++ client async API. >> >> > > > >> >> > > > Since 2 of 3 provide an async API already, I would lean more >> >> towards an >> >> > > > async API so that we don't have to change anything with the mesos >> >> code >> >> > > when >> >> > > > the last one gets an async implementation. 
>> >> > > > However, we currently use the synchronous ZK API, so I realize it
>> >> > > > would be more work to first adjust the Mesos code to use the async
>> >> > > > ZooKeeper API. I agree that a synchronous interface is simpler to
>> >> > > > start with, since that will be an easier integration and we
>> >> > > > currently do not perform many concurrent operations (and probably
>> >> > > > won't anytime soon).
>> >> > > >
>> >> > > > On Sun, Apr 26, 2020 at 11:17 PM Samuel Marks <sam...@offscale.io>
>> >> > > > wrote:
>> >> > > >
>> >> > > > > In terms of asynchronous vs synchronous interfacing, when we
>> >> > > > > started liboffkv, it had an asynchronous interface. Then we
>> >> > > > > decided to drop it and implemented a synchronous one, due to the
>> >> > > > > dependent libraries which liboffkv uses under the hood.
>> >> > > > >
>> >> > > > > Our ZooKeeper implementation uses the zookeeper-cpp library
>> >> > > > > <https://github.com/tgockel/zookeeper-cpp>, a well-maintained C++
>> >> > > > > wrapper around the common ZooKeeper C bindings [which we
>> >> > > > > contributed to vcpkg
>> >> > > > > <https://github.com/microsoft/vcpkg/pull/7001>]. It has an
>> >> > > > > asynchronous interface based on std::future
>> >> > > > > <https://en.cppreference.com/w/cpp/thread/future>. Since
>> >> > > > > std::future does not provide chaining or any callbacks, a
>> >> > > > > ZooKeeper-specific result cannot be asynchronously mapped to a
>> >> > > > > liboffkv result. In early versions of liboffkv we used a thread
>> >> > > > > pool to do the mapping.
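The std::future limitation described in the paragraph above can be sketched as follows. This is a minimal illustration under stated assumptions, not code from liboffkv or zookeeper-cpp: BackendResult, KvResult, map_future and demo are hypothetical names, and a single std::async helper thread stands in for the thread pool the message mentions. Because std::future offers no continuation hook, the conversion has to block on get() in some other thread:

```cpp
#include <future>
#include <string>
#include <utility>

// Hypothetical stand-ins; neither type comes from zookeeper-cpp or liboffkv.
struct BackendResult { std::string data; long version; };
struct KvResult { std::string value; long version; };

// std::future has no .then(), so transforming its value means blocking on
// get() somewhere. Here that happens on a helper thread started by
// std::async; a thread pool would play the same role.
template <typename T, typename F>
auto map_future(std::future<T> fut, F fn)
    -> std::future<decltype(fn(std::declval<T>()))> {
  return std::async(std::launch::async,
                    [f = std::move(fut), fn = std::move(fn)]() mutable {
                      return fn(f.get());  // blocks the helper thread only
                    });
}

// Usage: a promise simulates the backend call completing later.
inline KvResult demo() {
  std::promise<BackendResult> backend;
  auto mapped = map_future(backend.get_future(), [](BackendResult r) {
    return KvResult{std::move(r.data), r.version};
  });
  backend.set_value(BackendResult{"hello", 7});
  return mapped.get();
}
```

Each pending operation ties up one thread until the backend answers, which is the cost that makes a synchronous interface (with threading left to the caller) a reasonable alternative.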
>> >> > > > >
>> >> > > > > The Consul implementation is based on the ppconsul
>> >> > > > > <https://github.com/oliora/ppconsul> library [which we
>> >> > > > > contributed to vcpkg
>> >> > > > > <https://github.com/microsoft/vcpkg/pulls?q=is%3Apr+author%3ASamuelMarks+ppconsul>],
>> >> > > > > which in turn utilizes libcurl <https://curl.haxx.se/libcurl>.
>> >> > > > > Unfortunately, ppconsul uses libcurl's easy interface, and
>> >> > > > > consequently it is synchronous by design. Again, in the early
>> >> > > > > version of the library we used a thread pool to overcome this
>> >> > > > > limitation.
>> >> > > > >
>> >> > > > > As for etcd, we autogenerated the gRPC C++ client
>> >> > > > > <https://github.com/offscale/etcd-client-cpp> [which we
>> >> > > > > contributed to vcpkg
>> >> > > > > <https://github.com/microsoft/vcpkg/pull/6999>]. gRPC provides an
>> >> > > > > asynchronous interface, so a "fair" async client can be
>> >> > > > > implemented on top of it.
>> >> > > > >
>> >> > > > > To sum up, with the chosen toolkit, two of the three
>> >> > > > > implementations require a thread pool. After careful
>> >> > > > > consideration, we preferred to give the user control over
>> >> > > > > threading and opted out of asynchrony.
>> >> > > > >
>> >> > > > > Nevertheless, there are some options. zookeeper-cpp allows
>> >> > > > > building with custom futures/promises, so we can create a custom
>> >> > > > > build to use in liboffkv/Mesos. Another variant is to use the
>> >> > > > > plain C ZK bindings
>> >> > > > > <https://gitbox.apache.org/repos/asf?p=zookeeper.git;a=tree;f=zookeeper-client/zookeeper-client-c;h=c72b57355c977366edfe11304067ff35f5cf215d;hb=HEAD>
>> >> > > > > instead of the C++ library.
>> >> > > > > As for the Consul client, the only meaningful option is to opt
>> >> > > > > out of using ppconsul and operate through libcurl's multi
>> >> > > > > interface.
>> >> > > > >
>> >> > > > > At this point, implementing asynchronous interfaces will require
>> >> > > > > rewriting liboffkv from the ground up. I can allocate the budget
>> >> > > > > for doing this, as I have done to date. However, it would be good
>> >> > > > > to have some more back-and-forth before reengaging.
>> >> > > > >
>> >> > > > > Design Doc:
>> >> > > > > https://docs.google.com/document/d/1NOfyt7NzpMxxatdFs3f9ixKUS81DHHDVEKBbtVfVi_0
>> >> > > > > [feel free to add it to
>> >> > > > > http://mesos.apache.org/documentation/latest/design-docs/]
>> >> > > > >
>> >> > > > > Thanks,
>> >> > > > >
>> >> > > > > *SAMUEL MARKS*
>> >> > > > > Sydney Medical School | Westmead Institute for Medical Research |
>> >> > > > > https://linkedin.com/in/samuelmarks
>> >> > > > > Director | Sydney Scientific Foundation Ltd
>> >> > > > > <https://sydneyscientific.org>
>> >> > > > > | Offscale.io of Sydney Scientific Pty Ltd <https://offscale.io>
>> >> > > > >
>> >> > > > > PS: Damien - not against contributing to FoundationDB, but
>> >> > > > > priorities are Mesos and the Mesos ecosystem, followed by
>> >> > > > > Kubernetes and its ecosystem.
>> >> > > > >
>> >> > > > > On Tue, Apr 21, 2020 at 3:19 AM Benjamin Mahler
>> >> > > > > <bmah...@apache.org> wrote:
>> >> > > > >
>> >> > > > > > Samuel: One more thing I forgot to mention, we would prefer to
>> >> > > > > > use an asynchronous client interface rather than a synchronous
>> >> > > > > > one. Is that something you have considered?
>> >> > > > > > >> >> > > > > > On Fri, Apr 17, 2020 at 6:11 PM Vinod Kone < >> >> vinodk...@apache.org> >> >> > > > wrote: >> >> > > > > > >> >> > > > > > > Hi Samuel, >> >> > > > > > > >> >> > > > > > > Thanks for showing interest in contributing to the project. >> >> > Having >> >> > > > > > > optionality between ZooKeeper and Etcd would be great for >> the >> >> > > project >> >> > > > > and >> >> > > > > > > something that has been brought up a few times before, as >> you >> >> > > noted. >> >> > > > > > > >> >> > > > > > > I echo everything that BenM said. As part of the design it >> >> would >> >> > be >> >> > > > > great >> >> > > > > > > to see the migration path for users currently using Mesos >> with >> >> > > > > ZooKeeper >> >> > > > > > to >> >> > > > > > > Etcd. Ideally, the migration can happen without much user >> >> > > > intervention. >> >> > > > > > > >> >> > > > > > > Additionally, from our past experience, efforts like these >> are >> >> > more >> >> > > > > > > successful if the people writing the code have experience >> with >> >> > how >> >> > > > > things >> >> > > > > > > work in Mesos code base. So I would recommend starting >> small, >> >> > maybe >> >> > > > > have >> >> > > > > > a >> >> > > > > > > few engineers work on a couple "newbie" tickets and do some >> >> small >> >> > > > > > projects >> >> > > > > > > and have those committed to the project. That gives the >> >> > committers >> >> > > > some >> >> > > > > > > level of confidence about quality of the code and be more >> >> open to >> >> > > > > bigger >> >> > > > > > > changes like etcd integration. It would also help >> contributors >> >> > get >> >> > > a >> >> > > > > > better >> >> > > > > > > feeling for the lay of the land and see if they are truly >> >> > > interested >> >> > > > in >> >> > > > > > > maintaining this piece of integration for the long haul. 
>> This >> >> is >> >> > a >> >> > > > bit >> >> > > > > > of a >> >> > > > > > > longer path but I think it would be more a fruitful one. >> >> > > > > > > >> >> > > > > > > Looking forward to seeing new contributions to Mesos >> including >> >> > the >> >> > > > > above >> >> > > > > > > design! >> >> > > > > > > >> >> > > > > > > Thanks, >> >> > > > > > > >> >> > > > > > > On Fri, Apr 17, 2020 at 4:52 PM Samuel Marks < >> >> sam...@offscale.io >> >> > > >> >> > > > > wrote: >> >> > > > > > > >> >> > > > > > > > Happy to build a design doc, >> >> > > > > > > > >> >> > > > > > > > To answer your question on what Offscale.io is, it's my >> >> > software >> >> > > > and >> >> > > > > > > > biomedical engineering consultancy. Currently it's still >> >> rather >> >> > > > > small, >> >> > > > > > > with >> >> > > > > > > > only 8 engineers, but I'm expecting & preparing to grow >> >> > rapidly. >> >> > > > > > > > >> >> > > > > > > > My philosophy is always open-source and patent-free, so >> >> that's >> >> > > what >> >> > > > > my >> >> > > > > > > > consultancy—and for that matter, the charitable research >> >> that I >> >> > > > fund >> >> > > > > > > > through it <https://sydneyscientific.org>—follows. >> >> > > > > > > > >> >> > > > > > > > The goal of everything we create is: interoperable >> >> > > (cross-platform, >> >> > > > > > > > cross-technology, cross-language, multi-cloud); >> open-source >> >> > > > > (Apache-2.0 >> >> > > > > > > OR >> >> > > > > > > > MIT); with a view towards scaling: >> >> > > > > > > > >> >> > > > > > > > - teams; >> >> > > > > > > > - software-development <https://compilers.com.au>; >> >> > > > > > > > - infrastructure [this proposed Mesos contribution + >> our >> >> > > DevOps >> >> > > > > > > > tooling]; >> >> > > > > > > > - [in the charity's case] facilitating very >> large-scale >> >> > > medical >> >> > > > > > > > diagnostic screening. 
>> >> > > > > > > > >> >> > > > > > > > Technologies like Mesos we expect to both optimise >> resource >> >> > > > > > > > allocation—reducing costs and increasing data >> locality—and >> >> > award >> >> > > us >> >> > > > > > > > 'bragging rights' with which we can gain clients that are >> >> > already >> >> > > > > using >> >> > > > > > > > Mesos (which, from my experience, is always big >> corporates… >> >> > > though >> >> > > > > > > > hopefully contributions like these will make it >> attractive >> >> to >> >> > > small >> >> > > > > > > > companies also). >> >> > > > > > > > >> >> > > > > > > > So no, we're not going anywhere, and are planning to >> >> maintain >> >> > > this >> >> > > > > > > library >> >> > > > > > > > into the future >> >> > > > > > > > >> >> > > > > > > > PS: Once accepted by Mesos, we'll be making similar >> >> > contributions >> >> > > > to >> >> > > > > > > other >> >> > > > > > > > Mesos ecosystem projects like Chronos < >> >> > > > > https://mesos.github.io/chronos >> >> > > > > > >, >> >> > > > > > > > Marathon <https://github.com/mesosphere/marathon>, and >> >> Aurora >> >> > > > > > > > <https://github.com/aurora-scheduler/aurora> as well as >> to >> >> > > > unrelated >> >> > > > > > > > projects (e.g., removing etcd as a hard-dependency from >> >> > > Kubernetes >> >> > > > > > > > <https://kubernetes.io>… enabling them to choose between >> >> > > > ZooKeeper, >> >> > > > > > > etcd, >> >> > > > > > > > and Consul). 
>> >> > > > > > > > >> >> > > > > > > > Thanks for your continual feedback, >> >> > > > > > > > >> >> > > > > > > > *SAMUEL MARKS* >> >> > > > > > > > Sydney Medical School | Westmead Institute for Medical >> >> > Research | >> >> > > > > > > > https://linkedin.com/in/samuelmarks >> >> > > > > > > > Director | Sydney Scientific Foundation Ltd < >> >> > > > > > > https://sydneyscientific.org> >> >> > > > > > > > | Offscale.io of Sydney Scientific Pty Ltd < >> >> > https://offscale.io> >> >> > > > > > > > >> >> > > > > > > > >> >> > > > > > > > On Sat, Apr 18, 2020 at 6:58 AM Benjamin Mahler < >> >> > > > bmah...@apache.org> >> >> > > > > > > > wrote: >> >> > > > > > > > >> >> > > > > > > > > Oh ok, could you tell us a little more about how you're >> >> using >> >> > > > > Mesos? >> >> > > > > > > And >> >> > > > > > > > > what offscale.io is? >> >> > > > > > > > > >> >> > > > > > > > > Strictly speaking, we don't really need packaging and >> >> > releases >> >> > > as >> >> > > > > we >> >> > > > > > > can >> >> > > > > > > > > bundle the dependency in our repo and that's what we do >> >> for >> >> > > many >> >> > > > of >> >> > > > > > our >> >> > > > > > > > > dependencies. >> >> > > > > > > > > To me, the most important thing is the commitment to >> >> maintain >> >> > > the >> >> > > > > > > library >> >> > > > > > > > > and address issues that come up. >> >> > > > > > > > > I also would lean more towards a run-time flag rather >> >> than a >> >> > > > build >> >> > > > > > > level >> >> > > > > > > > > flag, if possible. >> >> > > > > > > > > >> >> > > > > > > > > I think the best place to start would be to put >> together a >> >> > > design >> >> > > > > > doc. >> >> > > > > > > > The >> >> > > > > > > > > act of writing that will force the author to think >> through >> >> > the >> >> > > > > > details >> >> > > > > > > > (and >> >> > > > > > > > > there are a lot of them!), and we'll then get a chance >> to >> >> > give >> >> > > > > > > feedback. 
>> >> > > > > > > > > You can look through the mailing list for past >> examples of >> >> > > design >> >> > > > > > docs >> >> > > > > > > > (in >> >> > > > > > > > > terms of which sections to include, etc). >> >> > > > > > > > > >> >> > > > > > > > > How does that sound? >> >> > > > > > > > > >> >> > > > > > > > > On Tue, Apr 14, 2020 at 8:44 PM Samuel Marks < >> >> > > sam...@offscale.io >> >> > > > > >> >> > > > > > > wrote: >> >> > > > > > > > > >> >> > > > > > > > > > Dear Benjamin Mahler [and *Developers mailing-list >> for >> >> > Apache >> >> > > > > > > Mesos*], >> >> > > > > > > > > > >> >> > > > > > > > > > Thanks for responding so quickly. >> >> > > > > > > > > > >> >> > > > > > > > > > Actually this entire project I invested—time & money, >> >> > > > including a >> >> > > > > > > > > > development team—explicitly in order to contribute >> this >> >> to >> >> > > > Apache >> >> > > > > > > > Mesos. >> >> > > > > > > > > So >> >> > > > > > > > > > no releases yet, because I wanted to ensure it was >> up to >> >> > the >> >> > > > > > > > > specification >> >> > > > > > > > > > requirements referenced in dev@mesos.apache.org >> before >> >> > > > > proceeding >> >> > > > > > > with >> >> > > > > > > > > > packaging and releases. >> >> > > > > > > > > > >> >> > > > > > > > > > Tests have been setup in Travis CI for Linux (Ubuntu >> >> 18.04) >> >> > > and >> >> > > > > > > macOS, >> >> > > > > > > > > > happy to set them up elsewhere also. There are also >> some >> >> > > > Windows >> >> > > > > > > builds >> >> > > > > > > > > > that need a bit of tweaking, then they will be pushed >> >> into >> >> > CI >> >> > > > > also. >> >> > > > > > > We >> >> > > > > > > > > are >> >> > > > > > > > > > just starting to do some work on reducing build & >> test >> >> > times. 
>> >> > > > > > > > > >
>> >> > > > > > > > > > Would be great to build a checklist of things you want to see before we
>> >> > > > > > > > > > send the PR, e.g.,
>> >> > > > > > > > > >
>> >> > > > > > > > > >    - ☐ hosted docs;
>> >> > > > > > > > > >    - ☐ CI/CD (including packaging) for Windows, Linux, and macOS;
>> >> > > > > > > > > >    - ☐ releases on GitHub;
>> >> > > > > > > > > >    - ☐ consistent session and auth interface;
>> >> > > > > > > > > >    - ☐ different tests [can you expand here?]
>> >> > > > > > > > > >
>> >> > > > > > > > > > This is just an example checklist; it would be best if you and others can
>> >> > > > > > > > > > flesh it out, so when we do send the PR it's in an immediately mergeable
>> >> > > > > > > > > > state.
>> >> > > > > > > > > >
>> >> > > > > > > > > > BTW: Originally had a debate with my team about whether to send a PR out of
>> >> > > > > > > > > > the blue, like Microsoft famously did for Node.js
>> >> > > > > > > > > > <https://github.com/nodejs/node/pull/4765>, or start an *offer thread* on
>> >> > > > > > > > > > the developers mailing-list.
>> >> > > > > > > > > >
>> >> > > > > > > > > > Looking forward to contributing 🦀
>> >> > > > > > > > > >
>> >> > > > > > > > > > *SAMUEL MARKS*
>> >> > > > > > > > > > Sydney Medical School | Westmead Institute for Medical Research |
>> >> > > > > > > > > > https://linkedin.com/in/samuelmarks
>> >> > > > > > > > > > Director | Sydney Scientific Foundation Ltd <https://sydneyscientific.org>
>> >> > > > > > > > > > | Offscale.io of Sydney Scientific Pty Ltd <https://offscale.io>
>> >> > > > > > > > > >
>> >> > > > > > > > > >
>> >> > > > > > > > > > On Wed, Apr 15, 2020 at 2:38 AM Benjamin Mahler <bmah...@apache.org> wrote:
>> >> > > > > > > > > >
>> >> > > > > > > > > > > Thanks for reaching out. A well-maintained and well-written wrapper
>> >> > > > > > > > > > > interface to the three backends would certainly make this easier for us
>> >> > > > > > > > > > > vs implementing such an interface ourselves.
>> >> > > > > > > > > > >
>> >> > > > > > > > > > > Is this the client interface?
>> >> > > > > > > > > > >
>> >> > > > > > > > > > > https://github.com/offscale/liboffkv/blob/d31181a1e74c5faa0b7f5d7001879640b4d9f111/liboffkv/client.hpp#L115-L142
>> >> > > > > > > > > > >
>> >> > > > > > > > > > > At a quick glance, three ZK things that we rely on but that seem to be
>> >> > > > > > > > > > > absent from the common interface are the ZK session, authentication, and
>> >> > > > > > > > > > > authorization. How will these be provided via the common interface?
>> >> > > > > > > > > > >
>> >> > > > > > > > > > > Here is our ZK interface wrapper if you want to see what kinds of things
>> >> > > > > > > > > > > we use:
>> >> > > > > > > > > > >
>> >> > > > > > > > > > > https://github.com/apache/mesos/blob/1.9.0/include/mesos/zookeeper/zookeeper.hpp#L72-L339
>> >> > > > > > > > > > >
>> >> > > > > > > > > > > The project has 0 releases and 0 issues. What kind of usage has it seen?
>> >> > > > > > > > > > > Has there been any testing yet? Would Offscale.io be doing some of the
>> >> > > > > > > > > > > testing?
>> >> > > > > > > > > > >
>> >> > > > > > > > > > > On Mon, Apr 13, 2020 at 7:54 PM Samuel Marks <sam...@offscale.io> wrote:
>> >> > > > > > > > > > >
>> >> > > > > > > > > > > > Apache ZooKeeper <https://zookeeper.apache.org> is a large dependency.
>> >> > > > > > > > > > > > Enabling developers and operations to use etcd <https://etcd.io>, Consul
>> >> > > > > > > > > > > > <https://consul.io>, or ZooKeeper should reduce resource utilisation and
>> >> > > > > > > > > > > > enable new use cases.
>> >> > > > > > > > > > > >
>> >> > > > > > > > > > > > There have already been a number of suggestions to get rid of the hard
>> >> > > > > > > > > > > > dependency on ZooKeeper.
>> >> > > > > > > > > > > > For example, see: MESOS-1806
>> >> > > > > > > > > > > > <https://issues.apache.org/jira/browse/MESOS-1806>, MESOS-3574
>> >> > > > > > > > > > > > <https://issues.apache.org/jira/browse/MESOS-3574>, MESOS-3797
>> >> > > > > > > > > > > > <https://issues.apache.org/jira/browse/MESOS-3797>, MESOS-5828
>> >> > > > > > > > > > > > <https://issues.apache.org/jira/browse/MESOS-5828>, MESOS-5829
>> >> > > > > > > > > > > > <https://issues.apache.org/jira/browse/MESOS-5829>. However, there are
>> >> > > > > > > > > > > > difficulties in supporting a few implementations for different services
>> >> > > > > > > > > > > > with quite distinct data models.
>> >> > > > > > > > > > > >
>> >> > > > > > > > > > > > A few months ago offscale.io invested in a solution to this problem:
>> >> > > > > > > > > > > > liboffkv <https://github.com/offscale/liboffkv>, a *C++* library which
>> >> > > > > > > > > > > > provides a *uniform interface over ZooKeeper, Consul KV and etcd*. It
>> >> > > > > > > > > > > > abstracts common features of these services into its own data model,
>> >> > > > > > > > > > > > which is very similar to ZooKeeper’s. Careful attention was paid to
>> >> > > > > > > > > > > > keeping methods both efficient and consistent.
>> >> > > > > > > > > > > > It is cross-platform, open-source (*Apache-2.0 OR MIT*), and is written
>> >> > > > > > > > > > > > in C++, with vcpkg packaging, *C library output
>> >> > > > > > > > > > > > <https://github.com/offscale/liboffkv/blob/d3d549e/CMakeLists.txt#L29-L35>*,
>> >> > > > > > > > > > > > and additional interfaces in *Go <https://github.com/offscale?q=goffkv>*,
>> >> > > > > > > > > > > > *Java <https://github.com/offscale/liboffkv-java>*, and *Rust
>> >> > > > > > > > > > > > <https://github.com/offscale/rsoffkv>*.
>> >> > > > > > > > > > > >
>> >> > > > > > > > > > > > Offscale.io proposes to replace all ZooKeeper usages in Mesos with usages
>> >> > > > > > > > > > > > of liboffkv. Since all interactions which require ZooKeeper in Mesos are
>> >> > > > > > > > > > > > conducted through the class Group (and GroupProcess) with a clear
>> >> > > > > > > > > > > > interface, the obvious way to introduce changes is to provide another
>> >> > > > > > > > > > > > implementation of the class which uses liboffkv instead of ZooKeeper. In
>> >> > > > > > > > > > > > this case the original implementation may be left unchanged in the
>> >> > > > > > > > > > > > codebase, and build flags to select between the ZK-only and liboffkv
>> >> > > > > > > > > > > > variants may be introduced.
>> >> > > > > > > > > > > > Once the community is confident, you can decide to remove the ZK-only
>> >> > > > > > > > > > > > option, and instead only support liboffkv [which internally has build
>> >> > > > > > > > > > > > flags for each service].
>> >> > > > > > > > > > > >
>> >> > > > > > > > > > > > Removing the hard dependency on ZooKeeper will simplify local deployment
>> >> > > > > > > > > > > > for testing purposes as well as enable using Mesos in clusters without
>> >> > > > > > > > > > > > ZooKeeper, e.g. where etcd or Consul is used for coordination. We expect
>> >> > > > > > > > > > > > this to greatly reduce resource usage (network, CPU, disk, memory) in a
>> >> > > > > > > > > > > > datacenter environment.
>> >> > > > > > > > > > > >
>> >> > > > > > > > > > > > If the community accepts the initiative, we will integrate liboffkv into
>> >> > > > > > > > > > > > Mesos. We are also ready to develop the library and consider any
>> >> > > > > > > > > > > > suggested improvements.
>> >> > > > > > > > > > > > *SAMUEL MARKS*
>> >> > > > > > > > > > > > Sydney Medical School | Westmead Institute for Medical Research |
>> >> > > > > > > > > > > > https://linkedin.com/in/samuelmarks
>> >> > > > > > > > > > > > Director | Sydney Scientific Foundation Ltd <https://sydneyscientific.org>
>> >> > > > > > > > > > > > | Offscale.io of Sydney Scientific Pty Ltd <https://offscale.io>
>> >> > > > > > > > > > > > *SYDNEY SCIENTIFIC FOUNDATION and THE UNIVERSITY OF SYDNEY*
>> >> > > > > > > > > > > >
>> >> > > > > > > > > > > > PS: We will be offering similar contributions to Chronos
>> >> > > > > > > > > > > > <https://mesos.github.io/chronos>, Marathon
>> >> > > > > > > > > > > > <https://github.com/mesosphere/marathon>, Aurora
>> >> > > > > > > > > > > > <https://github.com/aurora-scheduler/aurora>, and related projects.