> I do trust when Jan says "it looks like we can pull this off reasonably easy enough by taking on a few growing pains"

I’m honoured :D — so far my commitment goes as far as: it is totally worth discussing the proposal to a point where all open questions have an answer. Whether those answers are ones we as a project like, or think are feasible to do given our constraints, is a separate discussion. I’m cautiously optimistic, but that’s about it.

Best
Jan
—

> On 24. Jan 2019, at 12:33, Michael Fair <mich...@daclubhouse.net> wrote:
>
> Wow, I think I get it, mostly; I still haven't read the FDB docs yet, but I grok the replies to Will's and my email, and it sounds like FoundationDB has done some really good underlying work where CouchDB could, in a sense, become an advocate project for FDB's utility.
>
> TLDR; +1 from me. :-)
> Is this actually worthy of a 3.0 moniker; it seems like it could be (breaking changes and dropping 1.X compatibility)?
>
> Some general, higher level, thoughts (that probably mimic what you guys have already been thinking):
> 1) I believe CouchDB, the software project, needs a growth path for adding features somewhat organically. I haven't felt this has been the case historically when it comes to the wire and data storage aspects of the project. There's been a few ideas I've wanted to experiment with in the replication protocol.
>
> For example:
> * auto-resolving/merging conflict branches when we can tell/detect that the heads of two conflicting branches contain an identical document;
> * adding some version of a binary JSON encoding to reduce network utilization;
> * creating some kind of a JSON diff ability to transmit changes between document revisions (that is also binary-able);
> * experimenting with encrypted and private data stores to create a decentralized vault where secret data could be shared with other parties via CouchDB without revealing the secret data while in transit;
> * "object type" based HTML template forms that could be directly "filled out" by documents / "modified" by browsers
>
> Overall it sounds to me like the FoundationDB changes/advantages Jan and Robert described so far collectively point in a direction that would, generally speaking, make my life easier at approaching at least some of these ideas.
>
> 2) I like the idea of "removing code" from Couch where it makes sense like this. The Apache CouchDB project to me has always, in a sense, been more representative of the replication wire protocol and replication semantics than the erlang software project. I really enjoy the idea that other projects can incorporate their own "Couch Compatible Replication Layer" and use the CouchDB software project as their de facto test software. "Outsourcing" the KV Store work to a project like FDB where there are other people who enjoy focusing on that specific aspect of the problem leaves the Couch folks free to focus on higher level features; which I believe is a wonderful thing and exactly the right direction the whole project should generally be going. I think of Couch as a very "end user facing" project, and as such, I think it benefits more than it risks (and I understand the risks creating outside party dependencies), by building this kind of outsourced dependency to a project that demonstrates; (1) technical competence, (2) a fair bit of maturity, (3) decent docs to help ourselves navigate "their world", and (4) a willingness to be responsive to CouchDB project requests/feedback/contributions. It sounds like the FDB project scores well on all four of those points. Having community members with feet/experience in both projects is a huge help/bonus.
>
> 3) This also seems to enable some ideas that I've long wanted to try but never thought I could do because of CouchDB's document storage design; index/explore the db using the Neo4j graph database and syncing docs using the IPLD semantics within the IPFS project (very personal to me I know, but still, it's nice to see the ideas look more promising to me). I'm personally a really big fan of the "multiple master copies at distributed locations" aspect to CouchDB; more so than the "single distributed/sharded/parallel multiserver database" aspect. I understand there are more immediate, lucrative applications to the local multiserver, larger database aspect; so I'm excited to see that work done too. It's simply not the aspect of the project that really catches my interests.
>
> Thanks very much Jan/Robert for hearing what I had to say and giving great and meaningful replies!
>
> Jan, I really appreciate you commenting that you understood my concerns about taking seriously the need to really incorporate the technology into the DNA of the community and that's what you expect to see successfully happen; and Robert, likewise, for adding that the existing FDB community could very well be interested in Couch as a "front end test project" to give practical application and meaning to some of their work.
>
> While I'm not going to go so far as to say I can personally vouch for the proposal's success; I do trust when Jan says "it looks like we can pull this off reasonably easy enough by taking on a few growing pains" that it's true; and coupled with the clear amount of behind-the-scenes forethought that went into it; I really like it.
>
> Thanks!
> Mike
>
> On Thu, Jan 24, 2019 at 2:20 AM Robert Samuel Newson <rnew...@apache.org> wrote:
>
>> Hi,
>>
>> Thank you for the in-depth response, that’s exactly what the PMC is looking for.
>>
>> You are comprehending the nature and magnitude of the change correctly here, where you suggest we could “just” write a new CouchDB Layer on top of FoundationDB and achieve a similar effect. However, the nature of software and software development really speaks against doing it that way, in my opinion. In 2.0 we introduced an abstraction between the HTTP processing layer and the lower plumbing of b-trees and file I/O with the “fabric” application. This was essential to introduce clustering but it was a significant architectural improvement in its own right. By reimplementing below that line we can be more confident that we have preserved all the necessary parts of the CouchDB API and experience. Additionally, separate applications like the replicator and job scheduler can remain as they are. A lot of the existing code will remain as-is, or have minor changes or cleanup (the “local” mode for replication, unreachable since 2.0, can finally be excised, for example).
>>
>> To your other point, I remember the difficulty I first had when looking at CouchDB. It’s in Erlang, which I’d not used before, and there is a lot of subtle and tricky code at the lower tiers (see couch_key_tree.erl or couch_btree.erl). By using FoundationDB for that instead I hope we _increase_ the comprehensibility of CouchDB, as what remains will be its essential nature and not the important but ancillary plumbing below. The increased public development activity on CouchDB, the size of the ambition here, and some cross-pollinating interest from those who know or are interested in FoundationDB should, I hope, bring more active developers of all levels of experience and interest to our project.
>>
>> B.
>>
>>> On 23 Jan 2019, at 23:27, Michael Fair <mich...@daclubhouse.net> wrote:
>>>
>>> As someone who isn't as directly involved in the release-to-release development, would a move like this make it easier or harder for new/casual community members to get up to speed/understand what's going on?
>>>
>>> As projects grow and mature, the introductory learning curve tends to get steeper, making it harder for people who didn't "grow up with the project" to grok the project as a whole thing. Not complaining, just identifying.
>>>
>>> Is this proposal suggesting something more akin to a storage layer separation (making it somewhat easier to identify the separate component layers and experiment with different backends) or more like a storage technology change (where any experimenter would first have to understand how FDB semantics are different from File I/O semantics)?
>>>
>>> All in all it sounds like a promising proposal.
>>> My first thought was something like "Hmm, is this different than simply adding a 'Couch Replication Protocol' module to FoundationDB? Probably, or they wouldn't be proposing it this way"
>>>
>>> Followed quickly by, "Okay, looks like I'll likely need to start learning FoundationDB now too if I really want to understand CouchDB's capabilities. I've not really heard much/looked at it before..."
>>>
>>> I don't think a new learning curve should dissuade people from adopting it, but as I haven't looked at the educational materials available, I can't speak to the level of "ownership" the general community would be able to keep.
>>>
>>> My experience is, generally speaking, people simply avoid aspects of a project they don't feel competent in. Leaving that work to those with stronger opinions/convictions/interest. And that the easier it is to independently "get up to speed" on that aspect of the project (reading a blog(s)/watching a video(s)/tracing code) the more likely an interested party is to contribute there.
>>>
>>> It'd be great to find out that a consequence of this move makes it easier for interested people, still unfamiliar with CouchDB's internals, to get more involved because there were some great and easily accessible teaching materials...
>>>
>>> This concept obviously isn't unique to this FDB proposal; nor is it advocating for or against; I guess it's just expressing a hope that the impact is made to also help those who would like to get started contributing to CouchDB in meaningful ways instead of them getting a new and more complicated third party tech dependency to go learn as well.
>>>
>>> Mike
>>>
>>> PS While I assume there's likely very clear answers, does this differ significantly from the idea of giving FoundationDB a Couch compatible web API interface? Like instead of making FoundationDB "the storage backend" for Couch, why not add a Couch compatible web interface front end to FoundationDB? Is there a lot of useful Couch code in between those two things?
>>>
>>> On Wed, Jan 23, 2019 at 12:20 PM Joan Touzet <woh...@apache.org> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> As Jan mentions, the PMC has had a couple of weeks to prepare on this.
>>>>
>>>> As a non-IBMer (though an ex-IBM-er and ex-Cloudant-er), I've had my Apache PMC hat on the entire time, considering all of the things that Jan mentions and more. My primary concern has been ensuring that, should this go forward, what happens occurs in the project's best interest.
>>>>
>>>> During the analysis process I came up with 8 serious topics that we need to sort out:
>>>>
>>>> * RFC process - how major changes are proposed/designed/accepted, see new GitHub template for a preview on this
>>>>
>>>> * Bylaws review - namely, should we insist on +1s from outside your company for big things? Plus RFC/deprecations.
>>>>
>>>> * Roadmap - we have a roadmap from ~24 months ago that represented our goals for CouchDB 2.x and 3.x. What happens to it? https://s.apache.org/couch2xroadmap
>>>>
>>>> * Onboarding - better mentoring in The Apache Way and The CouchDB Way for new members (from IBM and elsewhere)
>>>>
>>>> * (Re-)Branding - how do we differentiate between "CouchDB Classic" and "New CouchDB" in a succinct and clear way?
>>>>
>>>> * FoundationDB - all the non-technical aspects. Review of _their_ project governance, cross-project pollination, us learning the core and pros/cons, identifying who will actually learn that code base, and operational considerations. Also: keeping this knowledge public and not just "inside IBM's dev/ops teams".
>>>>
>>>> * Proj. Mgmt. - Obviously IBM will have a PM involved. We should too. Reviewing process/procedure and ensuring a smooth collaboration is critical. IBM doesn't get to just throw code over the wall at us. Similarly, should we choose to work on proposed features, or stuff from the roadmap, we need to be able to cooperate. No cookie licking allowed![*]
>>>>
>>>> * Tech deep dives - this will actually be many, many threads I expect, including everyone's favourite on release mgmt :P
>>>>
>>>> New threads will be started on these topics by PMC members over the coming days (but not all at once, so everyone has time to reflect and respond.)
>>>>
>>>> My initial take on the proposal: it's GOOD that we're finally addressing some of the problems that 2.x brought to the table, and if this is the best way to do so, then so be it. I want to know more about the technical details, and I want to see a more formal RFC before voting on it, though.
>>>>
>>>> -Joan 'And now for something completely different...' Touzet
>>>>
>>>> [*] http://communitymgt.wikia.com/wiki/Cookie_Licking
>>>>
>>>> ----- Original Message -----
>>>>> From: "Jan Lehnardt" <j...@apache.org>
>>>>> To: "CouchDB Developers" <dev@couchdb.apache.org>
>>>>> Sent: Wednesday, January 23, 2019 8:33:30 AM
>>>>> Subject: Re: [DISCUSS] Rebase CouchDB on top of FoundationDB
>>>>>
>>>>> Hi Bob,
>>>>>
>>>>> this is all very exciting!
>>>>>
>>>>> First up, full disclosure, the CouchDB PMC has had about two weeks to think about this already, so if any of the following doesn’t sound like a knee-jerk reaction, that’s why.
>>>>>
>>>>> I’m personally tentatively optimistic about this proposal and I’m willing to work through all open questions from governance, contribution management to the technical bits to see if we as the CouchDB project arrive at a point where we are comfortable going down this path.
>>>>>
>>>>> The PMC has already identified a set of discussion areas for this dev@ mailing list to go through before any definite decision can be made. Separate emails for those discussions are going to be posted on this list shortly, so I won’t go into further detail here.
>>>>>
>>>>> If anyone sees a need for discussion beyond the threads that will appear here, please speak up at your earliest convenience. This proposal would mean a big step for our project, and we must make sure to hear all voices.
>>>>>
>>>>> Once we’ve gone through all this, the resulting answers to all the open questions coming up will end up in a consensus finding process on this mailing list, which will signify the final project decision.
>>>>>
>>>>> * * *
>>>>>
>>>>> That said, I’d like to highlight one of these topics: IBM/Cloudant’s contributions going forward.
>>>>>
>>>>> Looking at how 2.0 came to be, the contributions were mostly taken on good faith (and legal review), and from the trust Cloudant built up operating a large number of large instances of clusters of what would eventually become CouchDB 2.0. It has clearly paid off for CouchDB and our current level of success wouldn’t be possible without IBM/Cloudant.
>>>>>
>>>>> However, some of the ways we work with the IBM team leave things to be desired. Specifically, the Apache CouchDB community is frequently not involved in design discussions around new features. Those happen inside IBM and we “only” get a PR that then goes through the regular review process. Again, this has served us well, but we can do even better, so I’d like to take the opportunity of this larger proposal to suggest we actually do better. As promised, a more detailed thread about this is going to come up, and it’ll be the right place to go through the minutiae of this.
>>>>>
>>>>> With this structural change, I believe we are in a great position to work through the details of this proposal and the subsequent design and engineering steps.
>>>>>
>>>>> * * *
>>>>>
>>>>> Finally, I want to reiterate Bob’s point: while this proposal is largely driven by IBM, IBM has no power to unilaterally force the CouchDB project to accept this proposal and they have already signalled and worked towards making this a mutually beneficial endeavour. The CouchDB project has different objectives from IBM and it is up to us to come up with a proposal that satisfies all of our objectives as well as IBM’s, should this motion pass.
>>>>>
>>>>> Best
>>>>> Jan
>>>>> —
>>>>>
>>>>>> On 23. Jan 2019, at 11:00, Robert Samuel Newson <rnew...@apache.org> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> CouchDB 2.0 introduced clustering; the ability to scale a single database across multiple nodes, increasing both the maximum size of a database and adding native fault-tolerance. This welcome and considerable step forward was not without its trade-offs. In the years since 2.0 was released, users frequently encounter the following issues as a direct consequence of the 2.0 clustering approach:
>>>>>>
>>>>>> 1. Conflict revisions can be created on normal concurrent updates issued to a single database, since each replica of a database shard independently chooses whether to accept a given update, and all replicas will eventually propagate updates that any one of them has chosen to accept.
>>>>>> 2. Secondary indexes ("views") do not scale the same way as document lookups, as they are sharded by doc id, not emitted view key (thus forcing a consultation of all shard ranges for each query).
>>>>>> 3. The changes feed is no longer totally ordered and, worse, could replay earlier changes in the event of a node failure (even a temporary one).
>>>>>>
>>>>>> The idea is to use FoundationDB as the new CouchDB foundational layer, letting it take care of data storage and placement. An introduction to FoundationDB would take up too much space here so I will summarise it as a highly scalable ordered key-value store with transactional semantics that provides strong consistency and scales from a single node to many. It is licensed under the ASLv2 but is not an Apache project.
>>>>>>
>>>>>> By using FoundationDB we can solve all three of the problems listed above and deliver semantics much closer to CouchDB 1.x's behaviour while improving upon the scalability advantages that 2.0 introduced. The essential character of CouchDB would be preserved (MVCC for documents, replication between CouchDB databases) but the underlying plumbing would change significantly. In addition, this new foundation will allow us to add long wished-for features more easily. For example, multi-document transactions become possible, as does efficient field-level reading and writing. A further thought is the ability to update views transactionally with the database update.
>>>>>>
>>>>>> For those familiar with the CouchDB 2.0 architecture, the proposal is, in effect, to change all the functions in fabric.erl so that they work against a (possibly remote) FoundationDB cluster instead of the current implementation of calling into the original CouchDB 1.x code (couch_btree, couch_file, etc).
>>>>>>
>>>>>> This is a large change and, for full disclosure, the IBM Cloudant team are proposing it. We have done our due diligence in investigating FoundationDB as well as detailed investigation into how CouchDB semantics would be built on top of FoundationDB. Any and all decisions on that must take place here on the CouchDB developer mailing list, of course, but we are confident that this is feasible.
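
Problem 1 in Robert's list, and the claim that a single transactional store removes it, can be illustrated with a toy model. What follows is a hypothetical sketch, not CouchDB or FoundationDB code: the `Replica` class and its `update` and `merge_from` methods are invented for illustration, a revision tree is reduced to a dict mapping each revision to its parent, and the "transactional store" is modelled simply by applying both edits to one tree in sequence, standing in for FoundationDB's serializable transactions.

```python
"""Toy model of problem 1 in the proposal: per-replica acceptance vs. one transactional store.

Hypothetical simplification for illustration only; this is not CouchDB or FoundationDB code.
A revision tree is reduced to a dict mapping each revision id to its parent (None for the root).
"""
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class Replica:
    tree: Dict[str, Optional[str]] = field(default_factory=dict)

    def leaves(self) -> Set[str]:
        """Revisions that no other revision points to as its parent."""
        parents = {p for p in self.tree.values() if p is not None}
        return set(self.tree) - parents

    def update(self, base_rev: str, new_rev: str) -> bool:
        """Accept an edit only if base_rev is still a leaf on *this* copy of the data."""
        if base_rev not in self.leaves():
            return False  # a real server would answer 409 Conflict here
        self.tree[new_rev] = base_rev
        return True

    def merge_from(self, other: "Replica") -> None:
        """Internal replication: copy every revision the other replica has accepted."""
        self.tree.update(other.tree)


def clustered_2x_style() -> Set[str]:
    """Two replicas of one shard, each deciding on its own (the 2.x situation)."""
    r1, r2 = Replica({"1-a": None}), Replica({"1-a": None})
    # Two clients edit 1-a concurrently; the writes reach the replicas in opposite orders.
    r1.update("1-a", "2-x")  # accepted on r1
    r1.update("1-a", "2-y")  # rejected on r1
    r2.update("1-a", "2-y")  # accepted on r2
    r2.update("1-a", "2-x")  # rejected on r2
    # Each edit was accepted somewhere, so replication spreads both of them.
    r1.merge_from(r2)
    r2.merge_from(r1)
    return r1.leaves()  # two live leaves, i.e. a conflicted document


def single_transactional_store() -> Tuple[Set[str], bool]:
    """One logical store: edits are serialized, so the second one sees the first."""
    store = Replica({"1-a": None})
    store.update("1-a", "2-x")
    second_accepted = store.update("1-a", "2-y")  # 1-a is no longer a leaf
    return store.leaves(), second_accepted


if __name__ == "__main__":
    print(sorted(clustered_2x_style()))
    leaves, accepted = single_transactional_store()
    print(sorted(leaves), accepted)
```

Running the script prints `['2-x', '2-y']` for the 2.x-style cluster (two live leaves, hence a conflict) and `['2-x'] False` for the single store, where the losing writer is rejected and can retry instead of silently creating a branch.
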
>>>>>>
>>>>>> During those investigations we have identified a small number of CouchDB features that we do not yet see a way to do on FoundationDB, the main one being custom (Javascript) reduces. This is a direct consequence of no longer rolling our own persistence layer (couch_btree and friends) and would likely apply to any alternative technology.
>>>>>>
>>>>>> I think this would be a great advance for CouchDB, preserving what makes CouchDB special but taking advantage of the superbly engineered FoundationDB software at the bottom of the stack.
>>>>>>
>>>>>> Regards,
>>>>>> Robert Newson
>>>>>
>>>>> --
>>>>> Professional Support for Apache CouchDB:
>>>>> https://neighbourhood.ie/couchdb-support/

--
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/