Re: [DISCUSS] A direction from a non-contributor

Chintan Mishra Thu, 11 Jul 2019 03:03:11 -0700

On 11/07/19 2:52 PM, Jonathan Hall wrote:

This looks like a good time for me to chime in.
I'm the author of Kivik (http://kivik.io/), which is a CouchDB/PouchDB driver layer for Go/GopherJS.
Since the beginning, I've had an interest in developing additional tooling for CouchDB, primarily to ease development and administration of CouchDB. But some of my goals I think could also lend themselves to embedded systems, quite easily.
This is not to say anyone should prefer my approach over one written in Rust. I'm only offering my own knowledge and perspective.
I've recently had some interest from other Go developers in contributing to the project, so to help explain my vision, I recently wrote a blog post, which could be relevant here, too: http://kivik.io/vision
My long-term vision for Kivik is to allow different pluggable storage backends (currently: CouchDB, PouchDB, and a mock driver, and very limited filesystem- and memory-backed drivers), as well as pluggable "front ends" (currently: whatever custom app you write, as well as a test suite used to test the rest of the stack).
If/when an HTTP server frontend is created, and any backend suitable for an embedded system (such as SQLite, or even the filesystem) reaches maturity, it would become quite possible to run a self-contained Couch-compatible server on a minimalistic system. Perhaps even an MQTT frontend could be created for Kivik, if there's a demand for such a thing.
Of course, there's always the question of performance and reliability. As my primary goal is _testability_, this naturally leads to a different level of reliability and scalability than will probably be wanted by users of embedded systems. But I can't imagine making this software suitable for embedded systems would hurt the usability for testing. So if anyone is interested in contributing toward Kivik with embedded systems in mind, I would welcome it! I also don't have any current plans to support Mango queries or traditional views for my testing backends (in-memory or filesystem-backed storage), as these are not essential for most testing. For these to be useful in an embedded system, someone might want/need to tackle this area.
In the short term, my personal focus is on tooling around developing and managing CouchDB. As mentioned in the blog post, a project I want to start soon is a command-line tool for interacting with CouchDB. I would gladly welcome feedback from anyone who would find such a tool useful, to help prioritize features, and to design a usable interface. I think this project is a logical next step, even for a possible future goal of targeting embedded systems, as this CLI tool will be a great way to flesh out and stress-test the filesystem driver.
Jonathan

Does Kivik aim to become a UnQL alternative for any database, independent of its design?


--
Chintan

On 7/11/19 10:59 AM, Jan Lehnardt wrote:
Hi Chintan,
On 10. Jul 2019, at 18:25, Chintan Mishra <chin...@rebhu.com> wrote:

On 09/07/19 9:33 PM, Joan Touzet wrote:
Hi Chintan,

Reading through your proposal, I have one main point to make.
At the Apache Software Foundation, the people who lead the projects are the people who do the work on them. We use the wrong word "meritocracy"
to explain this principle; a better word would be "do-ocracy."

http://www.apache.org/foundation/how-it-works.html#decision-making
https://incubator.apache.org/guides/participation.html#as_a_developer
   https://communitywiki.org/wiki/DoOcracy

That means that your project can completely proceed on its own if it
wants to; the only thing over which you're not in control is whether
that project gets to call itself CouchDB or not. That decision is
reached by the people who have built CouchDB into what it is today.
I appreciate that you shared these links. I now understand what I have to do next.
-----
On that last point, there's a lot that would need to be done for you to
convince the PMC that your vision is the one, true future of CouchDB.
What you propose is a significant rewrite, as well as requiring an entirely new set of skills from the developer base (Rust, MQTT, Kotlin,
Swift).
From Slack conversations, it appears the community has some inclination towards building a Rust-based CouchDB some day. As for other technologies, those changes are not happening today. I do not propose to start with all the changes at once. The storage engine is a good place to start.
Since I brought it up in Slack, let me clarify: I do not suggest that
we should move CouchDB to Rust today or any time later.

What I am suggesting is that we should look at the things required to
support your idea of an IoT-capable CouchDB-like thing. My suggestion
is to not change CouchDB, but to make a new CouchDB compatible project.

Devices are only getting smaller, so a lower level language is needed
to ensure performance and good battery use. That leaves C, C++, Go and
Rust.

When I look at which people I could likely excite to contribute to
such a thing, in my filter bubble that is folks getting into Rust
and the rest of the Rust community.

If it ends up being Go, C or C++, because someone who runs with this
prefers those, I don’t really mind.

* * *

In particular, we should look at a more detailed IoT use-case and how
CouchDB can help.

Correct me if I’m wrong, but this is mostly about devices with sensors
generating measurements over time that should be aggregated into a
cloud service for analysis.
In that world, a hypothetical API for an IoT app using our new RustyCouch
could look like this:

db = RustyCouch.open('file.foo')

db.save(measurement)

db.push('https://cloud.measurements.com')

repeat

This is a very small subset of the CouchDB API, but it would cover the
majority of your billion IoT use-cases.

There are a few things to be considered about data persistence and
concurrency control, but in another email, you already mentioned
SQLite, which solves most of those for you already.

db.save() would generate a JSON document with a uuid as _id and
corresponding _rev and an entry in an index that allows us to query,
at a later point: in what order were these docs written, which we are
going to need for db.push()
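
As a rough illustration of the db.save() behaviour described above, here is a minimal Python sketch on top of SQLite. The names LocalStore, save and changes_since are hypothetical, the table layout is an assumption, and the _rev value only mimics the shape of a CouchDB revision (generation, dash, hash); it is not how CouchDB itself computes revisions.

```python
import hashlib
import json
import sqlite3
import uuid


class LocalStore:
    """Hypothetical local document store; not part of CouchDB or any library."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        # "seq" is an auto-incrementing write-order index: it records in
        # which order documents were written, which a later push needs.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS docs ("
            "seq INTEGER PRIMARY KEY AUTOINCREMENT, "
            "id TEXT NOT NULL UNIQUE, "
            "rev TEXT NOT NULL, "
            "body TEXT NOT NULL)"
        )

    def save(self, measurement):
        """Store one measurement as a new JSON document and return it."""
        doc_id = uuid.uuid4().hex
        body = json.dumps(measurement, sort_keys=True)
        # Assumption: first-generation rev shaped like "1-<32 hex chars>".
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()[:32]
        rev = "1-" + digest
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO docs (id, rev, body) VALUES (?, ?, ?)",
                (doc_id, rev, body),
            )
        return {"_id": doc_id, "_rev": rev, **measurement}

    def changes_since(self, seq=0):
        """Return (seq, id, rev) entries written after seq, in write order."""
        rows = self.conn.execute(
            "SELECT seq, id, rev FROM docs WHERE seq > ? ORDER BY seq", (seq,)
        )
        return [{"seq": s, "id": i, "rev": r} for s, i, r in rows]
```

Calling store.save({"temp": 21.5}) a few times and then store.changes_since(0) returns the documents in the order they were written, which is the query the index is meant to answer.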

db.push() then opens that index, checks with the cloud which docs
it already has (as per the standard CouchDB replication protocol)
and then sends all local docs that aren’t on the cloud yet in a
couple of _bulk_docs batches.
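
The push step described above could be sketched roughly as follows in Python. _revs_diff and _bulk_docs are the CouchDB endpoints used by the replication protocol; everything else here (push, the post callable, FakeRemote) is hypothetical and exists only so the sketch runs without a network. It deliberately leaves out checkpoints and revision history, which a real replicator would need.

```python
def push(local_docs, post, batch_size=100):
    """Send docs the remote lacks; return how many were sent.

    local_docs: list of dicts with "_id" and "_rev", in write order.
    post: hypothetical transport, post(endpoint, payload) -> decoded JSON.
    """
    if not local_docs:
        return 0
    # Ask the remote which of our revisions it does not have yet.
    revs = {}
    for doc in local_docs:
        revs.setdefault(doc["_id"], []).append(doc["_rev"])
    diff = post("_revs_diff", revs)
    missing = [
        doc for doc in local_docs
        if doc["_rev"] in diff.get(doc["_id"], {}).get("missing", [])
    ]
    # Upload the missing docs in batches; new_edits=False tells the
    # remote to keep the revisions we send instead of assigning new ones.
    for start in range(0, len(missing), batch_size):
        batch = missing[start:start + batch_size]
        post("_bulk_docs", {"docs": batch, "new_edits": False})
    return len(missing)


class FakeRemote:
    """In-memory stand-in for the cloud database, for demonstration only."""

    def __init__(self):
        self.docs = {}   # (_id, _rev) -> document
        self.calls = []  # endpoint names, in call order

    def post(self, path, payload):
        self.calls.append(path)
        if path == "_revs_diff":
            diff = {}
            for doc_id, doc_revs in payload.items():
                absent = [r for r in doc_revs if (doc_id, r) not in self.docs]
                if absent:
                    diff[doc_id] = {"missing": absent}
            return diff
        if path == "_bulk_docs":
            for doc in payload["docs"]:
                self.docs[(doc["_id"], doc["_rev"])] = doc
            return []
        raise ValueError("unexpected endpoint: " + path)
```

With a real server, post would be replaced by an HTTP client that issues POST requests against the target database URL; that part is left out here on purpose.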

Voilà, a low-level, embeddable library that allows you to sync
stuff to CouchDB.

This is a scope that a single developer could make a prototype
of, even in a language that you are just starting out with.

With this in hand then, the next step is to talk to the folks
who build IoT platforms and applications to see if they want to
use something like that.

And once we have this, we can talk about changes to the replication
protocol.

* * *

If you want to take this further, and make a library that also
supports interactive querying, for say native applications on
phones and watches and whatnot, you already have a decent
foundation, but you’ll have a little more work to do.

* * *

But none of this requires changing CouchDB itself, or a 10 year
effort of porting something, while solving all the needs you
have.

* * *

Finally, I’d like to caution against being flippant about the
current project direction with FoundationDB. This is something
the team that has been doing this for over 10 years looked at
“in-depth” and decided it is the right thing to do.

The alternative would be to build a FoundationDB like thing
ourselves, which is a multi-million dollar investment that
I haven’t seen anyone commit to at the moment.

In particular, I’m one of the champions for smaller CouchDB
installations in this project, and moving forward is always
a give-and-take. We are not in a position yet to gauge what
the problems are with an FDB-Couch for a single-node instance
but I’m sure going to work hard on making it easy for our
downstream users.

I’m the maintainer of the Mac binaries which are extremely
popular. Any database that can’t be set up with a download,
unzip and double click to start to get a dev environment is
going to have trouble attracting new developers. So I’ll make
sure we can retain this experience as much as possible.

* * *

Let’s be pragmatic and consider incremental change or small
scope side projects to move this forward. Grand visions,
in my experience, almost never work out. The only reason I have
trust in an FDB transition is because someone with authority
and budget said “the team that ostensibly built CouchDB 2.x
is going to do this”. That’s the only way it could possibly work.

And don’t mistake my RustyCouch suggestion as being
dismissive or sidelining. I’ve wanted something like this since
about 2008, and many people have tried with various attempts,
so my suggestion above is very serious *and* fed by the
experience of all these failed attempts.

CouchDB’s strength is its replication protocol. We didn’t
rewrite CouchDB in JavaScript because we suddenly realised
there are a billion browsers, but PouchDB came along with
a compatible data model and replication engine so that the
two projects complement each other perfectly and anyone on
the CouchDB team will tell you that PouchDB is one of the biggest
drivers of CouchDB adoption.

How about we just re-run this strategy for IoT: build a
small thing that is useful for one use-case and make it work,
then make it more complicated to be useful for more use-cases.
At each point, make sure replication with CouchDB works.
That’s a winning strategy. We already know it.

Best
Jan
—
  It is in direct competition with the proposal being worked on
this list for the FoundationDB backend swap. With the addition of MQTT,
it sounds like the entire replication protocol and methodology would
need to be revisited, as the semantic changes you're proposing would
break existing client replication.
The HTTP replication protocol more or less remains the same in the foreseeable future. A new MQTT replication strategy will be built upon the existing method. The two will not work in parallel. Either one of these will work per database.
Finally, the proposal to push into
the mobile space would directly compete with our sister project PouchDB,
who have put in tens of thousands of development hours as well.
The community will evolve at some point. And bringing people from the sister project onto CouchDB will allow faster development. The diagram in the proposal missed a part for Web Browser based CouchDB. This missed part is an interface for JavaScript and CouchDB-WebBrowser. So, we will need some JavaScript developers too. And they can help improve Fauxton.
  This all
adds up to a much bigger scoped project than CouchDB is today, and I
daresay may be bigger than I think even you realize.
I do realize that I want CouchDB to be in a billion mobile and embedded devices by 2025. I understand this is a challenging scale. I brought this here because I see how much we need a DB for a "Cluster Of Unreliable Commodity Hardware". I assume the proposed path will take somewhere between 18-21 months to come to fruition for a team of 15 people working 40 hours/week.
With my PMC hat on, I have to ask:
* Do you already have developers versed in these skills you can bring to the project (beyond yourself)? Are they ready to commit the 40+ hours
   a week each to making it a reality?
No, I do not have a team in place for this.
* Do you have experience in building a distributed system of this scale,
   using the specific technologies you propose?
I have been reading about distributed systems. I want to take up an Open Source project which solves the replication problem for devices coming up with emerging technologies. CouchDB is the best fit as it already solves the problem of replication across remote devices.
* How do you plan to convince other developers of your approach
   specifically?
What got us(you) here, won't get us(you) there! -- Marshall Goldsmith
CouchDB led the way by being years ahead. This is just the same thing happening again in a newer market. CouchDB is already great at replication. What I am proposing is taking this simple-but-powerful methodology a step further and building it for planetary scale use-cases (idea derived from Lasp-Lang). Here are some ways with which we drive more developers, users, and eyes.
* Helping users realize that CouchDB lets them relax while building
   applications for devices with any form factor.
* Reaching out to the developers who have built their own solution for
   replicating stuff from their device of any form factor to CouchDB
* On-boarding developers who will become early adopters and test it out
   on their IoT devices. Thus, proving an unmet market need.
* Promoting offline-first strategy among mobile and embedded
   developers will drive contributors from these communities.
* Documenting comparisons between existing mobile and embedded
solutions which provide replication solutions like Realm, and CouchBase.
* How do you intend to train up our existing developers on the new
   languages and technologies involved?
If people are excited about the future they are building then this is a smaller problem to tackle. When and if people in this community come to a consensus about the proposal, this can be tackled by the 'Each one, teach one' approach followed by Yamaha Motors. This is a buddy system where people get new partners to tackle a problem/PR. They share issues, their understanding of the codebase and language, etc. with each other. As buddies rotate, everyone gets on the same page after a few cycles. I have found a 3-pair buddy system works best in software. But this may differ based on culture, language, timezone, and availability.
* How do you perceive the advantages and disadvantages of your approach
   *specifically* vs. the FDB approach already outlined?
Proposal: FoundationDB
  Pros:
  * Improving what works for majority of existing users
  * Iterates CouchDB to a better form
  * Prospect of immediate consistency for ACID transactions
  Cons:
  * Losing some small and mid-sized developers
  * Fragments community

Proposal: Polyglot-unification
  Pros:
  * Growth by tapping newer prospects
  * Reduces fragmentation of user community and codebase
  * Reimagines CouchDB as if it was built in 2019
  Cons:
  * Tons of work
  * Uses RocksDB, overlooks FoundationDB migration

Email with subject "CouchDb Rewrite/Fork" by 'Reddy B. <redd...@live.fr>' has mentioned some other concerns. This proposal introduces a new story for CouchDB. This proposal would require using RocksDB instead of FoundationDB.
-Joan

On 2019-07-09 10:28, Chintan Mishra wrote:
Hello team!!
Years of time and effort help move a product to the heights that CouchDB
has reached. And as a non-contributor, rather a very new CouchDB
user (1.5 years) who failed to find some relevant emails, I came up with a version of the future for CouchDB that I thought would help us grow. But Jan and Robert helped me realize that it takes a village to raise a child (CouchDB). So this is a proposal to find a middle ground from where we are headed and where the market is going next. The proposal I wrote was solely driven by what I have read over the years about the growth of the product and the community. I have attached the file or if you prefer reading in a browser, then click here <https://gitlab.com/snippets/1873543>.
It will roughly take 4-5 minutes of your time. A proposed direction is to start an entirely new project. That is not what I desire. I want to join the community behind CouchDB not build a new one using it. My goal
from this proposal is to generate leverage by creating early mover
advantage and help grow the community.

Thanking you.

--
Chintan Mishra
Rebhu Computing
Founder and CEO
