Re: CouchDB Next

Nick Vatamaniuc Wed, 28 Sep 2016 15:50:42 -0700

Hi everyone,

Thanks for starting the thread, Jan. Lots of exciting stuff!


> # Replication
>
> This is our flagship feature of course, and there are a few things we can do 
> better.

We have some exciting stuff happening in this area already. At
IBM/Cloudant, Robert Newson, Ben Bastian and I, with help and advice
from Paul Davis, and guidance from Rohit Agarwal, have been working on
some improvements to the replicator.

We have a version of the replicator which has a scheduler at its core.
This enables it to run a much larger number of replication jobs than
the current replicator. The scheduler periodically starts and runs a
subset of jobs so they can all make incremental progress. There is an
HTTP client connection pool which can share connections between
replication jobs. Another change is that temporary errors and triggered
states are no longer written back to replication documents. Writing
intermittent replication states back to replication docs was causing
operational and performance issues. Also, there is a new HTTP API to
monitor the state of replication jobs and documents. And most
importantly, the replication protocol has stayed the same. Besides the
change to the document state reporting bit, it can be a drop-in
replacement for the current 2.0 replicator. This work is in the testing
phase currently, and we'll share more details in the future, but this
thread seemed like a good place to mention it.
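
To make the scheduling idea concrete, here is a rough sketch in Python.
It is only an illustration and not the actual replicator code: the
Scheduler class, max_running, and the "progress" counter are all made
up for the example. Each cycle runs a bounded subset of jobs and then
puts them at the back of the queue, so every job gets a turn.

```python
from collections import deque


class Scheduler:
    """Toy round-robin scheduler (hypothetical, for illustration only)."""

    def __init__(self, max_running):
        self.max_running = max_running  # cap on jobs run in one cycle
        self.pending = deque()          # jobs waiting for their next turn
        self.progress = {}              # job id -> slices of work done

    def add_job(self, job_id):
        self.pending.append(job_id)
        self.progress[job_id] = 0

    def run_cycle(self):
        """Run the next subset of jobs, then requeue them at the back."""
        count = min(self.max_running, len(self.pending))
        batch = [self.pending.popleft() for _ in range(count)]
        for job_id in batch:
            # Stand-in for one slice of real replication work.
            self.progress[job_id] += 1
        self.pending.extend(batch)
        return batch


scheduler = Scheduler(max_running=2)
for name in ("job-a", "job-b", "job-c"):
    scheduler.add_job(name)
for _ in range(3):
    scheduler.run_cycle()
```

With three jobs and room for two per cycle, three cycles give every job
the same amount of progress, even though only two ever run at once.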

Other things I think are exciting:

* CRDT-based document conflict resolution. Wonder if it extends to
having a set of "auto-merge" strategies. CRDT is one, but some
customers might want to specify a document field (like "timestamp")
where documents with higher field values would win. It would mean
losing updates, but I think there are scenarios where customers would
accept losing intermediate updates (think sensor data) in return for not
having to worry about conflicts.

* Paul's pluggable storage engine. Wonder if there is some combination
of both of those that would make CouchDB more suitable for collecting
time series data (say pick the right engine and right merge strategy).

* Jan's mention of finding a way to apply a compression format to the
whole db. I have seen pretty impressive compression ratios when using
LZMA or something similar on the whole db. There is obviously tension
between random access to documents and using compression across them.
But perhaps there is something to explore there. There has been a
surprising resurgence of compression research lately with things like
Brotli from Google and zstd from Facebook
(http://facebook.github.io/zstd/). zstd has an interesting "training"
mode where it can do a pass over small documents and learn a common
dictionary. CouchDB already passes over the data during compaction, so
would that be a good time to train a compression dictionary?

* A pie in the sky idea: even though CouchDB is an eventually
consistent db, is there perhaps a place for small islands of consistency
in it? I've heard using a consistent layer for configuration mentioned
before. But a Paxos or Raft based consistent db "flavor" would
be lovely for some cases. There have been some interesting new
developments in that area showing that availability doesn't always
have to go down the drain:  https://arxiv.org/abs/1608.06696
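
As a rough illustration of the "highest field value wins" merge strategy
from the first item above, here is a minimal Python sketch. This is not
an existing CouchDB API: merge_by_field is a hypothetical helper, and
the tie-break on "_rev" is my own assumption so that every node would
pick the same winner.

```python
def merge_by_field(revisions, field="timestamp"):
    """Pick the conflicting revision with the highest value in `field`.

    Hypothetical helper for illustration. Revisions that lack the field
    are ignored; all losing revisions are simply dropped.
    """
    candidates = [doc for doc in revisions if field in doc]
    if not candidates:
        raise ValueError("no revision has field %r" % field)
    # Assumed tie-break: compare _rev strings so the choice is deterministic.
    return max(candidates, key=lambda doc: (doc[field], doc.get("_rev", "")))


conflicts = [
    {"_id": "sensor-1", "_rev": "2-a", "timestamp": 10, "temp": 20.5},
    {"_id": "sensor-1", "_rev": "2-b", "timestamp": 15, "temp": 21.0},
]
winner = merge_by_field(conflicts)
```

Here the revision with timestamp 15 wins and the reading at timestamp 10
is lost, which is the trade-off described above.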

Cheers,
-Nick
