That’s an even better summary, ignore mine :) > On 15 Feb 2017, at 20:25, Paul Davis <[email protected]> wrote: > > I think the general consensus is that we should investigate using > Elixir but there are some concerns on various aspects. For instance, > newer Elixir requires a minimum of Erlang 18.0. There are also concerns > about whether/how we mix Elixir into the existing applications. Whether > or not we start rewriting components in Elixir or only allow new > applications to use it. So, I'd call the general idea "interested but > unclear on the best path forward" which I basically take to mean we'll > see what the community comes up with. > > On Wed, Feb 15, 2017 at 2:14 PM, Ilya Khlopotov <[email protected]> wrote: >> >> Fantastic notes!!! >> >> While reading them I noticed an Elixir-related section. There is no >> conclusion on this, unfortunately. It is also quite hard to infer the sentiment >> from the notes. I would love to get a general idea on: >> How many people present at the meeting see the value in adopting Elixir and >> how many don't? >> >> BR, >> ILYA >> >> >> From: Robert Samuel Newson <[email protected]> >> To: [email protected] >> Date: 2017/02/15 05:38 AM >> Subject: CouchDB Summit Notes >> >> ________________________________ >> >> >> >> Hi, >> >> A group of couchdb folks got together recently for a 3-day session (Feb 10, >> 11, 12) to discuss the couchdb roadmap. In attendance: >> >> Russell Branca, Paul Davis, Dale Harvey, Adam Kocoloski, Nolan Lawson, Jan >> Lehnardt, Gregor Martynus, Robert Newson, Garren Smith, Joan Touzet. >> >> We took turns taking notes, and I present those notes without editing at the >> end of this email for full disclosure.
This marks the beginning >> of the effort to define and execute a new couchdb roadmap; all decisions >> will be made on this mailing list and/or in JIRA as per normal ASF rules. It >> was enormously helpful to get a cohort together in one space for a few days >> to thrash this out. >> >> Friday and Saturday were primarily wide-ranging discussion and >> Saturday/Sunday we got into more detailed conversations about the designs of a >> few key new features we want to focus on for the next major release of >> couchdb. I will summarise those two efforts first and the copious raw notes >> will follow. >> >> 1. "hard" deletes >> >> As everyone knows by now, a deleted document in couchdb is preserved >> forever. Deleted documents are typically small, but they do take up space, which makes >> many uses of couchdb problematic and effectively excludes certain uses >> entirely. The CouchDB attendees feel this should change, and that the new >> CouchDB behaviour should be that a deleted document eventually occupies no >> space at all (post compaction). >> >> The design for this is described in the notes and we whiteboarded it in more >> detail, but the essential observation is this: CouchDB is free to entirely >> forget a document once all secondary indexes have processed the deletion and >> all replications have checkpointed past it. To allow third parties to >> inter-replicate, we will introduce an API to allow anyone to inhibit this >> total deletion from occurring. The set of secondary indexes, replication >> checkpoints and these as-yet-unnamed third-party markers allows couchdb to >> calculate an update sequence below which no deleted document needs to be >> preserved. We called this the 'databasement' in our conversations but a more >> obvious, but less amusing, name will doubtless occur to us as we proceed. >> >> This is obviously a huge (and, we hope, welcome) shift in couchdb semantics >> and we want to get it right.
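The checkpoint arithmetic behind the 'databasement' can be sketched in a few lines. This is purely illustrative — the function and field names below are made up for the sketch and are not a CouchDB API:

```python
def databasement(index_seqs, replication_seqs, marker_seqs):
    """The update sequence below which no deleted document needs to be
    preserved: the lowest checkpoint among all secondary indexes,
    replication checkpoints, and third-party inhibition markers."""
    checkpoints = list(index_seqs) + list(replication_seqs) + list(marker_seqs)
    # With no consumers registered, be conservative for this sketch
    # and keep every tombstone.
    return min(checkpoints) if checkpoints else 0

def compact(docs, basement):
    """Compaction may forget any tombstone every consumer has passed;
    live documents are always kept."""
    return [d for d in docs if not (d["deleted"] and d["seq"] < basement)]
```

For example, with index checkpoints at sequences 8 and 12 and a replication checkpoint at 10, the databasement is 8: a tombstone at sequence 5 can be dropped at compaction, while one at sequence 9 must still be preserved.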
There'll be a more detailed writeup in the >> corresponding JIRA ticket(s) soon. >> >> 2. role-based access control >> >> We challenged another long-standing couchdb convention: that access controls >> are at the database level only. This leads to awkward workarounds like the >> "db per user" pattern which does not scale well. We spent a lot of time on >> this topic and believe we have a feasible and efficient design. >> >> In brief, role-based access control will be available as an option (perhaps >> only at database creation time, it's TBD if it can be toggled on or off for >> existing databases). A document must be marked with the roles that allow >> access, and users must be granted matching roles. We'll explicitly support a >> richer control than mere matching; it will be possible to require multiple >> roles for access. To make this work we will build an additional index for >> efficient access control. >> >> There's much more to be said of the amazing discussions we had at the summit >> and I'll defer to the following notes for that, feel free to ask questions >> in replies to this thread. >> >> Robert Newson >> Apache CouchDB PMC >> >> >> CouchDB Summit Notes >> >> 1. Testing >> • Shared test environment >> • PouchDB test suite has value for CouchDB, but isn’t modular >> • Outcome: Create unified modular pouchdb test framework to use in CouchDB >> 2. Simpler API >> • No more read APIs >> • Dale: Lots of ways of doing read/writes - but no recommended method >> • Nolan: Differentiate between api endpoints that users use and what is >> needed for replication, think /db/_replicate hiding all replicator specific >> calls >> • Nolan: Mango covers some APIs - get/all_docs/query >> • rnewson: Map/reduce is never going to be removed >> • SQL? >> • Separate the replication endpoints and bulk endpoints from ‘normal’ >> endpoints >> • rnewson: Moving to HTTP/2 need to reassess which endpoints are necessary >> • Outcome: Did we have an outcome here? >> 3.
HTTP/2 >> • Jan: Cowboy not the best maintained project >> • Jan: Need to touch base with Cowboy author first >> • rnewson: value of Cowboy is rest module is already there >> • Garren: Elixir stuff is all built on top of Cowboy >> • Jan: Author of Cowboy is sponsored, maybe IBM can help to influence >> direction >> • rnewson: it’s possible, but CouchDB project shouldn’t have a hard dep on $ >> • Jan: there is no possibility of starting from scratch >> • Rnewson: we’ve never written middleware between us and mochiweb >> • Jan: it doesn’t do HTTP/2 >> • Rnewson: we get that for free if we switch to Cowboy >> • Jan: Cowboy is still in development, still introducing breaking changes >> • Jan: Will talk to Loic >> • State of HTTP2 >> • Erlang compat / deprecation >> • Collaboration model >> 4. Packaging (detour) >> • Nolan: managing Erlang versions is really hard today >> • Rnewson: maybe we should deprecate the whitelist >> • Nolan: nah better than failing for weird reasons at runtime >> • Nolan: custom PPA? >> • Jan/Rnewson: would need lots of resources for that >> • Rnewson: ideally the bad Erlang version problem should be going away >> eventually; the new releases are all good >> • Jan: can we run tests against pre-release Erlang versions? Need Cloudant >> scale traffic though >> • Rnewson: instead of having rebar config whitelist we could instead have an >> Eunit test that tests for the specific versions we know are broken, outright >> fails, tells you why. Maybe a blacklist rather than a whitelist. All the >> broken versions are in the past >> • Rnewson: we mostly figured this stuff out in production >> • Paul: also from reading the Erlang mailing lists, e.g. Basho folks >> discovering bugs >> • Jan: so we could try to Canary this but it’s unlikely to work well >> • Rnewson: Erlang themselves have improved their practices. 
From now on we >> can say “we support the latest n Erlang releases” >> • Rnewson: Could ask Ericsson to run CouchDB against their pre-release >> Erlang versions. >> 5. Sub-document operations >> • Jan: this is a feature request since forever >> • Jan: what won’t change is that we need to read the whole JSON document >> from disk >> • Rnewson: we do see Cloudant customers wanting to do this >> • Jan: how does this work in PouchDB? Does json pointer map? >> • Nolan: this can be done today, slashes or dots don’t matter, but dots are >> more intuitive. Angular/Vue does that >> • Jan: Ember too >> • Nolan: I wouldn’t worry about this, just do whatever makes sense for Couch >> and Pouch can add sugar >> • Garren: we can design this incrementally >> • Jan: let’s postpone the bikeshedding >> • Garren: would this fit with Cowboy? >> • Rnewson: yeah quite well, the paths into the objects map well >> • Garren: and this would just be used for getting some of the content from a >> doc >> • Rnewson: you’d still have to do the rev >> • Rnewson: we could add that today >> >> 6. GraphQL >> >> • Best done as a plugin >> >> 7. CouchDB and IBM/Cloudant Collaboration >> >> • A community manager to help remove roadblocks >> • Gregor: could be cross-project: Couch, Pouch, Hoodie >> • Concern around if Cloudant doesn’t work on it no-one will >> • Jan: weekly news taught me there is a *lot* going on in Couch land >> • Need more design-level discussions to happen in the open >> >> 8. Move _-fields out of JSON >> • add future proof top level `_meta` (final name TBD), for future meta data >> extensions >> • Paul: adding any field means we have to think about replication to old >> clients because they throw errors for any unknown _ field. But once we have >> _meta we’ll be in a better place >> >> 9. VDU in Mango >> • Yes. >> >> 10. Rebalancing >> • Jan: this is Couchbase’s killer feature >> >> 11. 
Proper _changes for views >> • Jan: the idea is you can listen to a subset of changes to a database >> • Jan: could be a basis for selective replication >> • Jan: Benoit wrote this as a fork for a customer >> • Garren: with HTTP/2 can you stream the changes? >> • All: yes >> • Adam: there are so many implementations of this that allow every possible >> use case >> • Adam: can we just say “I want live refreshes?” >> • Jan: let’s punt on this >> • Rnewson: we know when we’re updating a view, we could just send that out >> • Possibly create a “live view”, basically like a changes feed but without >> the guarantee of a full history, rather updates since >> you joined. >> >> 12. Redesign security system >> • Gregor: what initially drew me to Couch was that it had auth built-in, >> very nice for building simple apps. As we made progress though we found we >> just rewrote auth ourselves and used CouchDB only for DB >> • Rnewson: hard to converge, no one clear winner >> • Adam: implications on the Cloudant side as well. Cloudant needs to align >> internally with the IBM Cloud Identity & Access Management system. Need to >> be careful about chasing a moving target. >> • Rnewson: Cowboy does give a clean API for this >> • Nolan: big thing with current auth is that users frequently get stuck on >> basic rather than cookie auth >> • Garren: how to do offline-first auth? >> • Nolan: no good answer right now >> • Gregor: frequently users get invalidated towards the backend >> asynchronously. Users continue working locally, become un-authenticated. >> It’s more of a UI problem >> • Gregor: ideally users don’t need to login at all to use the app. Can use >> it locally w/o login >> • Jan: How does this impact Pouch? >> • Nolan: something streamlined would make the most sense, but it’s okay if >> it’s not on by default.
Users should be able to get up and running quickly, >> but also graduate to a production app >> • Rnewson: we’re also planning to do secure by default >> • Jan: it should be easily tweakable whenever something new like JWT comes >> along >> • Garren: What does Mongo do? >> • Jan: No other DB does this >> • Rnewson: Or nobody uses it >> • Jan: Like password forget. Doesn’t exist >> • Adam: no upside to increasing complexity on DB backend >> • Gregor: It’d be good to have a simple lowest common denominator >> (user/pass) in order to get going. Everything beyond that… e.g. how do I get >> a session without a password? In Hoodie we implemented CouchDB’s algo in JS >> • Rnewson: well we’re not going to do unsecured by default anymore >> • Adam: most of this will probably not make it into Cloudant’s API >> • Garren: Given we don’t have 1000s of devs, it does help us not have too >> much on our plates >> • Adam: if there were a plugin to make this easy… in any case it shouldn’t >> be the focus of Couch >> • Nolan: it does play to couch’s strengths - the “http database” >> • Jan: yeah not trying to diminish but it would be a lot of work… but Hoodie >> should not have to do so much work >> • Adam: agree… could smooth over rough edges >> • Nolan: maybe should wait for the plugins discussion then >> • Break time >> >> 13. Mobile-first replication protocol >> • Jan: When replication protocol was first designed this wasn’t really a >> concern >> • Jan: HTTP/2 fixes many of these issues >> • Jan: may also need a way to do tombstone-less revisions >> • Nolan: 80/90% of problems solved by HTTP2 >> • For testing it would be great to have a Docker image with an HTTP/2 proxy >> • Even no improvement would not mean it’s not worth doing. >> • Revisit >> • Nolan: primary use case for PouchDB is mobile, poor network conditions. >> Currently HTTP 1 algo is very chatty, users complain about it >> • Nolan: Need to validate with an HTTP/2 wrapper to see improvement. >> • Rnewson: Doesn’t disprove though.
But might prove benefit. >> 14. Tombstone-less replication / tombstone deletion in database >> • Rnewson: we see this a lot with Cloudant, often folks don’t want deletions >> to replicate. It’s there for a good reason, there’s a massive use case, but >> it shouldn’t apply to everyone. There are use cases where people want to >> delete data from a database. We’re starting to implement some stuff already >> in Cloudant. Would prefer for it to be in CouchDB. We’re implementing a >> clustered version of purge, currently only way to do this. Might be driven >> by views. It’s hacky. Need a solution where we say “don’t need to sync any >> further.” >> • Gregor: revision history is different from purging >> • Rnewson: what we’re doing is making purge first-class >> • Gregor: from our position purging is a major thing. We just don’t do it. >> If you want to end the replication and then share… it’s a problem. Purging >> is what we need. >> • Nolan: we don’t implement purge in PouchDB >> • Rnewson: new one is just a clustered version of old one probably. Needs to >> be safe across shards. >> • Nolan: we don’t implement it because it’s hard. Have to make sure views >> don’t show purged data >> • Rnewson: similar reasons it’s hard across shards >> • Rnewson: need to let others interact and not impact by purged data >> • Jan: replication could automatically skip tombstones >> • Rnewson: should be able to add checkpoints and say ignore deletions before >> this >> • Chewbranca: what about replicated purge log? >> • Rnewson: that’s what this is >> • Chewbranca: exposed as a clustered level? >> • Paul: no. not impossible to add it, but counterintuitive. Kind of out of >> scope. Didn’t get into it. Changing the external HTTP replicator frightens >> me. Lots extra to do. PouchDB compat etc. >> 15. Auto conflict resolution >> • Jan: People don’t like writing conflict algorithms >> • Paul: Hard when people delete data and then it replicates back. Hoping >> purge will help. 
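The requirement implicit in this discussion is that any automatic resolver must be a pure, deterministic function of the conflicting revisions, or peers applying it independently will not converge. A minimal sketch of one such policy — deletion always wins, ties broken by CouchDB-style rev ordering — offered as an illustration only, not the project's chosen algorithm:

```python
def resolve(conflicts):
    """Deterministically pick a winner among conflicting revisions.
    Policy sketched: a deletion beats any live revision; otherwise the
    highest revision (generation number, then hash suffix) wins.
    Because the result depends only on the input set, every peer picks
    the same winner and resolution cannot ping-pong between nodes."""
    def rev_key(rev):
        gen, _, suffix = rev["rev"].partition("-")
        return (int(gen), suffix)

    deletions = [r for r in conflicts if r.get("deleted")]
    candidates = deletions or conflicts
    return max(candidates, key=rev_key)
```

Note the deletion-wins rule is exactly Gregor's "weird when you delete something then it comes back" case: given `3-aaa` (live) and `2-ccc` (deleted), the deleted branch wins even though its generation is lower.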
>> • Rnewson: question is: how to do such that it doesn’t cause conflicts >> between peers. Given same input need same output >> • Jan: and no loops >> • Rnewson: yes needs to converge >> • Jan: CRDTs? >> • Rnewson: yes >> • Rnewson: idea is that things in CouchDB would have defined solution to the >> conflict problem >> • Rnewson: could say if you want to do it, conflicts need to be first class >> • Jan: there’s a paper on this. Read it on the flight. Nice attempt but not >> production ready. Unbounded index problem. >> • Chewbranca: CRDTs aren’t suitable to all data types. Typo in two places in >> same doc, CRDTs can’t tell you what to do >> 16. Conflicts as first-class >> • Paul: conflicts as first class talks about conflicts as a graph instead of >> a tree. Resolution may introduce conflicts. I’d like to be able to fix those. >> • Gregor: what we’ve seen the most is one revision is a delete… should >> always win. Weird when you delete something then it comes back due to a >> conflict. >> • Rnewson: idea of graphs is you say they join back… not just branching trees >> • Adam: you deleted one branch in server, one in client. Deletion trumps >> • Adam: it’s like a git merge. Person working on branch can keep working. >> Changes from “undeletion, what the hell?” >> • Chewbranca: need easier way to expose this to clients… conflicts as first >> class doesn’t discuss this >> • Rnewson: I think it’s an unpleasant experience when conflicts are hidden >> until they scream. Exposing them is worse. Focus on things that let you >> actually resolve the situation once you encounter it. >> • Chewbranca: we need to be more aggressive. User can use couchdb for a long >> time and never know about conflicts. Never have that info exposed. >> • Jan: what about always including conflicts >> • Rnewson: nice middle ground. At least you’d notice the problem >> • Nolan: isn’t this going into Fauxton? >> • Rnewson: I think it is. 
But it’s also just a nudge >> • Gregor: we are ignoring conflicts mostly. But it’s a good feeling that >> we’re building something that does everything right so we can handle >> conflicts later. Something I’d wish is to make it easier to register >> app-specific conflict resolution algos. E.g. deletion always wins. >> • Rnewson: need to decide whether we just get rid of conflicts. I think >> we’re stuck with them >> • Chewbranca: what if you’re not doing backups >> • Rnewson: That’s not where we’re going… want to give more capabilities to >> handle it. There’s the graph so we can contract the tree, and then some >> javascript function to do automatic conflict resolution. Maybe some >> built-ins like we have for reduce. With enough docs this could go away. >> • Chewbranca: it’s also easy to make that conflicts view. Should promote >> cron jobs >> • Rnewson: don’t want to say “our killer feature is multi-master >> replication… btw don’t use it” >> • Chewbranca: I mean make it easy for the user to find it >> • Jan: will revisit this >> • Rnewson: the fundamental pieces are right. We replicate around until >> something resolves it. We’ve done very little around tooling and support and >> visibility. >> • Chewbranca: Configurable “max conflicts”? 10 conflicts and you’re done? >> >> 17. Selective sync >> • Jan: selective sync is also big topic >> • Jan: typical thing is “I want to build Gmail in Couch/Pouch but I can’t” >> • Jan: something could get archived after n days >> • Chewbranca: you could accomplish a lot of that with view filters for >> replication >> • Chewbranca: timestamp view to query for a particular month, replicate this >> view >> • Paul: I was thinking of it as a filter more than as a view. As Dale >> pointed out we can do it with filters >> • Chewbranca: replication always walks entire changes feed >> • Nolan: does that handle “sliding window”? >> • Jan: yes >> • Gregor: What about archiving? I always thought we could make this work >> using filters. 
I could say “archive every doc with that property” and those >> docs wouldn’t be synchronized, but someone said it doesn’t help because it >> still starts from the beginning. >> • Adam: you also want to start from newest and go back >> • Nolan: this affects npm replication… old packages replicated first >> • Adam: we see this with gaming as well. Oldest stuff replicated first >> • Adam: what we’re realizing is that this particular “new to oldest” >> replication is generally useful >> • Adam: could also stop when we get to a certain age >> • Rnewson: or mango… as soon as it stops matching, stop replicating >> • Adam: not saying reading changes backwards is easy… some tricks >> • Gregor: another thing, let’s say I open my email, first thing I want to >> see is an overview, the subject and the first few characters, >> meta-information. Usually this would be a view. But I want to sync this view >> first and show it while the sync is still going. >> • Jan: view could be server-side >> • Jan: could be in a library though, not in core Couch >> • Chewbranca: you can do this today if you know the update sequence. The >> tricky bit is discovering the update seq. Another option is make it easier >> to find that seq. >> • Nolan: can do this today by forking pouchdb-replicate package >> • Adam: hardest is how to resume >> • Chewbranca: could do newest to oldest with a limit, only fetch the first >> 1000 >> 18. Database archival >> • Jan: Database archival… ignore this for now >> • Chewbranca: want to export shards >> • Joan: lots of people want rolling databases, or create a new one every >> month, have to update their apps to use different db names. Could also solve >> that problem. Old stuff goes away, aka sliding window. >> • Adam: you got it >> • Rnewson: so it partitions the database in some way and then that drops out >> of the live db? >> • Joan: let’s not bake in a specific backend. 
Could have scripts for that >> • Jan: need a format like mysqldump >> • Jan: I like the idea of streaming that to a new DB >> • Jan: In a couchdb 1 world can just read the couch file >> • Adam: the use case Joan described is a telemetry store. Recent data in >> Couch. Want to keep it but cheap files on disk. That’s a continuous process. >> Like to have a TTL on the document. That’s different than just exporting the >> DB >> • Chewbranca: should talk about rollups for telemetry. Metadata, hourly. >> Very common feature >> • Adam: less common. I don’t think Couch should try to do it >> • Chewbranca: I think it’s a good feature. But we can skip it >> • Jan: let’s skip for now >> • Jan: we agree we want something. Got some ideas >> • Adam: “streaming” archive of documents that have outlasted the TTL may be >> a different thing than a one-shot bulk archive. Both could hopefully use the >> same format. >> 19. DB update powered replicator >> • Jan: replicator database… not everything needs to be live until written to >> • Rnewson: problem is the scheduler? Might define 1000 jobs at once? We’re >> working on that. Big project we started just before 2.0 was out the door. >> Started in the Apache repository. >> • Adam: Jan’s also talking about being able to drive replication to a large >> number of DBs >> • Rnewson: it’ll decouple the declaration of replication doc from when it >> runs >> • Rnewson: should include Jan in db core team to talk about scheduler >> • Rnewson: “replication scheduling” maybe? >> • Gregor: I’d like to have 10000 databases with 10000 defined replications >> • Rnewson: exactly the problem we’re tackling >> • Rnewson: scheduler has a thing where it examines work and sees if it’s >> worth running, e.g. if something hasn’t changed. It’s not that smart yet >> • Chewbranca: event-based push replication? How about that? >> • Rnewson: perhaps, it’s in the roadmap.
Say you’re Cloudant, we have lots >> of accounts, every cluster gets its own connection, that’s silly >> • Chewbranca: yes but also could incorporate into doc updates. If there are >> outgoing replications, post out directly >> • Rnewson: I dunno >> • Rnewson: that’s what the db updates piece is. There and then it tells the >> scheduler it’s worth replicating >> • Rnewson: we care about wasted connections, resources. Want to avoid the >> situation where I keep replicating a database I’ve hosted somewhere even though it hasn’t updated. >> Stop those jobs entirely. Timebox them >> Consistent databases >> • Jan: consistent databases, will skip >> • Rnewson: let’s talk about it >> • Adam: databases that never have conflicts. Only exactly one version of >> document >> • Rnewson: strong consistency >> • *not sure*: Opposite of what Couch does today then? >> • Garren: like RethinkDB >> • *crosstalk* >> • Jan: Nick brought this up >> • Rnewson: what you do need is eventual consistency. 10 nodes is 10 >> separate configs >> • Chewbranca: lack of eventual consistency is a real problem >> • Rnewson: can solve that as with the dbs database >> • Adam: we have such weak query capabilities across databases. If it were db >> level it might be a fairly common use case, 99% of the records in a particular >> DB can be eventually consistent. Documents with particular attributes could >> be targeted. Could push it to the doc level >> • Adam: could guarantee for certain docs with certain attributes that they >> never have conflicts >> • Chewbranca: I think it’d be good for the dbs db to be consistent. One way >> or another that’s a major problem. Conflicts in the dbs db are terrible >> Pluggable storage engine >> • Jan: next: pluggable storage engine >> • Paul: almost done. Need to test with before and after pluggable storage >> engines in the same cluster for rolling reboots. Been a day and two time >> zones since I looked at it. Had a bug in the test suite.
Getting ready to >> pull the trigger on the mega PR. >> • Paul: been confusing recently. Basically it’s a refactor of the internals >> to give us an API. No new storage engine. Alternate open-source >> implementation to prove it’s not over-specified. Merging would create a >> couple config things. Goal is to let people play with it. No new storage >> engine. No changing of data. All old dbs still work fine. >> • Jan: if we can document this well we can get lots more Erlang folks >> • Nolan: we do this in Pouch, it’s not recommended though to use Mongo etc. >> • Joan: is Paul’s thing open source? >> • Paul: I used a nif (?) to do the file I/O, couple file optimizations, want >> to minimize number of times we write doc info to disk. Uses three files per >> DB. Took me two days. Close to the legacy storage engine but sufficiently >> different to prove API isn’t overly specified. Will show in PR. Couple >> corner cases. >> • Jan: opportunity for a storage engine that doesn’t trade everything for >> disk space, but has the consistency. May be okay for certain types of uses. >> • Paul: lots of cool things to play with. Per-database encryption keys >> instead of filesystem encryption. In-memory for testing and playing >> • Jan: please >> • Paul: as soon as we have the API we can do lots of things >> • Garren: interesting to compare to Couchbase >> • Rnewson: when will the PR be ready? >> • Paul: hopefully next week. Need to rebase. Test suite passes. Want to set >> up a cluster with and without just to make sure. Set up a mixed cluster. >> • Adam: need to do something about attachments? Needs to store arbitrary >> blobs. >> • Paul: for attachments that is the only optional thing I wrote into it. If >> you have a storage engine that doesn’t store attachments you can throw a >> specific error.
Otherwise it’s an abstract API mimicking how we do things now >> Mango adding reduce >> • Jan: Mango adding reduce >> • Jan: goal is to add default reducers to Mango >> • Rnewson: isn’t the goal with Mango that it looks like Mongo? >> • Adam: doesn’t have to >> • Garren: we keep seeing Mango/Mongo, is there a different query language we >> want to do? >> • Jan: database query wars? >> • Jan: either mango or SQL >> • Nolan: been pushing Mango for IDB at W3C. tell me if you hate it >> • Rnewson: don’t hate it, but goal is to make Couch more accessible >> • Jan: I’m totally fine >> • Rnewson: just saying are we cleaving to this. Is this why Mango exists, >> because of Mongo? >> • Rnewson: similar but not identical is okay >> • Chewbranca: reason we don’t want to promote Mango? >> • Jan: we’re doing it >> • Adam: it’s got a bunch of traction behind it >> • Chewbranca: if it’s good enough, we should go with it >> • Garren: it can only do a certain number of docs though? >> • Paul: there is a limit. We talked other day about sort. There will be a >> limit for that as well. Biggest downside of Mango is people think it’s >> smarter than it is. Comes from Mongo. E.g. their sort has a 32MB cap. Works >> until it doesn’t. >> • Jan: this is about declarative form of existing reduce >> • Jan: mango is currently only a map function >> • Garren: best way to learn how people use Mango is to see pouchdb-find >> issues. People start using it, then they ask questions. Once you know Mango >> is map/reduce with sugar then you kind of get it. But if you don’t get it >> you struggle. Making it more intuitive saves me time. >> • Rnewson: yeah people assume it’s like Mongo >> • Jan: why isn’t this the primary way to interact with Couch. 
We need to >> take it seriously >> • Rnewson: we should be saying that >> • Jan: we had an out-of-the-blue contribution to this recently >> Break >> • Some talk about Couch-Chakra: seems Chakra would be easy to embed, runs on >> Windows (ARM/x86/x64), Ubuntu (x64), MacOS (x64): >> https://github.com/Microsoft/ChakraCore >> Mango: adding JOINs >> • Jan: fake the pattern of joining documents, once we have that (foreign doc >> idea) could also have a foreign view key >> • Jan: linked document is the first thing >> • Chewbranca: could potentially get view collation in Mango as well >> • Nolan: is there anything in MR we don’t want to add to Mango? >> • Jan: arbitrary JS, aside from that… >> • Nolan: people do want computed indexes though >> • Jan: people want to define a mango index that you can run a query against, >> but what goes into the index is limited by the expression… people want to >> write js/erl etc. with a map function but they query with mango. Given they >> use the right schema to produce the right data >> • Paul: mango applies start/end key automatically. When you get into >> computed I don’t know how that would work. Could scan the entire index or >> alias it >> • Paul: key is an array with the secret sauce that maps to an index. >> Selector has “age greater than 5” it knows that the first element of the >> array is age >> • Jan: whatever the custom map function is >> • Nolan: issue 3280. Garren you didn’t find this compelling? >> • Garren: not sure what they’re trying to achieve >> • Jan: regardless there’s a bunch of stuff we can get to later >> Mango result sorting >> • Jan: result sorting >> • Jan: question is I’m reducing a thing, sort by the reduce value. Standard >> databasey stuff. Problem of being an unbounded operation. Current policy is: >> CouchDB doesn’t have any features that stop working when you scale. Current >> non-scaling features are no longer being worked on. Want to be more >> dev-friendly though. 
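The scaling worry with sort-by-value is that, without an index on the sort field, the server has to buffer the entire result set before it can emit the first row. A mandatory limit changes the picture: a bounded heap keeps memory proportional to the limit rather than the view size. A small illustration in plain Python (not CouchDB internals — row shapes here are invented for the sketch):

```python
import heapq

def top_rows(rows, limit):
    """Stream view rows and keep only the `limit` largest values.
    heapq.nlargest retains at most `limit` items at any moment, so
    memory stays O(limit) even over millions of streamed rows; a
    full sort would need the whole result set in memory."""
    return heapq.nlargest(limit, rows, key=lambda row: row["value"])
```

This maps onto the game-scoring example: a leaderboard query with `limit=10` is tractable over any number of players, whereas "give me all players sorted by score" is the unbounded operation the current policy rules out.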
>> • Garren: problem is it’s cool, people will use it, but you have to limit >> the results. But if you don’t put it in people will get frustrated >> • Paul: algorithmically we don’t provide a way to do it inside of the DB. >> Nothing we can do inside of the cluster user can’t do locally. Sort by age? >> Can’t paginate through that. Have to buffer the entire result set. Only >> pertains when you don’t have the sort field indexed. >> • Rnewson: all these Mango additions are features that work cosmetically. >> Not a Mango thing so much as a view layer thing. Needs to scale. If I have a >> million things in my view, and I want it sorted by another field, don’t want >> server to run out of memory. Or we don’t do it. >> • Chewbranca: why not make another view? >> • Rnewson: what you find in other DBs is new kinds of index structures. >> Elaborate. All the magic. We don’t do any of that. Maybe we’re getting away >> with it and that’s a good thing >> • Jan: classic example is game scoring >> • Rnewson: want to sort by value >> • Rnewson: maybe multi-view, probably a new format. >> • Nolan: maybe changes on views solves this? Then the view is a DB, can do >> views of views >> • Rnewson: problem of tombstones though. Also we said we wouldn’t do this >> • Nolan: that’s how pouchdb map/reduce works today >> • Jan: could cover this in the tombstone discussion >> • Rnewson: having it as replication source, that’s a sharp end of the impl. 
>> There’s a lot less we could do for changes feed of the view >> • Chewbranca: nothing preventing us from doing double item changes >> • Rnewson: there’s bits and pieces for this that we said we’re not going to >> do because it’s complex, so should excise >> Bitwise operations in Mango >> • Jan: bitwise >> • Jan: user request, like a greater than, equals, you also want bitwise >> operations >> • Rnewson: fine >> NoJS mode >> • Jan: noJS mode >> • Jan: consensus around table already probably >> • Jan: would be nice to use 80% of CouchDB without JS, using Mango as >> primary query, won’t include document update functions >> • Jan: update functions will go away because we have sub-document >> operations. Just query and validation functions >> • Joan: one of the things I’ve seen update functions used for that this >> proposal doesn’t resolve is server-added timestamps to the document >> • Jan: not a fan. It’s still optional. You don’t have to go through the >> update function to put the document. App server can do this >> • Rnewson: is it important for noJS mode? Could make that a feature. >> • Joan: it is certainly one option. Been talked about a lot. Auto timestamps. >> • Garren: what about clustering? >> • Rnewson: not important for multi-master replication >> • Jan: have different TSs on different nodes >> • Rnewson: has to be passed from coordinating node down to the fragment >> • Jan: there’s a Couchbase feature coming where the clients know which shard >> to talk to, reducing latency. Same could be done for us >> • Chewbranca: tricky thing here is supporting filter functions >> • Garren: can’t do it with mango? >> • *crosstalk* >> • Jan: can do that already >> • Rnewson: not doing the couchappy stuff >> • Garren: need a baseline. What do we need that can run without JS? >> • Jan: Joan, open an issue for autoTS? 
Also automatic recording of which >> user created it >> • Joan: you got it >> • Rnewson: if they’re always available they’re not usually huge >> • Chewbranca: tickets for removing old features like the proxy stuff? >> • Jan: add it to the list… like killing couchapps >> • Jan: question was how do we get rid of old stuff we don’t want anymore. >> • Rnewson: need a list >> • Jan: Chewbranca, open a ticket >> • Chewbranca: ok >> • Jan: couchapps, all the things you mentioned >> • Rnewson: had customers at Cloudant ask for this. Server-side timestamps. >> Little bit of useful metadata >> Query server protocol v2 >> • Jan: query server protocol v2 >> • Jan: If we switch to embedded Chakra this goes away >> • Jan: while the JS is going, could load the next batch from this. If it’s >> in-beam we can think of parallelizing this. One Chakra per core. >> • Rnewson: competing desires here. Building one view faster is one use case, >> many views is another >> • Jan: this is about external views, JS views. Should be a streaming protocol >> • Rnewson: Cloudant Query, Chakra Core, maybe it’s not an extendable thing >> anymore. Don’t go out of our way to make it non-extendable. >> • Garren: so you’d never be able to write a map/reduce in Python >> • Paul: I’d like to see it internally at least have a Couch OS proc manager, >> remove that and put it in its own app >> • Paul: That way, we could embed V8 or Chakra, but we could clean up >> internally so we aren’t… native language implementation is happy. Send and >> receive protocol for OS processes. I’d like to clean that up. That way you >> can say as you generate your release, hey for JS I’m going to use the V8 >> engine embedded. Everyone with me? >> • Joan: I think the key here is that we’re talking about this as no longer >> being an externally consumable API. Behooves us to write an API we can >> understand and use. Never gonna call out in the documentation here’s an >> external view server.
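Garren's question above about writing map/reduce in Python is essentially about reimplementing the existing line protocol: one JSON command per line on stdin, one JSON reply per line on stdout. A minimal sketch of such an adapter, heavily simplified from the real couchjs stdio protocol; the Python-lambda "source" convention here is purely an illustrative assumption:

```python
import json

# Simplified sketch of the couchjs-style line protocol: ["reset"],
# ["add_fun", source] and ["map_doc", doc] commands, JSON per line.
# Registering map functions by eval'ing a Python expression is an
# illustrative shortcut, not how a production adapter should work.

class ViewServer:
    def __init__(self):
        self.map_funs = []

    def handle(self, cmd):
        op = cmd[0]
        if op == "reset":
            self.map_funs = []
            return True
        if op == "add_fun":
            # A real adapter would compile cmd[1] in its own language;
            # here we assume it is a Python expression yielding a function.
            self.map_funs.append(eval(cmd[1]))
            return True
        if op == "map_doc":
            doc = cmd[1]
            # One result list per registered map function.
            return [list(f(doc)) for f in self.map_funs]
        return ["error", "unknown_command", op]

vs = ViewServer()
for line in [
    '["reset"]',
    '["add_fun", "lambda doc: [[doc[\'name\'], 1]]"]',
    '["map_doc", {"name": "ada"}]',
]:
    print(json.dumps(vs.handle(json.loads(line))))
```

The dispatch loop is the whole adapter; this is why "should be as easy as including a repo" is plausible once the protocol is cleanly documented.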
• Paul: think we should document it, make it easy for people to do things. >> If someone wants to write a Python adapter should be possible to do. Should >> be as easy as including a GitHub repo. >> • Jan: “legacy plugin” >> • Paul: right. Include arbitrary languages by changing doc config >> Full text search >> • Jan: full text search >> • Jan: should ship as part of Couch >> • Rnewson: we can do that, we need to flush it to ASF. Caveat: has no >> community around it. The components that make it work are abandoned. Given >> up faking JVM semantics inside of Erlang. Don’t know if it has a lot of life >> left in it. It works. But if we have any issues with those components it’s >> going to be on us. It’s like mochiweb. Instead of one person we end up with >> zero. >> • Jan: for now I’m fine with running this for a while >> • Rnewson: getting care and feeding but people do hit edges >> • Chewbranca: not going to get rid of it though? >> • Rnewson: not unless we can replace it >> • Chewbranca: it’s the best thing we’ve got today. There are merits to >> including it >> • Rnewson: can’t use JDK 7 >> • Chewbranca: there are problems, but if it’s incorporated by default >> • Rnewson: only JDK 6 >> • Jan: on lunch, overheard discussion of getting rid of JVM >> • Rnewson: I’d love that. Prime reason we have it is FTS. Don’t see an >> alternative >> • Chewbranca: there’s Sphinx >> • Rnewson: it’s a toy >> • Jan: SQLite? >> • Paul: think we should focus on making it easier to use. There are people >> like me who don’t like running a JVM. I’m okay running it at Cloudant >> because it’s the least worst thing >> • Jan: alternative today is running Solr or ElasticSearch. Running the JVM >> anyway. >> • Paul: if you read the instructions in that ticket it’s kind of insane. We >> don’t notice it because we’ve done it in Chef once >> • Paul: if we go through and make these sorts of things easier to plug in.
If we can get to there, we don’t have to care… rebar pulls them down >> • Rnewson: implemented as a pluggable option >> • Chewbranca: caveat is we’re talking about FTS to empower more features in >> Mango. Do we want FTS to build Mango with? >> • Rnewson: not sure we’ve come to that yet >> • Chewbranca: how complicated is it to have conditionals in the Mango code >> • Rnewson: sat down with Kowalski 3 years ago to talk about this >> • Joan: the Lucene project has links to a number of impl’s in other languages >> • Rnewson: long and sorry history >> • Rnewson: none of them are as feature rich. They all die >> • Jan: that list is there to just inform people >> • Rnewson: Lucene has eaten that whole space >> • Rnewson: I’d love to build an Erlang version if we had the time. Just wish >> Lucene wasn’t using Java >> • Chewbranca: is Sphinx really that bad? >> • Joan: yes it chokes up on multiple requests in a queue. It’ll hang >> • Rnewson: look at Riak, they tried a few FTS and Lucene is the only game in >> town >> • Rnewson: I should have written it in Java rather than Scala so that it can >> work in new JDKs >> • Jan: we’re interested in getting this feature in Couch >> • Rnewson: there’s a very small community that has made some contributions >> Rebalance >> • Jan: rebalance is a poorly named thing >> • Jan: couchbase has this really nice feature, you have a 3 node cluster >> running, you put in the ip address, it rebalances, you see a progress bar, >> now you have a 4 node cluster. Can keep going and automate that. Their >> sharding model is simple enough for them to do. Dunno if they do it with >> queries. Needs to be recalculated somewhat. Their storage model is >> in-memory, relatively fast. >> • Rnewson: hard part is coordinating reads and writes for this process >> • Garren: don’t they have a lot more vBuckets (?) >> • *crosstalk* >> • Garren: aren’t they moving shards around?
• Rnewson: let’s stop saying “rebalance” >> • Jan: what they’re doing is moving the shards around >> • Garren: I chatted with Mike Wallace about this. Our q factor is too low. >> We need more shards so that we can distribute it more. Cassandra starts at >> 256. >> • Rnewson: if we could do that without downsides, then rebalancing is just >> moving shards. We have something in place to do this in a live cluster. >> Unpleasant and slow, but it works >> • Joan: did some benchmarking at Cloudant about cluster q. If q was too big, >> perf suffered. It’s because of some assumptions we made on number of >> processes running and disk I/O. >> • Paul: single doc operations scale well. Anything that aggregates across >> multiple shards scales poorly >> • (break) >> • Chewbranca: tricky bit is shuffling the data around. If you’re caching the >> key rather than the id then you need to send the result differently >> • Adam: it’s not about redoing the architecture of the database. If you add >> a database today and change the partitioning in most cases you are redoing >> the index >> • Chewbranca: where we increased the cluster capacity and redid the database >> • Rnewson: we don’t have to do all of that… can go from 3 nodes to 12 >> • Jan: exercise tomorrow >> • Rnewson: I’d like to see the ability to move the shard around >> • Garren: two step process >> • Rnewson: can reach a couchbase-like solution >> • Rnewson: ability to move shard around while accepting reads and writes >> would be huge >> Setup >> • Jan: still terrible >> • Jan: mostly Fauxton work >> • Chewbranca: seems like a chicken and egg problem >> • Garren: can only do a setup, then once it’s set up you can’t say “here’s >> another node” >> • Garren: if you have three nodes and you only set up 2, you need to start >> again >> • Adam: I’m biased by Kubernetes. There are elegant solutions in specific >> environments.
Trying to solve all problems in a generic Couch construct is >> not a bad thing to take on, but concerned about our ability to... >> • Jan: if we do a homegrown thing, should be higher level >> • Jan: this is where we come back to the single node story >> • Adam: do you not see increased adoption of Docker for development? >> • Garren: I love Docker >> • Jan: not a fan >> • Jan: whole idea of containerizing and standard containers that’s really >> nice but in the apt-get crowd… >> • Jan: Greenkeeper is Docker. It’s easy and it works >> • Garren: what do we want to do? Get the amateur up and running with a 3 >> node cluster? >> • Jan: we have that. Little bit past amateur. No chef, no puppet, no AWS >> • Rnewson: huge number of people who would run a Couch cluster without using >> tools? >> • Garren: people running on Windows, yes >> • Rnewson: I can see not going with any specific thing, Kubernetes, Chef, >> Puppet, we’d be taking that whole thing on >> • Jan: doesn’t have to be the whole thing >> • Rnewson: how far do we go? >> • Jan: very clearly say: if you want to run professional Ansible, or people >> already know they need to use that >> • Garren: this discussion links up to config. Can we link those two together? >> • Background is Cloudant obviously already runs this >> Cluster-aware clients >> • Jan: in Couchbase they provide client libraries that know which node to >> talk to so they don’t have to proxy through one node. Lower latency access. >> They have impressive numbers. Guaranteed sub-10ms latency, they can do that. >> But you can only do that with cluster-aware clients. Saves a hop. >> • Chewbranca: to enable, need shard-level access. Could skip quorum. >> • Adam: I don’t think we’d support this in Cloudant anytime soon. It makes >> it harder to run our tiered service. There’s SSL termination whatnot. We >> also have this compose property. As cloud service providers this one could >> possibly solve problems. 
I don’t think the numbers back up the savings. >> Where you could think about it is there are endpoints where the overhead of >> aggregating the results and delivering one result is problematic >> • Joan: talked about this with replication >> • Adam: anything that consumes changes feed, merging that changes feed and >> generating that is more expensive. If goal is to get access to all contents, >> avoiding that aggregation >> • *crosstalk* >> • Jan: moving on is fine >> • Rnewson: I’d like to solve that particular problem >> • Rnewson: I think we can do replication that doesn’t put that burden on the >> client >> Fauxton >> • Jan: I don’t like how it looks. Trying to query a view, thing pops up, I >> want to change something, it pops up. I want to live change something on the >> site, my main point is information density. Garren can share some aspects of >> the design >> • Garren: Justin has done a nice job. Made a mistake in the beginning with >> two tabs. Biggest mistake we’ve made. It’s limited how much window we have >> to play with data. That’s our big push, move from the sidebar to tabs, so we >> get more space >> • Garren: Justin is working on this internally, want to open it up for >> community discussion >> • Chewbranca: I like the information density view when you’re dealing with >> rows of data, can pretty-print >> • Garren: we should default to a table look >> • Jan: moving on >> • Garren: best thing is review the design and take the feedback >> Releases >> • Jan: last time, talked about this with Noah >> • Jan: nature of changes should define the version number, define what major >> versions are >> • Jan: need to be careful, can’t have 10 major versions per year >> • Jan: one per year is fine >> • Jan: one major feature per thing. Shouldn’t be that bad >> • Rnewson: missing people to actually do that work. Lots of useful important >> features >> • Chewbranca: do we want to have an always-releasable branch?
Tricky to do >> when everything flows into master >> • Rnewson: we had talked about test suites, those go hand in hand. Can’t do >> a release without a comprehensive test suite >> • Chewbranca: how much of our tests are Cloudant private? >> • Rnewson: coverage for search, geo, other things. >> • Chewbranca: introduces a regression on query, dropping down from 8 to 4k >> • Rnewson: we removed that. Took the code out >> • Rnewson: idea is that the PouchDB test suite is the most comprehensive >> thing out there >> • Rnewson: Cloudant has the same problem, has comprehensive review, what’s >> involved before it hits production. There’s a lot of manual intervention in >> between because there’s a lot of stuff we don’t trust >> • Jan: need to take the magic out. >> • Joan: at one point we brought somebody in, they didn’t get support. They >> rage quit. It’s not changed a lot over the years. Feels like a good way to >> give back. >> • Chewbranca: how much of the test suite runs against the clustered API? >> Testy’s not in CouchDB >> • Rnewson: we’ve talked about open sourcing it >> • Chewbranca: all the unit tests run against cloudant 8.6 >> • Rnewson: to get releases flowing again, need to review test suites, figure >> out coverage. Not gonna do regular releases if people have to do manual >> tests. Figure out what broke. Those things take months. Needs to not be that. >> • Rnewson: CouchDB/Cloudant interaction is we have to have tests added so >> Cloudant has confidence >> • Garren: single repo would help here. Then we could do stuff with Travis >> • Rnewson: need a repo per domain of change. FTS will be separate. >> Everything else inside of… >> • Jan: if we have some time this weekend after discussing things we should >> explore this. Just need to do it. >> • Chewbranca: need to figure out what test suite we’re going to use >> • Jan: bit of a TBD. want PouchDB people to contribute. Python tests. 
Maybe >> move everything Erlang to unit tests >> • Chewbranca: still a case for testing clustered scenarios. Always been >> really awkward. Brian Mitchell looked at this years ago >> • Rnewson: we never surfaced it. At Dev Day was gonna do a talk on fault >> tolerance. There’s no way to demonstrate weak quorum didn’t happen. Can take >> 2 of 3 nodes down we’ll be fine. Not testable because it’s not visible. >> Internal replication stuff… Jepsen level of understanding, do they work, we >> think so. All of those things we could enable test suites like Jepsen to >> focus on those things. >> • Chewbranca: one last thing, is the game plan not to do anything with >> Quimby (?) or Testy and it’s going to be ported to the JS functions >> • Rnewson: little early for that. We run Testy and the eunit test suite, JS >> tests. We don’t run the PouchDB tests. That’s the current barrier to >> production. If we’re going to change that and I think no one would object if >> it’s easy to run. Would get more contributors. We also talked about >> pluggability. FTS is not yet open source. Want to be able to merge tests. >> • Nolan: let’s take a look this weekend, see what you need to remove >> • Rnewson: same for Testy >> • Nolan: dealbreaker may be that we’d keep the tests in the monorepo but >> would publish as a separate package >> • Rnewson: glad you mentioned that, may be a problem >> • Garren: not going to just run the tests against Couch to prove they work? >> • Jan: would be weird not to have the test suite part of the Apache project >> • Rnewson: if we can’t get more contributors then there’s not much value >> Performance team >> • Jan: I like how SQLite did this, lots of small improvements that added up. >> Some low-hanging fruit in CouchDB still >> • Chewbranca: there’s lots of fruit there >> • Rnewson: should be significant ones >> • Chewbranca: e.g. increasing q for aggregation. Saturated CPU on list >> merge. 
Things like that would have a big impact >> • Garren: how easy would it be to change to something less CPU intensive? >> • Rnewson: there is a Cloudant perf team >> • Jan: Ubuntu ran a microbenchmark, very hard to do these kinds of things, >> need to have CouchDB be faster. What does a perf test mean in terms of the cost of a >> cluster? >> • Adam: if we could redact customer data >> • Chewbranca: relatively easy to start publishing results. Could make a >> workflow to get benchmarks run. Tricky bit is majority of code is about >> bootstrapping clusters, Chef stuff, tightly entangled. Not feasible to >> open-source. Kubernetes-based would be easier. >> • Jan: using your infrastructure without having to open source it? >> • Chewbranca: that’s intriguing >> • *crosstalk* >> • Adam: lots of precedent for this >> • Jan: I can’t speak for how much time you invest but >> • Chewbranca: let’s say we want to know view query perf as function of q. >> We’ll queue up entire suite of benchmarks to change q through those, run the >> whole thing, save all results, do it all again, for each iter, replicate >> benchmarks multiple times, it’s great, but that takes 12-24 hours. So we >> wouldn’t necessarily want to put out a service that lets people do 50 hours >> of benchmarks >> • Jan: happy with that. Can be done manually. If it’s only a thing you >> control >> • Chewbranca: need to build a wrapper around that. Not too hard >> • Adam: if you think people will explore, this would be a solid investment. >> Don’t want to do a lot of work and then no one uses it >> • Jan: flamegraphs can be done later on, find a bunch of CS students, pay >> for a summer internship, but this would be microbenchmarking stuff. >> • Chewbranca: what’s the policy? >> • Rnewson: conclusion? >> • Jan: I want something like what we did for Hoodie. We put out a job ad. >> Here’s something we want. I want to get people interested. Here’s what >> you’re responsible for.
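Chewbranca's description above (sweep a parameter like q across a grid, replicate each run, save all results) can be sketched as a tiny runner. Everything here is assumed scaffolding; the real work, provisioning a cluster per grid point, is exactly the Chef-entangled part the notes say is hard to open-source:

```python
import itertools
import statistics
import time

# Sketch of a parameter-space benchmark runner: run one workload across
# a grid of parameters (e.g. q values), repeating each point several
# times and keeping summary stats. The workload below is a dummy
# stand-in for "view query at a given q".

def sweep(workload, grid, repeats=3):
    results = {}
    for point in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), point))
        timings = []
        for _ in range(repeats):
            t0 = time.perf_counter()
            workload(**params)
            timings.append(time.perf_counter() - t0)
        results[tuple(sorted(params.items()))] = {
            "median_s": statistics.median(timings),
            "runs": repeats,
        }
    return results

# Dummy CPU-bound workload; a real runner would drive a live cluster.
res = sweep(lambda q, n: sum(range(q * n)), {"q": [1, 4, 16], "n": [1000]})
print(len(res))  # 3 grid points
```

The 12-24 hour figure in the notes comes straight from this structure: grid points times repeats times per-run cost.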
• Gregor: worked well to create issues around low-hanging fruit and give >> some information, then say up for grabs >> • Adam: sometimes need a different implementation, you have to understand >> the whole thing >> • Nolan: PouchDB does have a perf test suite. Highlights cases where we are >> slow vis-à-vis CouchDB. Also comparing across PouchDB storage >> implementations (in-memory, LevelDB, IndexedDB). >> • Garren: will be useful for Pluggable Storage. >> • Russell: eager to engage more with the community here. Wasn’t quite sure >> how to navigate what to open source. >> Benchmark Suite >> • Russell: our underlying kit is simple, based on basho_bench. We could open >> source this, but real utility is in our automated runner across the >> parameter space. >> • Rnewson: useful to open-source the basic framework >> • Garren: let’s provide the “here’s how to do proper benchmarking” document >> • Russell: you want a book? Lots of detail involved here >> • Rnewson: Benchmark suite vs. synthetic load suite? >> • Jan: Synthetic load suite is more about capacity planning for a specific >> customer workload, as opposed to broad-based perf measurement over time >> • Russell: we have some of this synthetic load suite in our portfolio as >> well. Nightlies are tough because our individual test runs take tens of hours >> Synthetic Load Suite >> • As discussed above. Will be very nice to have once the benchmark suite is >> running >> Consolidate Repositories >> • Everyone agrees to do this >> CI for Everything >> • Jan: I’ve articulated a vision here, ideally want things all running on >> ASF in the future >> Schema Extraction >> • Jan: See V8’s property access optimization >> • Jan: Pointer off the DB header that has all the schemas for all the >> documents in the database >> • Jan: would be nice - “change all the values of a field for all documents >> obeying this schema” >> • Rnewson: this is what Lucene calls elision >> • Russell: what are you trying to accomplish?
>> • Jan: first goal is showing all the schemas in the database >> • Adam: do you keep stats on how common these are? >> • Jan: Not yet but easy to add >> • Paul: downside is really wonky schemas, e.g. email address as key instead >> of value >> • Adam: we’ve spent quite a bit of time on this internally at Cloudant >> without getting to prod yet. Definitely an interesting topic for data >> quality issues >> • Jan: could go and explore schema migration on compaction >> • Rnewson: “X-on-compaction” gets gnarly quickly >> Native container support >> • Adam: Could include things like service discovery via DNS SRV for finding >> cluster peers >> Database corruption detection & repair >> • Jan: we had a customer in Nov 2015 that screwed up their backups, and >> messed up the restore as well with merged database files. We had to write a >> tool to manually search for doc bodies. Wasn’t quite good enough for the >> client but it was a start. This is proprietary and can’t share. >> • Rnewson: from a detection perspective, we have checksums on lots of things >> but not everything. >> • Garren: do we see this a lot? >> • Gregor: even if this is 1:1000 we should really care >> • Rnewson: we should be fully conscious of disaster scenarios when we design >> our database format >> • Jan: we could have a merkle tree accompanying our btree to compare >> replication peer btree. >> • Adam: does the merkle tree thing work with our btree balancing logic? >> • All: Unclear >> • Rnewson: if you run out of disk and then clear disk space without >> restarting the server, you get into a bad place >> • Rnewson: need to define what guarantees we’re providing here >> • Rnewson: I’d be all-in on doing more here >> • Jan: need to MD5 index data in addition to documents >> • Rnewson: Recoverability in disaster involves lots of tradeoffs. 
We could >> do much more but the ROI is unclear >> • Davisp: With pluggable storage engines we can also let users make >> tradeoffs between performance and data security. >> Wide-clusters >> • This is an attempt to get past the throughput limitations of individual >> clusters by using a routing proxy on top >> Create an exclusive namespace for databases (/_db/<dbname>) >> • Lots of discussion about what could be done to rationalize the API >> • We are all fading and need food >> >> Day 2 >> >> Improved Erlang Release Building >> • Easier configuration of variable builds for CouchDB >> • To maybe configure dreyfus >> Richer Querying Model >> • We have a separate issue for considering adding joins to Mango >> • Chainability works disastrously in Cloudant today >> • Sort-by-value is a useful construct >> • Can _changes on views help us out here? >> Systemd Handler >> • We won’t build this ourselves >> • But we will put a ticket out there and invite someone who’s passionate >> about it to take it on >> Exclusive namespace for DBs >> • Punt on it now and wait for a bigger API redesign discussion >> Single Node CouchDB >> • Not sure what our story is for single node >> • Single node was the historical way to run couchdb since forever >> • Set n=r=w=q=1 and done?
• Set q to number_of_cores automatically >> • We accept it’s an important use case >> • 2 node cluster is the confusing case (“hot spare” not supported) >> • 1 node cluster is still chttpd/mem3/fabric, not couch_httpd_* >> • Asynchronicity of db creation/deletion still affects single node cluster >> (see consistent-dbs ticket) >> >> Externalize Erlang Term things (_revs, replication _ids) >> • Interop is impaired when we use term_to_binary >> • Convert to a ‘canonicalised’ JSON form and hash that instead >> • Unicode normalization and floating point representation get in the way of >> general-purpose canonicalization >> • Overall goal: clearly define the procedure for generating these revisions, >> and define it in a way that makes it possible for PouchDB to reproduce >> >> PouchDB Governance >> • Open Collective / Patreon / Linux Foundation? >> • Concerned about ASF (re: jira etc) >> • Generally onboard with idea _of_ governance >> • 3-5 people on call to judge reimbursements >> • Gregor can help with the Open Collective setup >> • New “JS foundation”, more aware of JS ecosystem (are members of W3C) >> >> _all_local_docs >> • Richer API for local docs in general >> • Ability to make views on them, maybe >> • Namespacing issues (replication checkpoints doc id calculation) >> >> Telemetry db rollups >> • Good candidate for a plugin >> • IoT use case >> • Adam: for IoT there are many things we can do and this wouldn’t have to be >> first >> >> Server-side mass update functions >> >> • Adam: better as a mango selector than as a JS function >> • Jan: what about conflicts? >> • Rnewson: value in moving it server-side. >> • Adam: Keep it async, run once on every doc existing in the DB at the time >> the task is submitted >> • Rnewson: must be idempotent >> • Adam: we can guarantee that if it’s a Mango selector >> • Garren: but is that still useful?
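The "Externalize Erlang Term things" item above proposes hashing a canonicalised JSON form instead of `term_to_binary`. A rough illustration under assumed canonicalisation rules (sorted keys, fixed separators); as the notes themselves flag, unicode normalization and floating-point representation are the hard parts, and they are ignored here:

```python
import hashlib
import json

# Rough illustration of "canonicalise JSON, then hash" for revision
# generation. Sorted keys and fixed separators are assumed rules, not a
# spec; unicode normalization and float formatting are deliberately
# out of scope, which is exactly what makes this hard in general.

def canonical_json(doc):
    return json.dumps(doc, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)

def rev_hash(doc):
    # Hash only the body; underscore-prefixed metadata fields excluded.
    body = {k: v for k, v in doc.items() if not k.startswith("_")}
    return hashlib.md5(canonical_json(body).encode("utf-8")).hexdigest()

a = {"_id": "x", "b": 1, "a": 2}
b = {"a": 2, "_id": "x", "b": 1}
print(rev_hash(a) == rev_hash(b))  # True: key order must not matter
```

A procedure like this, once pinned down exactly, is what would let PouchDB reproduce revision ids independently, which is the stated overall goal.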
• Paul: sounds like a good plugin >> • Adam: I’m concerned about plugins operating at this level of the database >> • Summary: we do want to do this but it’s a difficult problem, we also want >> to avoid arbitrary JavaScript functions. So a Mango-style updater that enforces >> idempotency. >> >> Auto-record document timestamp >> • Jan: potentially creating lots of conflicts >> • Rnewson: if we do this we can choose to exclude it from the revision >> calculation. >> • Russell: This could mean that multiple documents with identical revisions >> could have different timestamps >> • Adam: that’s actually a more accurate representation of reality than what >> we do today >> • Russell: This is a trivial thing to add in the application layer >> • Gregor: I don’t know any other db that does this. It’s app-specific >> • Rnewson: there are DBs that can do it >> • Nolan: Postgres can do it >> • Rnewson: sometimes you want to use it for time ordering >> • Rnewson: possibly a rathole >> • Adam: big difference between automatic timestamp vs $update trigger >> • Conclusion: skip for now, too gnarly to consider replication scenarios >> Rebar 3 >> • Rnewson: rebar 3 has problems with deps of deps. Jiffy has dep on core >> compiler, users have their own jiffy dep, recursive deps become a problem >> • Chewbranca: no it’s that rebar 3 didn’t include the port compiler >> • Adam: reasons to move to rebar 3? >> • Paul: if we want to register our config on hex.pm. If we start using >> elixir… No pressing reason but better in the future >> • Rnewson: fewer repos will help with build problems >> Plugins >> • Jan: vision is like browser extensions, get list of things to enhance my >> Couch, download it, install it. Never made it to 2.0. Could be compile time >> rather than Fauxton drag-and-drop. How to do this nicely across an existing >> cluster? >> • Adam: What do you want these plugins to be able to do?
• Jan: anything that a core Erlang module could do >> • Jan: let’s try shipping geocouch/dreyfus as an installable plugin and see >> what happens >> • Chewbranca: how do we recommend users upgrade a Couch cluster? >> • Jan: you can upgrade one node at a time right? >> • Rnewson: yes, rolling reboot >> db-per-user >> • Jan: only exists because we don’t have per-doc user permissions on a >> single database. Don’t have that because of reduce views; all that guarantee >> would be out the window anyway >> • Adam: unless ddocs inherit credentials >> >> Elixir >> • Larger community than erlang >> • Various ways of thinking about this >> • Cognitive cost to having mixed erlang/elixir codebase >> • Do we allow elixir piecemeal? Only new modules? Full rewrite? >> • Elixir will use erlang code/modules but won’t read it/develop it? >> • Elixir community doesn’t use OTP patterns >> • Elixir has extended the standard erlang library >> • Where’s the boundary of the mono repo? Does it align with erlang vs elixir >> • Need an example to prove it out >> • Elixir anywhere means all contributors need to be fluent at reading it at >> least >> • Elixir has built-in async task support, which we’ve implemented hackishly >> in erlang >> • Other communities might be interested that also use elixir >> • A lot to be said for elixir syntax >> • Virding’s complaints about elixir, doesn’t like agents, doesn’t like >> macros. Let’s go in with our eyes open. Adopt/enforce a style guide. No >> emacs/vi mode for our version of Erlang yet; we should have one for elixir >> before we start accepting PRs. >> • Rnewson: What about runtime debug / stack traces, operational concerns >> • FYI: https://www.google.com/trends/explore?q=%2Fm%2F0pl075p,%2Fm%2F02mm3 >> • https://elixirforum.com/t/benefits-of-elixir-over-erlang/253/10 >> • Russell: Pipe macro is useful >> • hot code upgrades in elixir?
• Russell: how do you write an Elixir application inside an Erlang >> application >> • https://github.com/elixir-ecto/ecto >> >> Windows: >> • Jan: Lots of Windows client downloads from our site (as many as the >> tarball!) >> • Jan: Several known customers using couchdb on Windows in production - in >> clusters! >> • Jan: Couchdb 2.0 on Windows is beta quality, one maintainer >> • Jan: Drives adoption >> • Adam: no interest in driving/improving from Cloudant >> • Jan: don’t think there’s interest in couchdb via Docker on Windows >> • Garren: 1.6 or 2.0? >> • Jan: customers using both in production >> • Joan: 2.0 clusters on Windows. Difficult to get Travis CI for this >> • Joan: reducing manual steps to test releases will help adoption >> • Joan: buffered stdio on Windows is a longstanding problem >> • Joan: Windows installer signing steps are hard to automate (write a SOAP >> client diediedie) >> • Joan: what about bug reports from Windows users? If couchjs is gone, that >> could help a lot. >> • Garren: Chakra would help there >> • Joan: integrate Chakra with NIF not stdio >> • Nolan: can help with Windows and Chakra >> • Joan: Windows 7 and up >> • Joan: many bugs due to lack of 32-bit support in our Windows binary >> installer. We’re not revisiting this - not building a 32-bit version >> >> Native Mobile: >> • Jan: part of couchdb ecosystem, none of it is in couchdb or asf at all >> • Adam: we (cloudant) are directing more resources into pouchdb than mobile >> • Adam: mobile libraries are in maintenance mode >> • Nolan: more sense in investing in React Native? >> • Nolan: confusion of projects around this >> >> Couchdb as an erlang dep: >> • Russell: If I’m building an erlang application, can I pull couchdb as a >> dependency for storage?
• Jan: not top priority, if we can enable it and have tests so that we don’t >> break it, then we can do it, but embedding couchdb isn’t a goal >> • Joan: Whole lot of work for not much gain, but would help bring the Erlang >> community closer >> • Bob: yup >> >> Remove required dependency on JSON, store binary data >> • Bob: nope >> • Russell: separate metadata (_id, _rev, etc) from data, reduces processing, >> can pass data back and forth more efficiently >> • Bob: skeptical that you could get all the way to sendfile efficiency this >> way >> • Paul: don’t necessarily do fd to socket anyway >> • Bob: removing make_blocks might be an adequate improvement here instead >> • Russell: marshalling/demarshalling is a big overhead >> • Jan: we’d still present JSON, but we could save de/serialisation overhead >> >> Switch to using maps or versioned records for internode communication >> • Bob: Maps or versioned records are not a complete solution >> • Russell: having no solution here is a real pain, many hacks made to work >> around passing records between nodes >> • Bob: you can see this in couch 2.0 in various ways, record field reuse, etc >> • Garren: Avro? Protobuf? >> • Adam: lessons to be learned from those, but not direct use? >> • Adam: we used distributed erlang RPC for all things, we could separate >> control and data channels to improve this. This is not rocket science, >> well-established patterns/practices around >> >> Multi-tenancy >> • Jan: desirable for couchdb >> • Adam: if MT became a first class feature of couchdb, is that tenants of >> tenants? >> • Bob: I’d like cloudant to donate the full MT code to couchdb >> • Jan: nobody can build the next wordpress on couchdb without MT >> • Adam: docker reduces the overhead of standing up full per-tenant couchdb >> instances >> • Russell: What is the vision of MT couchdb?
>> • Jan: looks like full(ish) couch but you see a subset of the databases
>> • Adam: interested in spending more time on this
>>
>>
>> IPV6 Support:
>> • Bob: should be an easy add, doesn’t work today
>>
>> Deprecation Wishlist
>> • _show: Y
>> • _list: Y
>> • _update? Y
>> • OS daemons: Y
>> • proxy stuff: keep, for _users authn on external service
>> • view changes: remove current half-baked code, redo with TBD discussions
>> • couch_external_*: Y
>> • Rewrites: kill
>> • vhosts maybe: y
>> • I'd really like to remove the _replicator, _users, and other special DBs from being actual databases that support the whole database API and instead have a specific API that we define (even if we use databases under the hood): keep _users, hide _replicator with the scheduler rewrite.
>> • Attachments :D (russell): N
>> • Public fields in _users docs: Y
>> • Coffeescript: Y
>> • Custom reduce functions: administratively disabled by default?: yesish
>> • CORS: keep // caniuse.com/cors
>> • CSP: keep
>> • JSONP: drop // http://caniuse.com/#feat=cors
>> • OAuth: drop, but make easy to integrate
>> • Temporary views: drop
>> • TLS: keep
>> • Update_notification: drop
>>
>>
>> One major feature/change per major release. Avoid python 2 / 3 paralysis. 12 months between major releases.
>> Maybe 6 months
>>
>> Roadmap:
>>
>> 3.0
>> • Time frame
>> • Features
>> • MAIN: Fixing db-per-user/role / security system
>> • Tombstone Eviction
>> • Includes cluster purge
>> • Sub doc operations
>> • Maybe HTTP/2 experimental
>> • Mango VDU
>> • Mango Reduce
>> • Deprecations
>>
>> 4.0
>> • Time frame
>> • Features
>> • http/2
>> • Deprecations
>>
>> 5.0
>> • Time frame
>> • Features
>> • Deprecations
>>
>> 6.0
>> • Time frame
>> • Features
>> • Deprecations
>>
>> Channel Base
>>
>> Some past discussion:
>> • https://github.com/couchbase/sync_gateway/issues/927
>> • https://github.com/couchbase/sync_gateway/issues/264
>> • https://github.com/couchbase/couchbase-lite-ios/issues/671
>> • https://github.com/couchbase/couchbase-lite-ios/pull/776
>>
>>
>> “Built in” by-user-seq view, keyed by [userid/role, update seq]
>>
>> Docs get a new property _access: [[a,b],c] (a and b, or c)
>>
>> /db/_changes
>> If db-admin: read by-seq
>> Else: read by-user-seq with a multi-key query (startkey/endkey for each role combination; needs an upper bound, say 10 roles, since with n roles the complexity is O(2^n - 1)), gets merged on the coordinating node, update-seq is the sorting factor. Possible optimisation: remove duplicates (bloom filter? trie?)
>>
>> /db/_all_docs
>> Same “all-user-docs” but value is doc id, not update seq // could be optional, not needed for replication endpoint
>>
>> If a role/user is removed from a doc’s _access property, the user’s entry in by-user-seq gets the value _removed, and well-behaving clients run purge locally
>>
>> If a role is removed from a user doc, we don’t send _removed’s. A client desiring to keep the docs can keep them; clients that don’t can purge based on reading roles from the user doc on the server.
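The non-admin /db/_changes path described above — per-role-combination key ranges merged on the coordinating node by update seq — could be sketched as follows. All names here are illustrative assumptions, not actual CouchDB code:

```javascript
// All non-empty subsets of the user's roles: 2^n - 1 key prefixes into
// the hypothetical by-user-seq index, which is why an upper bound on
// the number of roles (say 10) is needed.
function roleCombinations(roles) {
  const combos = [];
  for (let mask = 1; mask < (1 << roles.length); mask++) {
    combos.push(roles.filter((_, i) => mask & (1 << i)));
  }
  return combos;
}

// segments: one array of {seq, id} rows per role combination, as read
// from the index via multi-key startkey/endkey queries. Rows are merged
// sorted by update seq; a doc appearing under several role combinations
// is collapsed to its latest row. A real implementation might use a
// bloom filter or trie for de-duplication instead of a Map.
function mergeUserChanges(segments) {
  const latest = new Map();
  for (const row of segments.flat().sort((a, b) => a.seq - b.seq)) {
    latest.set(row.id, row); // later update seq wins for a given doc
  }
  return [...latest.values()].sort((a, b) => a.seq - b.seq);
}
```

The exponential blow-up in combinations is what motivates the "upper bound, say 10" remark: 10 roles already yields 1023 key ranges to query and merge.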
>>
>> since=now can be implemented with endkey=&lt;role&gt;&descending=true&limit=1 on each by-user-seq segment
>>
>> Views: a user creating a ddoc can make the ddoc inherit any or all roles they possess (but not more), and that view is then built from the by-user-seq tree.
>> • TBD: what to do with views that have no more members (all users with the defined roles (and combinations) are deleted, or get their roles revoked?)
>> • Keep the index in case there is re-granting of access?
>> • Delete the index and suffer a long view build when users get roles added again?
>>
>> Questions:
>> • Should we even send a remove:true? (yes for doc removes)
>> • Possibly track past roles in the _users db to inform the client what roles have changed
>>
>> Day 3
>>
>> Collision problem
>> • Users could potentially get a 409 conflict for documents created by other users that already exist, which would be a security concern because users could guess existing IDs (e.g. if an ID is an email address)
>> • Proposed solution is to instead send a 404 and allow users to create docs with the same IDs
>> • Problem with that is that then you have to magically prefix all IDs with the user ID
>> • Problem there is how you collate docs together, e.g. Gregor creates gregor:foo and Nolan creates nolan:foo, then they share. Do they both see docs with ID “foo”? Is there a rev-1 conflict?
>> • Alternative is to remove magical prefixing and enforce an explicit prefix with a built-in VDU, e.g. one that says IDs must be prefixed with “&lt;username&gt;:” // c.f. S3 buckets
>> • Problem is if the fine-grained access control is a db-level config that can be enabled after DB creation, then the question is how the built-in VDU gets retroactively applied to IDs that were previously OK // maybe enabling this is a db-creation time feature, not a runtime feature
>> • Another alternative is to not try to solve the ID-guessing problem at all and to instead just tell app developers that they need to enforce global uniqueness on their IDs at the app level, e.g. with an app VDU
>>
>> More open questions:
>> • Avoid https://github.com/couchbase/sync_gateway/issues/264#issuecomment-122301576
>> • Make sure https://github.com/couchbase/sync_gateway/issues/927#issuecomment-115909948 is done right
>>
>> Implementation of _all_docs in the fine-grained scenario
>> • Need to do a parallel merge-sort of the different key spaces to avoid introducing duplicates
>>
>> As we discussed earlier, users can create their own design documents and delegate a subset of their roles to the document, thus allowing it to read all the data that they can read. We discussed the possibility of doing a similar kind of optimization to what we’re proposing for _changes and _all_docs for custom design documents; i.e., allowing an administrator to publish a design document in the database that can read every document and selectively exposing subsets of the resulting index to users based on their identity. There are a few challenges with this approach:
>>
>> • In the fully-generic role-based access control (RBAC) model, a document could show up multiple times in the portion of the by-user-seq tree to which the user has access. We can address that with _all_docs by smartly merging the rows, but we don’t know how to do that in the general case of custom user code
>> • Aggregations that cross different role keys would have to be blocked
>>
>> Question: Could we implement the optimization just for Mango?
>> • Joins, aggregations, etc. will be even more complicated
>> • Indexing of arrays might complicate de-duplication today
>>
>> Definitely worth investigating post-3.0
>>
>> Russell presented an alternative proposal for an API that introduces “virtual databases” as a way to cover the specific case of:
>>
>> • Each user possesses exactly one role -- their ID
>> • Each document has exactly one _access field: the ID of the creator
>>
>> Here’s the API:
>>
>> PUT /zab?virtual=true, user=adm
>> PUT /_virtual/zab/foo1, doc, user=foo
>> PUT /_virtual/zab/bar1, doc, user=bar
>> GET /_virtual/zab/_all_docs, user=foo
>> rows: [
>> {_id: foo1, …}
>> ]
>> GET /_virtual/zab/_all_docs, user=bar
>> rows: [
>> {_id: bar1, …}
>> ]
>> GET /_virtual/zab/_changes, user=foo
>>
>> This idea would still require a version of the by-user-seq tree to support efficient filtering of the _changes feed
>>
>>
>> ===========================================
>>
>>
>> Tombstone Eviction
>> • Case 1: full space reclamation for deleted documents
>> • Case 2: narrowing of wide revision trees after conflict resolution
>> • Analyze replication checkpoints
>> • Compute the seq before which peers have ack’ed all updates
>> • API for introducing a custom “databasement”
>> • Probably don’t need the rev graph if we have this
>> • On compaction (or sooner), we can drop the edit branch/document entirely
>> • We should use clustered purge for cleaning up Case 2 edit branches
>> • Case 1 edit branches can probably skip the clustered purge path since
>> • All branches are in the databasement
>> • Every replica has already agreed on the replicas
>>
>>
>> CouchDB Release Cadence
>> • Cloudant to ask its release engineering team to help out with couchdb release building/testing/QA
>> • Joan happy to work with Cloudant folks on CouchDB release work
>>
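The built-in VDU floated in the collision discussion above — rejecting any doc whose ID lacks a “&lt;username&gt;:” prefix — could look something like this. The function shape and the throw-an-object-with-forbidden convention are CouchDB's actual validate_doc_update contract; the prefix policy itself is the hypothetical part:

```javascript
// Sketch of a built-in validate_doc_update enforcing per-user ID
// prefixes (c.f. S3 buckets). CouchDB invokes VDU functions with
// (newDoc, oldDoc, userCtx); throwing an object with a "forbidden"
// field rejects the write with a 403.
function validate_doc_update(newDoc, oldDoc, userCtx) {
  if (userCtx.roles.indexOf("_admin") !== -1) {
    return; // admins bypass the prefix rule
  }
  var prefix = userCtx.name + ":";
  if (newDoc._id.indexOf(prefix) !== 0) {
    throw { forbidden: "doc ids must start with \"" + prefix + "\"" };
  }
}
```

Note the retroactivity problem mentioned above still applies: if this check is enabled after database creation, pre-existing IDs without the prefix would fail subsequent updates.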
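The Tombstone Eviction bookkeeping above reduces to a minimum over consumer sequences: the “databasement” is the highest update seq at or below which every secondary index, replication checkpoint, and third-party marker has acknowledged all updates. A minimal sketch, with all names assumed:

```javascript
// Sketch of the "databasement" computation. Tombstones at or below the
// returned seq can be forgotten entirely (post compaction), because no
// index still needs to process the deletion and no replication will
// ever ask for it again.
function databasement(indexSeqs, checkpointSeqs, markerSeqs) {
  const seqs = [].concat(indexSeqs, checkpointSeqs, markerSeqs);
  // With no consumers registered, nothing inhibits deletion; otherwise
  // the slowest consumer sets the floor.
  return seqs.length === 0 ? Infinity : Math.min.apply(null, seqs);
}

function tombstoneEvictable(tombstoneSeq, basement) {
  return tombstoneSeq <= basement;
}
```

The third argument is where the proposed public API slots in: a third party that wants to inter-replicate registers a marker seq, which holds the databasement down until it checkpoints past the deletions it cares about.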
-- Professional Support for Apache CouchDB: https://neighbourhood.ie/couchdb-support/
