That’s an even better summary, ignore mine :) > On 15 Feb 2017, at 20:25, Paul Davis <[email protected]> wrote: > > I think the general consensus is that we should investigate using > Elixir but there are some concerns on various aspects. For instance, > newer Elixir requires a minimum of Erlang 18.0. There are also concerns > about whether/how we mix Elixir into the existing applications. Whether > or not we start rewriting components in Elixir or only allow new > applications to use it. So, I'd call the general idea "interested but > unclear on the best path forward" which I basically take to mean we'll > see what the community comes up with. > > On Wed, Feb 15, 2017 at 2:14 PM, Ilya Khlopotov <[email protected]> wrote: >> >> Fantastic notes!!! >> >> While reading them I noticed an Elixir-related section. There is no >> conclusion on this, unfortunately. It is also quite hard to infer the sentiment >> from the notes. I would love to get a general idea on: >> How many people present at the meeting see the value in adopting Elixir and >> how many don't? >> >> BR, >> ILYA >> >> >> From: Robert Samuel Newson <[email protected]> >> To: [email protected] >> Date: 2017/02/15 05:38 AM >> Subject: CouchDB Summit Notes >> >> ________________________________ >> >> >> >> Hi, >> >> A group of couchdb folks got together recently for a 3-day session (Feb 10, >> 11, 12) to discuss the couchdb roadmap. In attendance: >> >> Russell Branca, Paul Davis, Dale Harvey, Adam Kocoloski, Nolan Lawson, Jan >> Lehnardt, Gregor Martynus, Robert Newson, Garren Smith, Joan Touzet. >> >> We took turns taking notes, and I present those notes without editing at the >> end of this email for full disclosure.
This marks the beginning >> of the effort to define and execute a new couchdb roadmap; all decisions >> will be made on this mailing list and/or in JIRA as per normal ASF rules. It >> was enormously helpful to get a cohort together in one space for a few days >> to thrash this out. >> >> Friday and Saturday were primarily wide-ranging discussion and >> Saturday/Sunday we got into more detailed conversations about the designs of a >> few key new features we want to focus on for the next major release of >> couchdb. I will summarise those two efforts first and the copious raw notes >> will follow. >> >> 1. "hard" deletes >> >> As everyone knows by now, a deleted document in couchdb is preserved >> forever. Deleted documents are typically small, but they do take up space, which makes >> many uses of couchdb problematic and effectively excludes certain uses >> entirely. The CouchDB attendees feel this should change, and that the new >> CouchDB behaviour should be that a deleted document eventually occupies no >> space at all (post compaction). >> >> The design for this is described in the notes and we whiteboarded it in more >> detail, but the essential observation is this: CouchDB is free to entirely >> forget a document once all secondary indexes have processed the deletion and >> all replications have checkpointed past it. To allow third parties to >> inter-replicate, we will introduce an API to allow anyone to inhibit this >> total deletion from occurring. The set of secondary indexes, replication >> checkpoints and these as-yet-unnamed third-party markers allows couchdb to >> calculate an update sequence below which no deleted document needs to be >> preserved. We called this the 'databasement' in our conversations but a more >> obvious, but less amusing, name will doubtless occur to us as we proceed. >> >> This is obviously a huge (and, we hope, welcome) shift in couchdb semantics >> and we want to get it right.
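The checkpoint arithmetic behind the 'databasement' can be sketched in a few lines. This is purely illustrative — the function and field names below are made up for the sketch and are not a CouchDB API:

```python
def databasement(index_seqs, replication_seqs, marker_seqs):
    """The update sequence below which no deleted document needs to be
    preserved: the lowest checkpoint among all secondary indexes,
    replication checkpoints, and third-party inhibition markers."""
    checkpoints = list(index_seqs) + list(replication_seqs) + list(marker_seqs)
    # With no consumers registered, be conservative for this sketch
    # and keep every tombstone.
    return min(checkpoints) if checkpoints else 0

def compact(docs, basement):
    """Compaction may forget any tombstone every consumer has passed;
    live documents are always kept."""
    return [d for d in docs if not (d["deleted"] and d["seq"] < basement)]
```

For example, with index checkpoints at sequences 8 and 12 and a replication checkpoint at 10, the databasement is 8: a tombstone at sequence 5 can be dropped at compaction, while one at sequence 9 must still be preserved.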
There'll be a more detailed writeup in the >> corresponding JIRA ticket(s) soon. >> >> 2. role-based access control >> >> We challenged another long-standing couchdb convention: that access controls >> are at the database level only. This leads to awkward workarounds like the >> "db per user" pattern which does not scale well. We spent a lot of time on >> this topic and believe we have a feasible and efficient design. >> >> In brief, role-based access control will be available as an option (perhaps >> only at database creation time, it's TBD if it can be toggled on or off for >> existing databases). A document must be marked with the roles that allow >> access, and users must be granted matching roles. We'll explicitly support a >> richer control than mere matching; it will be possible to require multiple >> roles for access. To make this work we will build an additional index for >> efficient access control. >> >> There's much more to be said of the amazing discussions we had at the summit >> and I'll defer to the following notes for that, feel free to ask questions >> in replies to this thread. >> >> Robert Newson >> Apache CouchDB PMC >> >> >> CouchDB Summit Notes >> >> 1. Testing >> • Shared test environment >> • PouchDB test suite has value for CouchDB, but isn’t modular >> • Outcome: Create unified modular pouchdb test framework to use in CouchDB >> 2. Simpler API >> • No more read APIs >> • Dale: Lots of ways of doing read/writes - but no recommended method >> • Nolan: Differentiate between api endpoints that users use and what is >> needed for replication, think /db/_replicate hiding all replicator specific >> calls >> • Nolan: Mango covers some APIs - get/all_docs/query >> • rnewson: Map/reduce is never going to be removed >> • SQL? >> • Separate the replication endpoints and bulk endpoints from ‘normal’ >> endpoints >> • rnewson: Moving to HTTP/2 need to reassess which endpoints are necessary >> • Outcome: Did we have an outcome here? >> 3.
HTTP/2 >> • Jan: Cowboy not the best maintained project >> • Jan: Need to touch base with Cowboy author first >> • rnewson: value of Cowboy is rest module is already there >> • Garren: Elixir stuff is all built on top of Cowboy >> • Jan: Author of Cowboy is sponsored, maybe IBM can help to influence >> direction >> • rnewson: it’s possible, but CouchDB project shouldn’t have a hard dep on $ >> • Jan: there is no possibility of starting from scratch >> • Rnewson: we’ve never written middleware between us and mochiweb >> • Jan: it doesn’t do HTTP/2 >> • Rnewson: we get that for free if we switch to Cowboy >> • Jan: Cowboy is still in development, still introducing breaking changes >> • Jan: Will talk to Loic >> • State of HTTP2 >> • Erlang compat / deprecation >> • Collaboration model >> 4. Packaging (detour) >> • Nolan: managing Erlang versions is really hard today >> • Rnewson: maybe we should deprecate the whitelist >> • Nolan: nah better than failing for weird reasons at runtime >> • Nolan: custom PPA? >> • Jan/Rnewson: would need lots of resources for that >> • Rnewson: ideally the bad Erlang version problem should be going away >> eventually; the new releases are all good >> • Jan: can we run tests against pre-release Erlang versions? Need Cloudant >> scale traffic though >> • Rnewson: instead of having rebar config whitelist we could instead have an >> Eunit test that tests for the specific versions we know are broken, outright >> fails, tells you why. Maybe a blacklist rather than a whitelist. All the >> broken versions are in the past >> • Rnewson: we mostly figured this stuff out in production >> • Paul: also from reading the Erlang mailing lists, e.g. Basho folks >> discovering bugs >> • Jan: so we could try to Canary this but it’s unlikely to work well >> • Rnewson: Erlang themselves have improved their practices. 
From now on we >> can say “we support the latest n Erlang releases” >> • Rnewson: Could ask Ericsson to run CouchDB against their pre-release >> Erlang versions. >> 5. Sub-document operations >> • Jan: this is a feature request since forever >> • Jan: what won’t change is that we need to read the whole JSON document >> from disk >> • Rnewson: we do see Cloudant customers wanting to do this >> • Jan: how does this work in PouchDB? Does json pointer map? >> • Nolan: this can be done today, slashes or dots don’t matter, but dots are >> more intuitive. Angular/Vue does that >> • Jan: Ember too >> • Nolan: I wouldn’t worry about this, just do whatever makes sense for Couch >> and Pouch can add sugar >> • Garren: we can design this incrementally >> • Jan: let’s postpone the bikeshedding >> • Garren: would this fit with Cowboy? >> • Rnewson: yeah quite well, the paths into the objects map well >> • Garren: and this would just be used for getting some of the content from a >> doc >> • Rnewson: you’d still have to do the rev >> • Rnewson: we could add that today >> >> 6. GraphQL >> >> • Best done as a plugin >> >> 7. CouchDB and IBM/Cloudant Collaboration >> >> • A community manager to help remove roadblocks >> • Gregor: could be cross-project: Couch, Pouch, Hoodie >> • Concern around if Cloudant doesn’t work on it no-one will >> • Jan: weekly news taught me there is a *lot* going on in Couch land >> • Need more design-level discussions to happen in the open >> >> 8. Move _-fields out of JSON >> • add future proof top level `_meta` (final name TBD), for future meta data >> extensions >> • Paul: adding any field means we have to think about replication to old >> clients because they throw errors for any unknown _ field. But once we have >> _meta we’ll be in a better place >> >> 9. VDU in Mango >> • Yes. >> >> 10. Rebalancing >> • Jan: this is Couchbase’s killer feature >> >> 11. 
Proper _changes for views >> • Jan: the idea is you can listen to a subset of changes to a database >> • Jan: could be a basis for selective replication >> • Jan: Benoit wrote this as a fork for a customer >> • Garren: with HTTP/2 can you stream the changes? >> • All: yes >> • Adam: there are so many implementations of this that allow every possible >> use case >> • Adam: can we just say “I want live refreshes?” >> • Jan: let’s punt on this >> • Rnewson: we know when we’re updating a view, we could just send that out >> • Possibly create a “live view”, basically like a changes feed but without >> the guarantee of a full history, rather updates since >> you joined. >> >> 12. Redesign security system >> • Gregor: what initially drew me to Couch was that it had auth built-in, >> very nice for building simple apps. As we made progress though we found we >> just rewrote auth ourselves and used CouchDB only for DB >> • Rnewson: hard to converge, no one clear winner >> • Adam: implications on the Cloudant side as well. Cloudant needs to align >> internally with the IBM Cloud Identity & Access Management system. Need to >> be careful about chasing a moving target. >> • Rnewson: Cowboy does give a clean API for this >> • Nolan: big thing with current auth is that users frequently get stuck on >> basic rather than cookie auth >> • Garren: how to do offline-first auth? >> • Nolan: no good answer right now >> • Gregor: frequently users get invalidated towards the backend >> asynchronously. Users continue working locally, become un-authenticated. >> It’s more of a UI problem >> • Gregor: ideally users don’t need to login at all to use the app. Can use >> it locally w/o login >> • Jan: How does this impact Pouch? >> • Nolan: something streamlined would make the most sense, but it’s okay if >> it’s not on by default.
Users should be able to get up and running quickly, >> but also graduate to a production app >> • Rnewson: we’re also planning to do secure by default >> • Jan: it should be easily tweakable whenever something new like JWT comes >> along >> • Garren: What does Mongo do? >> • Jan: No other DB does this >> • Rnewson: Or nobody uses it >> • Jan: Like password forget. Doesn’t exist >> • Adam: no upside to increasing complexity on DB backend >> • Gregor: It’d be good to have a simple lowest common denominator >> (user/pass) in order to get going. Everything beyond that… e.g. how do I get >> a session without a password? In Hoodie we implemented CouchDB’s algo in JS >> • Rnewson: well we’re not going to do unsecured by default anymore >> • Adam: most of this will probably not make it into Cloudant’s API >> • Garren: Given we don’t have 1000s of devs, it does help us not have too >> much on our plates >> • Adam: if there were a plugin to make this easy… in any case it shouldn’t >> be the focus of Couch >> • Nolan: it does play to couch’s strengths - the “http database” >> • Jan: yeah not trying to diminish but it would be a lot of work… but Hoodie >> should not have to do so much work >> • Adam: agree… could smooth over rough edges >> • Nolan: maybe should wait for the plugins discussion then >> • Break time >> >> 13. Mobile-first replication protocol >> • Jan: When replication protocol was first designed this wasn’t really a >> concern >> • Jan: HTTP/2 fixes many of these issues >> • Jan: may also need a way to do tombstone-less revisions >> • Nolan: 80/90% of problems solved by HTTP2 >> • For testing it would be great to have a Docker image with an HTTP/2 proxy >> • Even no improvement would not mean it’s not worth doing. >> • Revisit >> • Nolan: primary use case for PouchDB is mobile, poor network conditions. >> Currently HTTP 1 algo is very chatty, users complain about it >> • Nolan: Need to validate with an HTTP/2 wrapper to see improvement. >> • Rnewson: Doesn’t disprove though.
But might prove benefit. >> 14. Tombstone-less replication / tombstone deletion in database >> • Rnewson: we see this a lot with Cloudant, often folks don’t want deletions >> to replicate. It’s there for a good reason, there’s a massive use case, but >> it shouldn’t apply to everyone. There are use cases where people want to >> delete data from a database. We’re starting to implement some stuff already >> in Cloudant. Would prefer for it to be in CouchDB. We’re implementing a >> clustered version of purge, currently only way to do this. Might be driven >> by views. It’s hacky. Need a solution where we say “don’t need to sync any >> further.” >> • Gregor: revision history is different from purging >> • Rnewson: what we’re doing is making purge first-class >> • Gregor: from our position purging is a major thing. We just don’t do it. >> If you want to end the replication and then share… it’s a problem. Purging >> is what we need. >> • Nolan: we don’t implement purge in PouchDB >> • Rnewson: new one is just a clustered version of old one probably. Needs to >> be safe across shards. >> • Nolan: we don’t implement it because it’s hard. Have to make sure views >> don’t show purged data >> • Rnewson: similar reasons it’s hard across shards >> • Rnewson: need to let others interact and not impact by purged data >> • Jan: replication could automatically skip tombstones >> • Rnewson: should be able to add checkpoints and say ignore deletions before >> this >> • Chewbranca: what about replicated purge log? >> • Rnewson: that’s what this is >> • Chewbranca: exposed as a clustered level? >> • Paul: no. not impossible to add it, but counterintuitive. Kind of out of >> scope. Didn’t get into it. Changing the external HTTP replicator frightens >> me. Lots extra to do. PouchDB compat etc. >> 15. Auto conflict resolution >> • Jan: People don’t like writing conflict algorithms >> • Paul: Hard when people delete data and then it replicates back. Hoping >> purge will help. 
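The requirement implicit in this discussion is that any automatic resolver must be a pure, deterministic function of the conflicting revisions, or peers applying it independently will not converge. A minimal sketch of one such policy — deletion always wins, ties broken by CouchDB-style rev ordering — offered as an illustration only, not the project's chosen algorithm:

```python
def resolve(conflicts):
    """Deterministically pick a winner among conflicting revisions.
    Policy sketched: a deletion beats any live revision; otherwise the
    highest revision (generation number, then hash suffix) wins.
    Because the result depends only on the input set, every peer picks
    the same winner and resolution cannot ping-pong between nodes."""
    def rev_key(rev):
        gen, _, suffix = rev["rev"].partition("-")
        return (int(gen), suffix)

    deletions = [r for r in conflicts if r.get("deleted")]
    candidates = deletions or conflicts
    return max(candidates, key=rev_key)
```

Note the deletion-wins rule is exactly Gregor's "weird when you delete something then it comes back" case: given `3-aaa` (live) and `2-ccc` (deleted), the deleted branch wins even though its generation is lower.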
>> • Rnewson: question is: how to do such that it doesn’t cause conflicts >> between peers. Given same input need same output >> • Jan: and no loops >> • Rnewson: yes needs to converge >> • Jan: CRDTs? >> • Rnewson: yes >> • Rnewson: idea is that things in CouchDB would have defined solution to the >> conflict problem >> • Rnewson: could say if you want to do it, conflicts need to be first class >> • Jan: there’s a paper on this. Read it on the flight. Nice attempt but not >> production ready. Unbounded index problem. >> • Chewbranca: CRDTs aren’t suitable to all data types. Typo in two places in >> same doc, CRDTs can’t tell you what to do >> 16. Conflicts as first-class >> • Paul: conflicts as first class talks about conflicts as a graph instead of >> a tree. Resolution may introduce conflicts. I’d like to be able to fix those. >> • Gregor: what we’ve seen the most is one revision is a delete… should >> always win. Weird when you delete something then it comes back due to a >> conflict. >> • Rnewson: idea of graphs is you say they join back… not just branching trees >> • Adam: you deleted one branch in server, one in client. Deletion trumps >> • Adam: it’s like a git merge. Person working on branch can keep working. >> Changes from “undeletion, what the hell?” >> • Chewbranca: need easier way to expose this to clients… conflicts as first >> class doesn’t discuss this >> • Rnewson: I think it’s an unpleasant experience when conflicts are hidden >> until they scream. Exposing them is worse. Focus on things that let you >> actually resolve the situation once you encounter it. >> • Chewbranca: we need to be more aggressive. User can use couchdb for a long >> time and never know about conflicts. Never have that info exposed. >> • Jan: what about always including conflicts >> • Rnewson: nice middle ground. At least you’d notice the problem >> • Nolan: isn’t this going into Fauxton? >> • Rnewson: I think it is. 
But it’s also just a nudge >> • Gregor: we are ignoring conflicts mostly. But it’s a good feeling that >> we’re building something that does everything right so we can handle >> conflicts later. Something I’d wish is to make it easier to register >> app-specific conflict resolution algos. E.g. deletion always wins. >> • Rnewson: need to decide whether we just get rid of conflicts. I think >> we’re stuck with them >> • Chewbranca: what if you’re not doing backups >> • Rnewson: That’s not where we’re going… want to give more capabilities to >> handle it. There’s the graph so we can contract the tree, and then some >> javascript function to do automatic conflict resolution. Maybe some >> built-ins like we have for reduce. With enough docs this could go away. >> • Chewbranca: it’s also easy to make that conflicts view. Should promote >> cron jobs >> • Rnewson: don’t want to say “our killer feature is multi-master >> replication… btw don’t use it” >> • Chewbranca: I mean make it easy for the user to find it >> • Jan: will revisit this >> • Rnewson: the fundamental pieces are right. We replicate around until >> something resolves it. We’ve done very little around tooling and support and >> visibility. >> • Chewbranca: Configurable “max conflicts”? 10 conflicts and you’re done? >> >> 17. Selective sync >> • Jan: selective sync is also big topic >> • Jan: typical thing is “I want to build Gmail in Couch/Pouch but I can’t” >> • Jan: something could get archived after n days >> • Chewbranca: you could accomplish a lot of that with view filters for >> replication >> • Chewbranca: timestamp view to query for a particular month, replicate this >> view >> • Paul: I was thinking of it as a filter more than as a view. As Dale >> pointed out we can do it with filters >> • Chewbranca: replication always walks entire changes feed >> • Nolan: does that handle “sliding window”? >> • Jan: yes >> • Gregor: What about archiving? I always thought we could make this work >> using filters. 
I could say “archive every doc with that property” and those >> docs wouldn’t be synchronized, but someone said it doesn’t help because it >> still starts from the beginning. >> • Adam: you also want to start from newest and go back >> • Nolan: this affects npm replication… old packages replicated first >> • Adam: we see this with gaming as well. Oldest stuff replicated first >> • Adam: what we’re realizing is that this particular “new to oldest” >> replication is generally useful >> • Adam: could also stop when we get to a certain age >> • Rnewson: or mango… as soon as it stops matching, stop replicating >> • Adam: not saying reading changes backwards is easy… some tricks >> • Gregor: another thing, let’s say I open my email, first thing I want to >> see is an overview, the subject and the first few characters, >> meta-information. Usually this would be a view. But I want to sync this view >> first and show it while the sync is still going. >> • Jan: view could be server-side >> • Jan: could be in a library though, not in core Couch >> • Chewbranca: you can do this today if you know the update sequence. The >> tricky bit is discovering the update seq. Another option is make it easier >> to find that seq. >> • Nolan: can do this today by forking pouchdb-replicate package >> • Adam: hardest is how to resume >> • Chewbranca: could do newest to oldest with a limit, only fetch the first >> 1000 >> 18. Database archival >> • Jan: Database archival… ignore this for now >> • Chewbranca: want to export shards >> • Joan: lots of people want rolling databases, or create a new one every >> month, have to update their apps to use different db names. Could also solve >> that problem. Old stuff goes away, aka sliding window. >> • Adam: you got it >> • Rnewson: so it partitions the database in some way and then that drops out >> of the live db? >> • Joan: let’s not bake in a specific backend. 
Could have scripts for that >> • Jan: need a format like mysqldump >> • Jan: I like the idea of streaming that to a new DB >> • Jan: In a couchdb 1 world can just read the couch file >> • Adam: the use case Joan described is a telemetry store. Recent data in >> Couch. Want to keep it but cheap files on disk. That’s a continuous process. >> Like to have a TTL on the document. That’s different than just exporting the >> DB >> • Chewbranca: should talk about rollups for telemetry. Metadata, hourly. >> Very common feature >> • Adam: less common. I don’t think Couch should try to do it >> • Chewbranca: I think it’s a good feature. But we can skip it >> • Jan: let’s skip for now >> • Jan: we agree we want something. Got some ideas >> • Adam: “streaming” archive of documents that have outlasted the TTL may be >> a different thing than a one-shot bulk archive. Both could hopefully use the >> same format. >> 19. DB update powered replicator >> • Jan: replicator database… not everything needs to be live until written to >> • Rnewson: problem is the scheduler? Might define 1000 jobs at once? We’re >> working on that. Big project we started just before 2.0 was out the door. >> Started in the Apache repository. >> • Adam: Jan’s also talking about being able to drive replication to a large >> number of DBs >> • Rnewson: it’ll decouple the declaration of replication doc from when it >> runs >> • Rnewson: should include Jan in db core team to talk about scheduler >> • Rnewson: “replication scheduling” maybe? >> • Gregor: I’d like to have 10000 databases with 10000 defined replications >> • Rnewson: exactly the problem we’re tackling >> • Rnewson: scheduler has a thing where it examines work and sees if it’s >> worth running, e.g. if something hasn’t changed. It’s not that smart yet >> • Chewbranca: event-based push replication? How about that? >> • Rnewson: perhaps, it’s in the roadmap.
Say you’re Cloudant, we have lots >> of accounts, every cluster gets its own connection, that’s silly >> • Chewbranca: yes but also could incorporate into doc updates. If there are >> outgoing replications, post out directly >> • Rnewson: I dunno >> • Rnewson: that’s what the db updates piece is. There and then it tells the >> scheduler it’s worth replicating >> • Rnewson: we care about wasted connections, resources. Want to avoid the >> situation where I keep replicating a database I’ve hosted somewhere even though it hasn’t updated. >> Stop those jobs entirely. Timebox them >> Consistent databases >> • Jan: consistent databases, will skip >> • Rnewson: let’s talk about it >> • Adam: databases that never have conflicts. Only exactly one version of >> document >> • Rnewson: strong consistency >> • *not sure*: Opposite of what Couch does today then? >> • Garren: like RethinkDB >> • *crosstalk* >> • Jan: Nick brought this up >> • Rnewson: what you do need is eventual consistency. 10 nodes is 10 >> separate configs >> • Chewbranca: lack of eventual consistency is a real problem >> • Rnewson: can solve that as with the dbs database >> • Adam: we have such weak query capabilities across databases. If it were db >> level it might be a fairly common use case, 99% of the records in a particular >> DB can be eventually consistent. Documents with particular attributes could >> be targeted. Could push it to the doc level >> • Adam: could guarantee for certain docs with certain attributes that they >> never have conflicts >> • Chewbranca: I think it’d be good for the dbs db to be consistent. One way >> or another that’s a major problem. Conflicts in the dbs db are terrible >> Pluggable storage engine >> • Jan: next: pluggable storage engine >> • Paul: almost done. Need to test with before and after pluggable storage >> engines in the same cluster for rolling reboots. Been a day and two time >> zones since I looked at it. Had a bug in the test suite.
Getting ready to >> pull the trigger on the mega PR. >> • Paul: been confusing recently. Basically it’s a refactor of the internals >> to give us an API. No new storage engine. Alternate open-source >> implementation to prove it’s not over-specified. Merging would create a >> couple config things. Goal is to let people play with it. No new storage >> engine. No changing of data. All old dbs still work fine. >> • Jan: if we can document this well we can get lots more Erlang folks >> • Nolan: we do this in Pouch, it’s not recommended though to use Mongo etc. >> • Joan: is Paul’s thing open source? >> • Paul: I used a nif (?) to do the file I/O, couple file optimizations, want >> to minimize number of times we write doc info to disk. Uses three files per >> DB. Took me two days. Close to the legacy storage engine but sufficiently >> different to prove API isn’t overly specified. Will show in PR. Couple >> corner cases. >> • Jan: opportunity for a storage engine that doesn’t trade everything for >> disk space, but has the consistency. May be okay for certain types of uses. >> • Paul: lots of cool things to play with. Per-database encryption keys >> instead of filesystem encryption. In-memory for testing and playing >> • Jan: please >> • Paul: as soon as we have the API we can do lots of things >> • Garren: interesting to compare to Couchbase >> • Rnewson: when will the PR be ready? >> • Paul: hopefully next week. Need to rebase. Test suite passes. Want to set >> up a cluster with and without just to make sure. Set up a mixed cluster. >> • Adam: need to do something about attachments? Needs to store arbitrary >> blobs. >> • Paul: for attachments that is the only optional thing I wrote into it. If >> you have a storage engine that doesn’t store attachments you can throw a >> specific error.
Otherwise it’s an abstract API mimicking how we do things now >> Mango adding reduce >> • Jan: Mango adding reduce >> • Jan: goal is to add default reducers to Mango >> • Rnewson: isn’t the goal with Mango that it looks like Mongo? >> • Adam: doesn’t have to >> • Garren: we keep seeing Mango/Mongo, is there a different query language we >> want to do? >> • Jan: database query wars? >> • Jan: either mango or SQL >> • Nolan: been pushing Mango for IDB at W3C. tell me if you hate it >> • Rnewson: don’t hate it, but goal is to make Couch more accessible >> • Jan: I’m totally fine >> • Rnewson: just saying are we cleaving to this. Is this why Mango exists, >> because of Mongo? >> • Rnewson: similar but not identical is okay >> • Chewbranca: reason we don’t want to promote Mango? >> • Jan: we’re doing it >> • Adam: it’s got a bunch of traction behind it >> • Chewbranca: if it’s good enough, we should go with it >> • Garren: it can only do a certain number of docs though? >> • Paul: there is a limit. We talked other day about sort. There will be a >> limit for that as well. Biggest downside of Mango is people think it’s >> smarter than it is. Comes from Mongo. E.g. their sort has a 32MB cap. Works >> until it doesn’t. >> • Jan: this is about declarative form of existing reduce >> • Jan: mango is currently only a map function >> • Garren: best way to learn how people use Mango is to see pouchdb-find >> issues. People start using it, then they ask questions. Once you know Mango >> is map/reduce with sugar then you kind of get it. But if you don’t get it >> you struggle. Making it more intuitive saves me time. >> • Rnewson: yeah people assume it’s like Mongo >> • Jan: why isn’t this the primary way to interact with Couch. 
We need to >> take it seriously >> • Rnewson: we should be saying that >> • Jan: we had an out-of-the-blue contribution to this recently >> Break >> • Some talk about Couch-Chakra: seems Chakra would be easy to embed, runs on >> Windows (ARM/x86/x64), Ubuntu (x64), MacOS (x64): >> https://github.com/Microsoft/ChakraCore >> Mango: adding JOINs >> • Jan: fake the pattern of joining documents, once we have that (foreign doc >> idea) could also have a foreign view key >> • Jan: linked document is the first thing >> • Chewbranca: could potentially get view collation in Mango as well >> • Nolan: is there anything in MR we don’t want to add to Mango? >> • Jan: arbitrary JS, aside from that… >> • Nolan: people do want computed indexes though >> • Jan: people want to define a mango index that you can run a query against, >> but what goes into the index is limited by the expression… people want to >> write js/erl etc. with a map function but they query with mango. Given they >> use the right schema to produce the right data >> • Paul: mango applies start/end key automatically. When you get into >> computed I don’t know how that would work. Could scan the entire index or >> alias it >> • Paul: key is an array with the secret sauce that maps to an index. >> Selector has “age greater than 5” it knows that the first element of the >> array is age >> • Jan: whatever the custom map function is >> • Nolan: issue 3280. Garren you didn’t find this compelling? >> • Garren: not sure what they’re trying to achieve >> • Jan: regardless there’s a bunch of stuff we can get to later >> Mango result sorting >> • Jan: result sorting >> • Jan: question is I’m reducing a thing, sort by the reduce value. Standard >> databasey stuff. Problem of being an unbounded operation. Current policy is: >> CouchDB doesn’t have any features that stop working when you scale. Current >> non-scaling features are no longer being worked on. Want to be more >> dev-friendly though. 
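The scaling worry with sort-by-value is that, without an index on the sort field, the server has to buffer the entire result set before it can emit the first row. A mandatory limit changes the picture: a bounded heap keeps memory proportional to the limit rather than the view size. A small illustration in plain Python (not CouchDB internals — row shapes here are invented for the sketch):

```python
import heapq

def top_rows(rows, limit):
    """Stream view rows and keep only the `limit` largest values.
    heapq.nlargest retains at most `limit` items at any moment, so
    memory stays O(limit) even over millions of streamed rows; a
    full sort would need the whole result set in memory."""
    return heapq.nlargest(limit, rows, key=lambda row: row["value"])
```

This maps onto the game-scoring example: a leaderboard query with `limit=10` is tractable over any number of players, whereas "give me all players sorted by score" is the unbounded operation the current policy rules out.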
>> • Garren: problem is it’s cool, people will use it, but you have to limit >> the results. But if you don’t put it in people will get frustrated >> • Paul: algorithmically we don’t provide a way to do it inside of the DB. >> Nothing we can do inside of the cluster user can’t do locally. Sort by age? >> Can’t paginate through that. Have to buffer the entire result set. Only >> pertains when you don’t have the sort field indexed. >> • Rnewson: all these Mango additions are features that work cosmetically. >> Not a Mango thing so much as a view layer thing. Needs to scale. If I have a >> million things in my view, and I want it sorted by another field, don’t want >> server to run out of memory. Or we don’t do it. >> • Chewbranca: why not make another view? >> • Rnewson: what you find in other DBs is new kinds of index structures. >> Elaborate. All the magic. We don’t do any of that. Maybe we’re getting away >> with it and that’s a good thing >> • Jan: classic example is game scoring >> • Rnewson: want to sort by value >> • Rnewson: maybe multi-view, probably a new format. >> • Nolan: maybe changes on views solves this? Then the view is a DB, can do >> views of views >> • Rnewson: problem of tombstones though. Also we said we wouldn’t do this >> • Nolan: that’s how pouchdb map/reduce works today >> • Jan: could cover this in the tombstone discussion >> • Rnewson: having it as replication source, that’s a sharp end of the impl. 
>> There’s a lot less we could do for changes feed of the view >> • Chewbranca: nothing preventing us from doing double item changes >> • Rnewson: there’s bits and pieces for this that we said we’re not going to >> do because it’s complex, so should excise >> Bitwise operations in Mango >> • Jan: bitwise >> • Jan: user request, like a greater than, equals, you also want bitwise >> operations >> • Rnewson: fine >> NoJS mode >> • Jan: noJS mode >> • Jan: consensus around table already probably >> • Jan: would be nice to use 80% of CouchDB without JS, using Mango as >> primary query, won’t include document update functions >> • Jan: update functions will go away because we have sub-document >> operations. Just query and validation functions >> • Joan: one of the things I’ve seen update functions used for that this >> proposal doesn’t resolve is server-added timestamps to the document >> • Jan: not a fan. It’s still optional. You don’t have to go through the >> update function to put the document. App server can do this >> • Rnewson: is it important for noJS mode? Could make that a feature. >> • Joan: it is certainly one option. Been talked about a lot. Auto timestamps. >> • Garren: what about clustering? >> • Rnewson: not important for multi-master replication >> • Jan: have different TSs on different nodes >> • Rnewson: has to be passed from coordinating node down to the fragment >> • Jan: there’s a Couchbase feature coming where the clients know which shard >> to talk to, reducing latency. Same could be done for us >> • Chewbranca: tricky thing here is supporting filter functions >> • Garren: can’t do it with mango? >> • *crosstalk* >> • Jan: can do that already >> • Rnewson: not doing the couchappy stuff >> • Garren: need a baseline. What do we need that can run without JS? >> • Jan: Joan, open an issue for autoTS? 
Also automatic recording of which >> user created it >> • Joan: you got it >> • Rnewson: if they’re always available they’re not usually huge >> • Chewbranca: tickets for removing old features like the proxy stuff? >> • Jan: add it to the list… like killing couchapps >> • Jan: question was how do we get rid of old stuff we don’t want anymore. >> • Rnewson: need a list >> • Jan: Chewbranca, open a ticket >> • Chewbranca: ok >> • Jan: couchapps, all the things you mentioned >> • Rnewson: had customers at Cloudant ask for this. Server-side timestamps. >> Little bit of useful metadata >> Query server protocol v2 >> • Jan: query server protocol v2 >> • Jan: If we switch to embedded Chakra this goes away >> • Jan: while the JS is going, could load the next batch from this. If it’s >> in-beam we can think of parallelizing this. One Chakra per core. >> • Rnewson: competing desires here. Building one view faster is one use case, >> many views is another >> • Jan: this is about external views, JS views. Should be a streaming protocol >> • Rnewson: Cloudant Query, Chakra Core, maybe it’s not an extendable thing >> anymore. Don’t go out of our way to make it non-extendable. >> • Garren: so you’d never be able to write a map/reduce in Python >> • Paul: I’d like to see it internally at least have a Couch OS proc manager, >> remove that and put it in its own app >> • Paul: That way, we could embed V8 or Chakra, but we could clean up >> internally so we aren’t… native language implementation is happy. Send and >> receive protocol for OS processes. I’d like to clean that up. That way you >> can say as you generate your release, hey for JS I’m going to use the V8 >> engine embedded. Everyone with me? >> • Joan: I think the key here is that we’re talking about this as no longer >> being an externally consumable API. Behooves us to write an API we can >> understand and use. Never gonna call out in the documentation here’s an >> external view server.
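Garren's question above about writing map/reduce in Python is essentially about reimplementing the existing line protocol: one JSON command per line on stdin, one JSON reply per line on stdout. A minimal sketch of such an adapter, heavily simplified from the real couchjs stdio protocol; the Python-lambda "source" convention here is purely an illustrative assumption:

```python
import json

# Simplified sketch of the couchjs-style line protocol: ["reset"],
# ["add_fun", source] and ["map_doc", doc] commands, JSON per line.
# Registering map functions by eval'ing a Python expression is an
# illustrative shortcut, not how a production adapter should work.

class ViewServer:
    def __init__(self):
        self.map_funs = []

    def handle(self, cmd):
        op = cmd[0]
        if op == "reset":
            self.map_funs = []
            return True
        if op == "add_fun":
            # A real adapter would compile cmd[1] in its own language;
            # here we assume it is a Python expression yielding a function.
            self.map_funs.append(eval(cmd[1]))
            return True
        if op == "map_doc":
            doc = cmd[1]
            # One result list per registered map function.
            return [list(f(doc)) for f in self.map_funs]
        return ["error", "unknown_command", op]

vs = ViewServer()
for line in [
    '["reset"]',
    '["add_fun", "lambda doc: [[doc[\'name\'], 1]]"]',
    '["map_doc", {"name": "ada"}]',
]:
    print(json.dumps(vs.handle(json.loads(line))))
```

The dispatch loop is the whole adapter; this is why "should be as easy as including a repo" is plausible once the protocol is cleanly documented.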
• Paul: think we should document it, make it easy for people to do things. >> If someone wants to write a Python adapter should be possible to do. Should >> be as easy as including a GitHub repo. >> • Jan: “legacy plugin” >> • Paul: right. Include arbitrary languages by changing doc config >> Full text search >> • Jan: full text search >> • Jan: should ship as part of Couch >> • Rnewson: we can do that, we need to flush it to ASF. Caveat: has no >> community around it. The components that make it work are abandoned. Given >> up faking JVM semantics inside of Erlang. Don’t know if it has a lot of life >> left in it. It works. But if we have any issues with those components it’s >> going to be on us. It’s like mochiweb. Instead of one person we end up with >> zero. >> • Jan: for now I’m fine with running this for a while >> • Rnewson: getting care and feeding but people do hit edges >> • Chewbranca: not going to get rid of it though? >> • Rnewson: not unless we can replace it >> • Chewbranca: it’s the best thing we’ve got today. There are merits to >> including it >> • Rnewson: can’t use JDK 7 >> • Chewbranca: there are problems, but if it’s incorporated by default >> • Rnewson: only JDK 6 >> • Jan: on lunch, overheard discussion of getting rid of JVM >> • Rnewson: I’d love that. Prime reason we have it is FTS. Don’t see an >> alternative >> • Chewbranca: there’s Sphinx >> • Rnewson: it’s a toy >> • Jan: SQLite? >> • Paul: think we should focus on making it easier to use. There are people >> like me who don’t like running a JVM. I’m okay running it at Cloudant >> because it’s the least worst thing >> • Jan: alternative today is running Solr or ElasticSearch. Running the JVM >> anyway. >> • Paul: if you read the instructions in that ticket it’s kind of insane. We >> don’t notice it because we’ve done it in Chef once >> • Paul: if we go through and make these sorts of things easier to plug in.
If we can get to there, we don’t have to care… rebar pulls them down >> • Rnewson: implemented as a pluggable option >> • Chewbranca: caveat is we’re talking about FTS to empower more features in >> Mango. Do we want FTS to build Mango with? >> • Rnewson: not sure we’ve come to that yet >> • Chewbranca: how complicated is it to have conditionals in the Mango code >> • Rnewson: sat down with Kowalski 3 years ago to talk about this >> • Joan: the Lucene project has links to a number of impl’s in other languages >> • Rnewson: long and sorry history >> • Rnewson: none of them are as feature rich. They all die >> • Jan: that list is there to just inform people >> • Rnewson: Lucene has eaten that whole space >> • Rnewson: I’d love to build an Erlang version if we had the time. Just wish >> Lucene wasn’t using Java >> • Chewbranca: is Sphinx really that bad? >> • Joan: yes it chokes up on multiple requests in a queue. It’ll hang >> • Rnewson: look at Riak, they tried a few FTS and Lucene is the only game in >> town >> • Rnewson: I should have written it in Java rather than Scala so that it can >> work in new JDKs >> • Jan: we’re interested in getting this feature in Couch >> • Rnewson: there’s a very small community that has made some contributions >> Rebalance >> • Jan: rebalance is a poorly named thing >> • Jan: couchbase has this really nice feature, you have a 3 node cluster >> running, you put in the ip address, it rebalances, you see a progress bar, >> now you have a 4 node cluster. Can keep going and automate that. Their >> sharding model is simple enough for them to do. Dunno if they do it with >> queries. Needs to be recalculated somewhat. Their storage model is >> in-memory, relatively fast. >> • Rnewson: hard part is coordinating reads and writes for this process >> • Garren: don’t they have a lot more vBuckets (?) >> • *crosstalk* >> • Garren: aren’t they moving shards around?
• Rnewson: let’s stop saying “rebalance” >> • Jan: what they’re doing is moving the shards around >> • Garren: I chatted with Mike Wallace about this. Our q factor is too low. >> We need more shards so that we can distribute it more. Cassandra starts at >> 256. >> • Rnewson: if we could do that without downsides, then rebalancing is just >> moving shards. We have something in place to do this in a live cluster. >> Unpleasant and slow, but it works >> • Joan: did some benchmarking at Cloudant about cluster q. If q was too big, >> perf suffered. It’s because of some assumptions we made on number of >> processes running and disk I/O. >> • Paul: single doc operations scale well. Anything that aggregates across >> multiple shards scales poorly >> • (break) >> • Chewbranca: tricky bit is shuffling the data around. If you’re caching the >> key rather than the id then you need to send the result differently >> • Adam: it’s not about redoing the architecture of the database. If you add >> a database today and change the partitioning in most cases you are redoing >> the index >> • Chewbranca: where we increased the cluster capacity and redid the database >> • Rnewson: we don’t have to do all of that… can go from 3 nodes to 12 >> • Jan: exercise tomorrow >> • Rnewson: I’d like to see the ability to move the shard around >> • Garren: two step process >> • Rnewson: can reach a couchbase-like solution >> • Rnewson: ability to move shard around while accepting reads and writes >> would be huge >> Setup >> • Jan: still terrible >> • Jan: mostly Fauxton work >> • Chewbranca: seems like a chicken and egg problem >> • Garren: can only do a setup, then once it’s set up you can’t say “here’s >> another node” >> • Garren: if you have three nodes and you only set up 2, you need to start >> again >> • Adam: I’m biased by Kubernetes. There are elegant solutions in specific >> environments.
Trying to solve all problems in a generic Couch construct is >> not a bad thing to take on, but concerned about our ability to... >> • Jan: if we do a homegrown thing, should be higher level >> • Jan: this is where we come back to the single node story >> • Adam: do you not see increased adoption of Docker for development? >> • Garren: I love Docker >> • Jan: not a fan >> • Jan: whole idea of containerizing and standard containers that’s really >> nice but in the apt-get crowd… >> • Jan: Greenkeeper is Docker. It’s easy and it works >> • Garren: what do we want to do? Get the amateur up and running with a 3 >> node cluster? >> • Jan: we have that. Little bit past amateur. No chef, no puppet, no AWS >> • Rnewson: huge number of people who would run a Couch cluster without using >> tools? >> • Garren: people running on Windows, yes >> • Rnewson: I can see not going with any specific thing, Kubernetes, Chef, >> Puppet, we’d be taking that whole thing on >> • Jan: doesn’t have to be the whole thing >> • Rnewson: how far do we go? >> • Jan: very clearly say: if you want to run professional Ansible, or people >> already know they need to use that >> • Garren: this discussion links up to config. Can we link those two together? >> • Background is Cloudant obviously already runs this >> Cluster-aware clients >> • Jan: in Couchbase they provide client libraries that know which node to >> talk to so they don’t have to proxy through one node. Lower latency access. >> They have impressive numbers. Guaranteed sub-10ms latency, they can do that. >> But you can only do that with cluster-aware clients. Saves a hop. >> • Chewbranca: to enable, need shard-level access. Could skip quorum. >> • Adam: I don’t think we’d support this in Cloudant anytime soon. It makes >> it harder to run our tiered service. There’s SSL termination whatnot. We >> also have this compose property. As cloud service providers this one could >> possibly solve problems. 
I don’t think the numbers back up the savings. >> Where you could think about it is there are endpoints where the overhead of >> aggregating the results and delivering one result is problematic >> • Joan: talked about this with replication >> • Adam: anything that consumes changes feed, merging that changes feed and >> generating that is more expensive. If goal is to get access to all contents, >> avoiding that aggregation >> • *crosstalk* >> • Jan: moving on is fine >> • Rnewson: I’d like to solve that particular problem >> • Rnewson: I think we can do replication that doesn’t put that burden on the >> client >> Fauxton >> • Jan: I don’t like how it looks. Trying to query a view, thing pops up, I >> want to change something, it pops up. I want to live change something on the >> site, my main point is information density. Garren can share some aspects of >> the design >> • Garren: Justin has done a nice job. Made a mistake in the beginning with >> two tabs. Biggest mistake we’ve made. It’s limited how much window we have >> to play with data. That’s our big push, move from the sidebar to tabs, so we >> get more space >> • Garren: Justin is working on this internally, want to open it up for >> community discussion >> • Chewbranca: I like the information density view when you’re dealing with >> rows of data, can pretty-print >> • Garren: we should default to a table look >> • Jan: moving on >> • Garren: best thing is review the design and take the feedback >> Releases >> • Jan: last time, talked about this with Noah >> • Jan: nature of changes should define the version number, define what major >> versions are >> • Jan: need to be careful, can’t have 10 major versions per year >> • Jan: one per year is fine >> • Jan: one major feature per thing. Shouldn’t be that bad >> • Rnewson: missing people to actually do that work. Lots of useful important >> features >> • Chewbranca: do we want to have an always-releasable branch?
Tricky to do >> when everything flows into master >> • Rnewson: we had talked about test suites, those go hand in hand. Can’t do >> a release without a comprehensive test suite >> • Chewbranca: how much of our tests are Cloudant private? >> • Rnewson: coverage for search, geo, other things. >> • Chewbranca: introduces a regression on query, dropping down from 8 to 4k >> • Rnewson: we removed that. Took the code out >> • Rnewson: idea is that the PouchDB test suite is the most comprehensive >> thing out there >> • Rnewson: Cloudant has the same problem, has comprehensive review, what’s >> involved before it hits production. There’s a lot of manual intervention in >> between because there’s a lot of stuff we don’t trust >> • Jan: need to take the magic out. >> • Joan: at one point we brought somebody in, they didn’t get support. They >> rage quit. It’s not changed a lot over the years. Feels like a good way to >> give back. >> • Chewbranca: how much of the test suite runs against the clustered API? >> Testy’s not in CouchDB >> • Rnewson: we’ve talked about open sourcing it >> • Chewbranca: all the unit tests run against cloudant 8.6 >> • Rnewson: to get releases flowing again, need to review test suites, figure >> out coverage. Not gonna do regular releases if people have to do manual >> tests. Figure out what broke. Those things take months. Needs to not be that. >> • Rnewson: CouchDB/Cloudant interaction is we have to have tests added so >> Cloudant has confidence >> • Garren: single repo would help here. Then we could do stuff with Travis >> • Rnewson: need a repo per domain of change. FTS will be separate. >> Everything else inside of… >> • Jan: if we have some time this weekend after discussing things we should >> explore this. Just need to do it. >> • Chewbranca: need to figure out what test suite we’re going to use >> • Jan: bit of a TBD. want PouchDB people to contribute. Python tests. 
Maybe >> move everything Erlang to unit tests >> • Chewbranca: still a case for testing clustered scenarios. Always been >> really awkward. Brian Mitchell looked at this years ago >> • Rnewson: we never surfaced it. At Dev Day was gonna do a talk on fault >> tolerance. There’s no way to demonstrate weak quorum didn’t happen. Can take >> 2 of 3 nodes down we’ll be fine. Not testable because it’s not visible. >> Internal replication stuff… Jepsen level of understanding, do they work, we >> think so. All of those things we could enable test suites like Jepsen to >> focus on those things. >> • Chewbranca: one last thing, is the game plan not to do anything with >> Quimby (?) or Testy and it’s going to be ported to the JS functions >> • Rnewson: little early for that. We run Testy and the eunit test suite, JS >> tests. We don’t run the PouchDB tests. That’s the current barrier to >> production. If we’re going to change that and I think no one would object if >> it’s easy to run. Would get more contributors. We also talked about >> pluggability. FTS is not yet open source. Want to be able to merge tests. >> • Nolan: let’s take a look this weekend, see what you need to remove >> • Rnewson: same for Testy >> • Nolan: dealbreaker may be that we’d keep the tests in the monorepo but >> would publish as a separate package >> • Rnewson: glad you mentioned that, may be a problem >> • Garren: not going to just run the tests against Couch to prove they work? >> • Jan: would be weird not to have the test suite part of the Apache project >> • Rnewson: if we can’t get more contributors then there’s not much value >> Performance team >> • Jan: I like how SQLite did this, lots of small improvements that added up. >> Some low-hanging fruit in CouchDB still >> • Chewbranca: there’s lots of fruit there >> • Rnewson: should be significant ones >> • Chewbranca: e.g. increasing q for aggregation. Saturated CPU on list >> merge. 
Things like that would have a big impact >> • Garren: how easy would it be to change to something less CPU intensive? >> • Rnewson: there is a Cloudant perf team >> • Jan: Ubuntu ran a microbenchmark, very hard to do these kinds of things, >> need to have CouchDB be faster. What does a perf test mean in terms of the cost of a >> cluster? >> • Adam: if we could redact customer data >> • Chewbranca: relatively easy to start publishing results. Could make a >> workflow to get benchmarks run. Tricky bit is majority of code is about >> bootstrapping clusters, Chef stuff, tightly entangled. Not feasible to >> open-source. Kubernetes-based would be easier. >> • Jan: using your infrastructure without having to open source it? >> • Chewbranca: that’s intriguing >> • *crosstalk* >> • Adam: lots of precedent for this >> • Jan: I can’t speak for how much time you invest but >> • Chewbranca: let’s say we want to know view query perf as function of q. >> We’ll queue up entire suite of benchmarks to change q through those, run the >> whole thing, save all results, do it all again, for each iter, replicate >> benchmarks multiple times, it’s great, but that takes 12-24 hours. So we >> wouldn’t necessarily want to put out a service that lets people do 50 hours >> of benchmarks >> • Jan: happy with that. Can be done manually. If it’s only a thing you >> control >> • Chewbranca: need to build a wrapper around that. Not too hard >> • Adam: if you think people will explore, this would be a solid investment. >> Don’t want to do a lot of work and then no one uses it >> • Jan: flamegraphs can be done later on, find a bunch of CS students, pay >> for a summer internship, but this would be microbenchmarking stuff. >> • Chewbranca: what’s the policy? >> • Rnewson: conclusion? >> • Jan: I want something like what we did for Hoodie. We put out a job ad. >> Here’s something we want. I want to get people interested. Here’s what >> you’re responsible for.
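Chewbranca's description above (sweep a parameter like q across a grid, replicate each run, save all results) can be sketched as a tiny runner. Everything here is assumed scaffolding; the real work, provisioning a cluster per grid point, is exactly the Chef-entangled part the notes say is hard to open-source:

```python
import itertools
import statistics
import time

# Sketch of a parameter-space benchmark runner: run one workload across
# a grid of parameters (e.g. q values), repeating each point several
# times and keeping summary stats. The workload below is a dummy
# stand-in for "view query at a given q".

def sweep(workload, grid, repeats=3):
    results = {}
    for point in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), point))
        timings = []
        for _ in range(repeats):
            t0 = time.perf_counter()
            workload(**params)
            timings.append(time.perf_counter() - t0)
        results[tuple(sorted(params.items()))] = {
            "median_s": statistics.median(timings),
            "runs": repeats,
        }
    return results

# Dummy CPU-bound workload; a real runner would drive a live cluster.
res = sweep(lambda q, n: sum(range(q * n)), {"q": [1, 4, 16], "n": [1000]})
print(len(res))  # 3 grid points
```

The 12-24 hour figure in the notes comes straight from this structure: grid points times repeats times per-run cost.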
• Gregor: worked well to create issues around low-hanging fruit and give >> some information, then say up for grabs >> • Adam: sometimes need a different implementation, you have to understand >> the whole thing >> • Nolan: PouchDB does have a perf test suite. Highlights cases where we are >> slow vis-à-vis CouchDB. Also comparing across PouchDB storage >> implementations (in-memory, LevelDB, IndexedDB). >> • Garren: will be useful for Pluggable Storage. >> • Russell: eager to engage more with the community here. Wasn’t quite sure >> how to navigate what to open source. >> Benchmark Suite >> • Russell: our underlying kit is simple, based on basho_bench. We could open >> source this, but real utility is in our automated runner across the >> parameter space. >> • Rnewson: useful to open-source the basic framework >> • Garren: let’s provide the “here’s how to do proper benchmarking” document >> • Russell: you want a book? Lots of detail involved here >> • Rnewson: Benchmark suite vs. synthetic load suite? >> • Jan: Synthetic load suite is more about capacity planning for a specific >> customer workload, as opposed to broad-based perf measurement over time >> • Russell: we have some of this synthetic load suite in our portfolio as >> well. Nightlies are tough because our individual test runs take tens of hours >> Synthetic Load Suite >> • As discussed above. Will be very nice to have once the benchmark suite is >> running >> Consolidate Repositories >> • Everyone agrees to do this >> CI for Everything >> • Jan: I’ve articulated a vision here, ideally want things all running on >> ASF in the future >> Schema Extraction >> • Jan: See V8’s property access optimization >> • Jan: Pointer off the DB header that has all the schemas for all the >> documents in the database >> • Jan: would be nice - “change all the values of a field for all documents >> obeying this schema” >> • Rnewson: this is what Lucene calls elision >> • Russell: what are you trying to accomplish?
>> • Jan: first goal is showing all the schemas in the database >> • Adam: do you keep stats on how common these are? >> • Jan: Not yet but easy to add >> • Paul: downside is really wonky schemas, e.g. email address as key instead >> of value >> • Adam: we’ve spent quite a bit of time on this internally at Cloudant >> without getting to prod yet. Definitely an interesting topic for data >> quality issues >> • Jan: could go and explore schema migration on compaction >> • Rnewson: “X-on-compaction” gets gnarly quickly >> Native container support >> • Adam: Could include things like service discovery via DNS SRV for finding >> cluster peers >> Database corruption detection & repair >> • Jan: we had a customer in Nov 2015 that screwed up their backups, and >> messed up the restore as well with merged database files. We had to write a >> tool to manually search for doc bodies. Wasn’t quite good enough for the >> client but it was a start. This is proprietary and can’t share. >> • Rnewson: from a detection perspective, we have checksums on lots of things >> but not everything. >> • Garren: do we see this a lot? >> • Gregor: even if this is 1:1000 we should really care >> • Rnewson: we should be fully conscious of disaster scenarios when we design >> our database format >> • Jan: we could have a merkle tree accompanying our btree to compare >> replication peer btree. >> • Adam: does the merkle tree thing work with our btree balancing logic? >> • All: Unclear >> • Rnewson: if you run out of disk and then clear disk space without >> restarting the server, you get into a bad place >> • Rnewson: need to define what guarantees we’re providing here >> • Rnewson: I’d be all-in on doing more here >> • Jan: need to MD5 index data in addition to documents >> • Rnewson: Recoverability in disaster involves lots of tradeoffs. 
We could >> do much more but the ROI is unclear >> • Davisp: With pluggable storage engines we can also let users make >> tradeoffs between performance and data security. >> Wide-clusters >> • This is an attempt to get past the throughput limitations of individual >> clusters by using a routing proxy on top >> Create an exclusive namespace for databases (/_db/<dbname>) >> • Lots of discussion about what could be done to rationalize the API >> • We are all fading and need food >> >> Day 2 >> >> Improved Erlang Release Building >> • Easier configuration of variable builds for CouchDB >> • To maybe configure dreyfus >> Richer Querying Model >> • We have a separate issue for considering adding joins to Mango >> • Chainability works disastrously in Cloudant today >> • Sort-by-value is a useful construct >> • Can _changes on views help us out here? >> Systemd Handler >> • We won’t build this ourselves >> • But we will put a ticket out there and invite someone who’s passionate >> about it to take it on >> Exclusive namespace for DBs >> • Punt on it now and wait for a bigger API redesign discussion >> Single Node CouchDB >> • Not sure what our story is for single node >> • Single node was the historical way to run couchdb since forever >> • Set n=r=w=q=1 and done?
• Set q to number_of_cores automatically >> • We accept it’s an important use case >> • 2 node cluster is the confusing case (“hot spare” not supported) >> • 1 node cluster is still chttpd/mem3/fabric, not couch_httpd_* >> • Asynchronicity of db creation/deletion still affects single node cluster >> (see consistent-dbs ticket) >> >> Externalize Erlang Term things (_revs, replication _ids) >> • Interop is impaired when we use term_to_binary >> • Convert to a ‘canonicalised’ JSON form and hash that instead >> • Unicode normalization and floating point representation get in the way of >> general-purpose canonicalization >> • Overall goal: clearly define the procedure for generating these revisions, >> and define it in a way that makes it possible for PouchDB to reproduce >> >> PouchDB Governance >> • Open Collective / Patreon / Linux Foundation? >> • Concerned about ASF (re: jira etc) >> • Generally onboard with idea _of_ governance >> • 3-5 people on call to judge reimbursements >> • Gregor can help with the Open Collective setup >> • New “JS foundation”, more aware of JS ecosystem (are members of W3C) >> >> _all_local_docs >> • Richer API for local docs in general >> • Ability to make views on them, maybe >> • Namespacing issues (replication checkpoints doc id calculation) >> >> Telemetry db rollups >> • Good candidate for a plugin >> • IoT use case >> • Adam: for IoT there are many things we can do and this wouldn’t have to be >> first >> >> Server-side mass update functions >> >> • Adam: better as a mango selector than as a JS function >> • Jan: what about conflicts? >> • Rnewson: value in moving it server-side. >> • Adam: Keep it async, run once on every doc existing in the DB at the time >> the task is submitted >> • Rnewson: must be idempotent >> • Adam: we can guarantee that if it’s a Mango selector >> • Garren: but is that still useful?
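The "Externalize Erlang Term things" item above proposes hashing a canonicalised JSON form instead of `term_to_binary`. A rough illustration under assumed canonicalisation rules (sorted keys, fixed separators); as the notes themselves flag, unicode normalization and floating-point representation are the hard parts, and they are ignored here:

```python
import hashlib
import json

# Rough illustration of "canonicalise JSON, then hash" for revision
# generation. Sorted keys and fixed separators are assumed rules, not a
# spec; unicode normalization and float formatting are deliberately
# out of scope, which is exactly what makes this hard in general.

def canonical_json(doc):
    return json.dumps(doc, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)

def rev_hash(doc):
    # Hash only the body; underscore-prefixed metadata fields excluded.
    body = {k: v for k, v in doc.items() if not k.startswith("_")}
    return hashlib.md5(canonical_json(body).encode("utf-8")).hexdigest()

a = {"_id": "x", "b": 1, "a": 2}
b = {"a": 2, "_id": "x", "b": 1}
print(rev_hash(a) == rev_hash(b))  # True: key order must not matter
```

A procedure like this, once pinned down exactly, is what would let PouchDB reproduce revision ids independently, which is the stated overall goal.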
• Paul: sounds like a good plugin >> • Adam: I’m concerned about plugins operating at this level of the database >> • Summary: we do want to do this but it’s a difficult problem, we also want >> to avoid arbitrary JavaScript functions. So a Mango-style updater that enforces >> idempotency. >> >> Auto-record document timestamp >> • Jan: potentially creating lots of conflicts >> • Rnewson: if we do this we can choose to exclude it from the revision >> calculation. >> • Russell: This could mean that multiple documents with identical revisions >> could have different timestamps >> • Adam: that’s actually a more accurate representation of reality than what >> we do today >> • Russell: This is a trivial thing to add in the application layer >> • Gregor: I don’t know any other db that does this. It’s app-specific >> • Rnewson: there are DBs that can do it >> • Nolan: Postgres can do it >> • Rnewson: sometimes you want to use it for time ordering >> • Rnewson: possibly a rathole >> • Adam: big difference between automatic timestamp vs $update trigger >> • Conclusion: skip for now, too gnarly to consider replication scenarios >> Rebar 3 >> • Rnewson: rebar 3 has problems with deps of deps. Jiffy has dep on core >> compiler, users have their own jiffy dep, recursive deps become a problem >> • Chewbranca: no it’s that rebar 3 didn’t include the port compiler >> • Adam: reasons to move to rebar 3? >> • Paul: if we want to register our config on hex.pm. If we start using >> elixir… No pressing reason but better in the future >> • Rnewson: fewer repos will help with build problems >> Plugins >> • Jan: vision is like browser extensions, get list of things to enhance my >> Couch, download it, install it. Never made it to 2.0. Could be compile time >> rather than Fauxton drag-and-drop. How to do this nicely across an existing >> cluster? >> • Adam: What do you want these plugins to be able to do?
• Jan: anything that a core Erlang module could do >> • Jan: let’s try shipping geocouch/dreyfus as an installable plugin and see >> what happens >> • Chewbranca: how do we recommend users upgrade a Couch cluster? >> • Jan: you can upgrade one node at a time right? >> • Rnewson: yes, rolling reboot >> db-per-user >> • Jan: only exists because we don’t have per-doc user permissions on a >> single database. Don’t have that because of reduce views; all that guarantee >> would be out the window anyway >> • Adam: unless ddocs inherit credentials >> >> Elixir >> • Larger community than erlang >> • Various ways of thinking about this >> • Cognitive cost to having mixed erlang/elixir codebase >> • Do we allow elixir piecemeal? Only new modules? Full rewrite? >> • Elixir will use erlang code/modules but won’t read it/develop it? >> • Elixir community doesn’t use OTP patterns >> • Elixir has extended the standard erlang library >> • Where’s the boundary of the mono repo? Does it align with erlang vs elixir >> • Need an example to prove it out >> • Elixir anywhere means all contributors need to be fluent at reading it at >> least >> • Elixir has built-in async task support, which we’ve implemented hackishly >> in erlang >> • Other communities might be interested that also use elixir >> • A lot to be said for elixir syntax >> • Virding’s complaints about elixir, doesn’t like agents, doesn’t like >> macros. Let’s go in with our eyes open. Adopt/enforce a style guide. No >> emacs/vi mode for our version of Erlang yet; we should have one for elixir >> before we start accepting PRs. >> • Rnewson: What about runtime debug / stack traces, operational concerns >> • FYI: https://www.google.com/trends/explore?q=%2Fm%2F0pl075p,%2Fm%2F02mm3 >> • https://elixirforum.com/t/benefits-of-elixir-over-erlang/253/10 >> • Russell: Pipe macro is useful >> • hot code upgrades in elixir?
• Russell: how do you write an Elixir application inside an Erlang >> application >> • https://github.com/elixir-ecto/ecto >> >> Windows: >> • Jan: Lots of Windows client downloads from our site (as many as the >> tarball!) >> • Jan: Several known customers using couchdb on Windows in production - in >> clusters! >> • Jan: Couchdb 2.0 on Windows is beta quality, one maintainer >> • Jan: Drives adoption >> • Adam: no interest in driving/improving from Cloudant >> • Jan: don’t think there’s interest in couchdb via Docker on Windows >> • Garren: 1.6 or 2.0? >> • Jan: customers using both in production >> • Joan: 2.0 clusters on Windows. Difficult to get Travis CI for this >> • Joan: reducing manual steps to test releases will help adoption >> • Joan: buffered stdio on Windows is a longstanding problem >> • Joan: Windows installer signing steps are hard to automate (write a SOAP >> client diediedie) >> • Joan: what about bug reports from Windows users? If couchjs is gone, that >> could help a lot. >> • Garren: Chakra would help there >> • Joan: integrate Chakra with NIF not stdio >> • Nolan: can help with Windows and Chakra >> • Joan: Windows 7 and up >> • Joan: many bugs due to lack of 32-bit support in our Windows binary >> installer. We’re not revisiting this - not building a 32-bit version >> >> Native Mobile: >> • Jan: part of couchdb ecosystem, none of it is in couchdb or asf at all >> • Adam: we (cloudant) are directing more resources into pouchdb than mobile >> • Adam: mobile libraries are in maintenance mode >> • Nolan: more sense in investing in React Native? >> • Nolan: confusion of projects around this >> >> Couchdb as an erlang dep: >> • Russell: If I’m building an erlang application, can I pull couchdb as a >> dependency for storage?
• Jan: not top priority, if we can enable it and have tests so that we don’t >> break it, then we can do it, but embedding couchdb isn’t a goal >> • Joan: Whole lot of work for not much gain, but would help bring the Erlang >> community closer >> • Bob: yup >> >> Remove required dependency on JSON, store binary data >> • Bob: nope >> • Russell: separate metadata (_id, _rev, etc) from data, reduces processing, >> can pass data back and forth more efficiently >> • Bob: skeptical that you could get all the way to sendfile efficiency this >> way >> • Paul: don’t necessarily do fd to socket anyway >> • Bob: removing make_blocks might be an adequate improvement here instead >> • Russell: marshalling/demarshalling is a big overhead >> • Jan: we’d still present JSON, but we could save de/serialisation overhead >> >> Switch to using maps or versioned records for internode communication >> • Bob: Maps or versioned records are not a complete solution >> • Russell: having no solution here is a real pain, many hacks made to work >> around passing records between nodes >> • Bob: you can see this in couch 2.0 in various ways, record field reuse, etc >> • Garren: Avro? Protobuf? >> • Adam: lessons to be learned from those, but not direct use? >> • Adam: we used distributed erlang RPC for all things, we could separate >> control and data channels to improve this. This is not rocket science, >> well-established patterns/practices around >> >> Multi-tenancy >> • Jan: desirable for couchdb >> • Adam: if MT became a first class feature of couchdb, is that tenants of >> tenants? >> • Bob: I’d like cloudant to donate the full MT code to couchdb >> • Jan: nobody can build the next wordpress on couchdb without MT >> • Adam: docker reduces the overhead of standing up full per-tenant couchdb >> instances >> • Russell: What is the vision of MT couchdb?
>> • Jan: looks like full(ish) couch but you see a subset of the databases
>> • Adam: interested in spending more time on this
>>
>>
>> IPV6 Support:
>> • Bob: should be an easy add, doesn’t work today
>>
>> Deprecation Wishlist
>> • _show: Y
>> • _list: Y
>> • _update? Y
>> • OS daemons: Y
>> • proxy stuff: keep, for _users authn on external service
>> • view changes: remove current half-baked code, redo with TBD discussions
>> • couch_external_*: Y
>> • Rewrites: kill
>> • vhosts maybe: y
>> • I'd really like to remove the _replicator, _users, and other special DBs from being actual databases that support the whole database API and instead have a specific API that we define (even if we use databases under the hood): keep _users, hide _replicator with the scheduler rewrite.
>> • Attachments :D (russell): N
>> • Public fields in _users docs: Y
>> • Coffeescript: Y
>> • Custom reduce functions: administratively disabled by default?: yesish
>> • CORS: keep // caniuse.com/cors
>> • CSP: keep
>> • JSONP: drop // http://caniuse.com/#feat=cors
>> • OAuth: drop, but make easy to integrate
>> • Temporary views: drop
>> • TLS: keep
>> • Update_notification: drop
>>
>>
>> One major feature/change per major release. Avoid python 2 / 3 paralysis. 12 months between major releases.
>> Maybe 6 months
>>
>> Roadmap:
>>
>> 3.0
>> • Time frame
>> • Features
>> • MAIN: Fixing db-per-user/role / security system
>> • Tombstone Eviction
>> • Includes cluster purge
>> • Sub doc operations
>> • Maybe HTTP/2 experimental
>> • Mango VDU
>> • Mango Reduce
>> • Deprecations
>>
>> 4.0
>> • Time frame
>> • Features
>> • http/2
>> • Deprecations
>>
>> 5.0
>> • Time frame
>> • Features
>> • Deprecations
>>
>> 6.0
>> • Time frame
>> • Features
>> • Deprecations
>>
>> Channel Base
>>
>> Some past discussion:
>> • https://github.com/couchbase/sync_gateway/issues/927
>> • https://github.com/couchbase/sync_gateway/issues/264
>> • https://github.com/couchbase/couchbase-lite-ios/issues/671
>> • https://github.com/couchbase/couchbase-lite-ios/pull/776
>>
>>
>> “Built in” by-user-seq view, keyed by [userid/role, update seq]
>>
>> Docs get a new property _access: [[a,b],c] (a and b, or c)
>>
>> /db/_changes
>> If db-admin: read by-seq
>> Else: read by-user-seq with a multi-key query (startkey/endkey for each role combination; needs an upper bound, say 10 roles, since with n roles the complexity is O(2^n - 1)), gets merged on the coordinating node, update-seq is the sorting factor. Possible optimisation: remove duplicates (bloom filter? trie?)
>>
>> /db/_all_docs
>> Same “all-user-docs” but value is doc id, not update seq // could be optional, not needed for replication endpoint
>>
>> If a role/user is removed from a doc’s _access property, the user’s entry in by-user-seq gets the value _removed, and well-behaving clients run purge locally
>>
>> If a role is removed from a user doc, we don’t send _removed’s. A client desiring to keep the docs can keep them; clients that don’t can purge based on reading roles from the user doc on the server.
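The non-admin /db/_changes path described above — per-role-combination key ranges merged on the coordinating node by update seq — could be sketched as follows. All names here are illustrative assumptions, not actual CouchDB code:

```javascript
// All non-empty subsets of the user's roles: 2^n - 1 key prefixes into
// the hypothetical by-user-seq index, which is why an upper bound on
// the number of roles (say 10) is needed.
function roleCombinations(roles) {
  const combos = [];
  for (let mask = 1; mask < (1 << roles.length); mask++) {
    combos.push(roles.filter((_, i) => mask & (1 << i)));
  }
  return combos;
}

// segments: one array of {seq, id} rows per role combination, as read
// from the index via multi-key startkey/endkey queries. Rows are merged
// sorted by update seq; a doc appearing under several role combinations
// is collapsed to its latest row. A real implementation might use a
// bloom filter or trie for de-duplication instead of a Map.
function mergeUserChanges(segments) {
  const latest = new Map();
  for (const row of segments.flat().sort((a, b) => a.seq - b.seq)) {
    latest.set(row.id, row); // later update seq wins for a given doc
  }
  return [...latest.values()].sort((a, b) => a.seq - b.seq);
}
```

The exponential blow-up in combinations is what motivates the "upper bound, say 10" remark: 10 roles already yields 1023 key ranges to query and merge.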
>>
>> since=now can be implemented with endkey=&lt;role&gt;&descending=true&limit=1 on each by-user-seq segment
>>
>> Views: a user creating a ddoc can make the ddoc inherit any or all roles they possess (but not more), and that view is then built from the by-user-seq tree.
>> • TBD: what to do with views that have no more members (all users with the defined roles (and combinations) are deleted, or get their roles revoked?)
>> • Keep the index in case there is re-granting of access?
>> • Delete the index and suffer a long view build when users get roles added again?
>>
>> Questions:
>> • Should we even send a remove:true? (yes for doc removes)
>> • Possibly track past roles in the _users db to inform the client what roles have changed
>>
>> Day 3
>>
>> Collision problem
>> • Users could potentially get a 409 conflict for documents created by other users that already exist, which would be a security concern because users could guess existing IDs (e.g. if an ID is an email address)
>> • Proposed solution is to instead send a 404 and allow users to create docs with the same IDs
>> • Problem with that is that then you have to magically prefix all IDs with the user ID
>> • Problem there is how you collate docs together, e.g. Gregor creates gregor:foo and Nolan creates nolan:foo, then they share. Do they both see docs with ID “foo”? Is there a rev-1 conflict?
>> • Alternative is to remove magical prefixing and enforce an explicit prefix with a built-in VDU, e.g. one that says IDs must be prefixed with “&lt;username&gt;:” // c.f. S3 buckets
>> • Problem is if the fine-grained access control is a db-level config that can be enabled after DB creation, then the question is how the built-in VDU gets retroactively applied to IDs that were previously OK // maybe enabling this is a db-creation time feature, not a runtime feature
>> • Another alternative is to not try to solve the ID-guessing problem at all and to instead just tell app developers that they need to enforce global uniqueness on their IDs at the app level, e.g. with an app VDU
>>
>> More open questions:
>> • Avoid https://github.com/couchbase/sync_gateway/issues/264#issuecomment-122301576
>> • Make sure https://github.com/couchbase/sync_gateway/issues/927#issuecomment-115909948 is done right
>>
>> Implementation of _all_docs in the fine-grained scenario
>> • Need to do a parallel merge-sort of the different key spaces to avoid introducing duplicates
>>
>> As we discussed earlier, users can create their own design documents and delegate a subset of their roles to the document, thus allowing it to read all the data that they can read. We discussed the possibility of doing a similar kind of optimization to what we’re proposing for _changes and _all_docs for custom design documents; i.e., allowing an administrator to publish a design document in the database that can read every document and selectively exposing subsets of the resulting index to users based on their identity. There are a few challenges with this approach:
>>
>> • In the fully-generic role-based access control (RBAC) model, a document could show up multiple times in the portion of the by-user-seq tree to which the user has access. We can address that with _all_docs by smartly merging the rows, but we don’t know how to do that in the general case of custom user code
>> • Aggregations that cross different role keys would have to be blocked
>>
>> Question: Could we implement the optimization just for Mango?
>> • Joins, aggregations, etc. will be even more complicated
>> • Indexing of arrays might complicate de-duplication today
>>
>> Definitely worth investigating post-3.0
>>
>> Russell presented an alternative proposal for an API that introduces “virtual databases” as a way to cover the specific case of:
>>
>> • Each user possesses exactly one role -- their ID
>> • Each document has exactly one _access field: the ID of the creator
>>
>> Here’s the API:
>>
>> PUT /zab?virtual=true, user=adm
>> PUT /_virtual/zab/foo1, doc, user=foo
>> PUT /_virtual/zab/bar1, doc, user=bar
>> GET /_virtual/zab/_all_docs, user=foo
>> rows: [
>> {_id: foo1, …}
>> ]
>> GET /_virtual/zab/_all_docs, user=bar
>> rows: [
>> {_id: bar1, …}
>> ]
>> GET /_virtual/zab/_changes, user=foo
>>
>> This idea would still require a version of the by-user-seq tree to support efficient filtering of the _changes feed
>>
>>
>> ===========================================
>>
>>
>> Tombstone Eviction
>> • Case 1: full space reclamation for deleted documents
>> • Case 2: narrowing of wide revision trees after conflict resolution
>> • Analyze replication checkpoints
>> • Compute the seq before which peers have ack’ed all updates
>> • API for introducing a custom “databasement”
>> • Probably don’t need the rev graph if we have this
>> • On compaction (or sooner), we can drop the edit branch/document entirely
>> • We should use clustered purge for cleaning up Case 2 edit branches
>> • Case 1 edit branches can probably skip the clustered purge path since
>> • All branches are in the databasement
>> • Every replica has already agreed on the replicas
>>
>>
>> CouchDB Release Cadence
>> • Cloudant to ask its release engineering team to help out with couchdb release building/testing/QA
>> • Joan happy to work with Cloudant folks on CouchDB release work
>>
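The built-in VDU floated in the collision discussion above — rejecting any doc whose ID lacks a “&lt;username&gt;:” prefix — could look something like this. The function shape and the throw-an-object-with-forbidden convention are CouchDB's actual validate_doc_update contract; the prefix policy itself is the hypothetical part:

```javascript
// Sketch of a built-in validate_doc_update enforcing per-user ID
// prefixes (c.f. S3 buckets). CouchDB invokes VDU functions with
// (newDoc, oldDoc, userCtx); throwing an object with a "forbidden"
// field rejects the write with a 403.
function validate_doc_update(newDoc, oldDoc, userCtx) {
  if (userCtx.roles.indexOf("_admin") !== -1) {
    return; // admins bypass the prefix rule
  }
  var prefix = userCtx.name + ":";
  if (newDoc._id.indexOf(prefix) !== 0) {
    throw { forbidden: "doc ids must start with \"" + prefix + "\"" };
  }
}
```

Note the retroactivity problem mentioned above still applies: if this check is enabled after database creation, pre-existing IDs without the prefix would fail subsequent updates.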
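The Tombstone Eviction bookkeeping above reduces to a minimum over consumer sequences: the “databasement” is the highest update seq at or below which every secondary index, replication checkpoint, and third-party marker has acknowledged all updates. A minimal sketch, with all names assumed:

```javascript
// Sketch of the "databasement" computation. Tombstones at or below the
// returned seq can be forgotten entirely (post compaction), because no
// index still needs to process the deletion and no replication will
// ever ask for it again.
function databasement(indexSeqs, checkpointSeqs, markerSeqs) {
  const seqs = [].concat(indexSeqs, checkpointSeqs, markerSeqs);
  // With no consumers registered, nothing inhibits deletion; otherwise
  // the slowest consumer sets the floor.
  return seqs.length === 0 ? Infinity : Math.min.apply(null, seqs);
}

function tombstoneEvictable(tombstoneSeq, basement) {
  return tombstoneSeq <= basement;
}
```

The third argument is where the proposed public API slots in: a third party that wants to inter-replicate registers a marker seq, which holds the databasement down until it checkpoints past the deletions it cares about.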
-- Professional Support for Apache CouchDB: https://neighbourhood.ie/couchdb-support/
