+1
On Tue, May 7, 2013 at 3:52 PM, Russell Branca <chewbra...@gmail.com> wrote:
> +1
>
> Very excited to see this! Great work!
>
> -Russell
>
> On Tue, May 7, 2013 at 1:44 PM, Robert Newson <rnew...@apache.org> wrote:
>
>> FYI: A zip of this work is available at
>> http://people.apache.org/~rnewson/dist/nebraska-merge-candidate.zip
>> made by 'git archive -o nebraska-merge-candidate.zip
>> nebraska-merge-candidate'
>>
>> On 7 May 2013 21:34, Robert Newson <rnew...@apache.org> wrote:
>> > Hi All,
>> >
>> > I propose to merge in the following work,
>> > https://github.com/rnewson/couchdb/tree/nebraska-merge-candidate to
>> > the official Apache CouchDB repository to a new branch (i.e., *not*
>> > master). Once there, the full CouchDB developer community can begin
>> > the work to incorporate the code here into an official release.
>> >
>> > You do not need to respond if you are in agreement. If there is no
>> > response in 72 hours, I will assume lazy consensus. If we reach
>> > consensus, I will start the IP clearance process and then the merge.
>> >
>> > As most of you know, Paul Davis and I recently sequestered ourselves
>> > away from society (in a place called Nebraska) to make this merge
>> > happen. I want to clarify that this work is not the BigCouch code you
>> > can see on github.com/cloudant/bigcouch but the Cloudant platform from
>> > which BigCouch was made. This means it is bang up to date with all the
>> > bug fixes and feature enhancements we've made in the last eighteen
>> > months or more.
>> > With that clarification made, here are our notes about
>> > what we achieved, what it means to the project and what isn't yet
>> > done:
>> >
>> > Nebraska Merge Roundup
>> >
>> > Stats:
>> >
>> > 1,402 - total new commits
>> >
>> > 312 - commits written during the merge (will be reduced substantially
>> > by squashing)
>> >
>> > 408 - number of files changed
>> >
>> > 21,897 - number of lines added
>> >
>> > 4,277 - number of lines removed
>> >
>> > A retrospective:
>> >
>> > Bob Newson and I have come to the end of our merge sprint on getting
>> > BigCouch merged into Apache CouchDB. It's been a productive ten days
>> > here in the midwest. I managed to get Bob out to a bowling alley and
>> > he managed to get me to a sushi restaurant. In between the cultural
>> > exchanges we’ve also managed to get a significant amount of work done
>> > on the merge.
>> >
>> > The current status of the merge is that we’ve managed to resolve the
>> > differences in the single node execution of CouchDB. Both the
>> > JavaScript and Erlang test suites run, with only one failure, in the
>> > Erlang test suite, due to a (deliberately) missing constraint on the
>> > number of operating system processes. This should be a relatively
>> > straightforward fix but was not prioritized during our limited time to
>> > work on the larger issues.
>> >
>> > We merged a large number of performance and stability enhancements
>> > back into single node CouchDB as well as a number of pure bug fixes.
>> > The biggest highlight is a brand new compactor that is both faster and
>> > creates smaller and better organized post-compaction databases.
>> >
>> > The current status of the merge is that single node operations should
>> > be completely unaffected, as demonstrated by the test suite passing. On
>> > the other hand we haven’t yet finished getting the clustered code
>> > merged to use some of the new changes in single node CouchDB.
>> > The single most significant portion of this work involves updates to
>> > the internal cluster API for views to use the recently rewritten
>> > indexer APIs. This should be a relatively straightforward bit of work
>> > that we’ll be finishing over the next few weeks.
>> >
>> > All in all the merge work done so far has been quite successful. We’ve
>> > met our primary goal of getting the code merged in a fashion that does
>> > not affect single node operation while providing a starting point for
>> > the larger community to start reviewing the more significant changes
>> > made. Given the size of the diff between the two code bases we never
>> > expected to have a fully working clustered solution after ten days of
>> > work, but we have succeeded in providing a base of work that will allow
>> > us and new contributors to get up to speed quickly.
>> >
>> > This work, coupled with work by Dave Cottlehuber and Benoît Chesneau
>> > on updating the build system and various other internal updates, will
>> > provide a solid foundation for work going forward. It's an exciting
>> > time for CouchDB and anyone interested should keep an eye on the next
>> > few releases as we ramp up work on various core aspects of the
>> > database.
>> >
>> > We’ve had an exciting few days working to prepare the road for an
>> > exciting next twelve to eighteen months. We hope that everyone will
>> > feel as excited as we do about the next twelve to eighteen months for
>> > Apache CouchDB. It should be an exciting ride.
>> >
>> >
>> > Things we got done
>> >
>> > * Large update to the source tree layout for Erlang applications. Each
>> > application now has a src/appname/(c_src|ebin|priv|src) structure. The
>> > build system has been updated.
>> >
>> > * Renamed src/couchdb to src/couch to match the Erlang convention of
>> > the top directory name matching the Erlang application name.
>> >
>> > * Imported Cloudant Erlang applications for clustered CouchDB.
>> > These are imported with their history by using git subtree and merging
>> > the top level commit. These are not external deps; development will
>> > happen within the CouchDB tree. The imported apps are:
>> >
>> >   * config - A couch_config replacement (behavior is mostly identical
>> >     to couch_config except how we listen for configuration changes
>> >     internally to allow for smooth hot code upgrade).
>> >
>> >   * twig - An rsyslog source replacement for couch_log.
>> >
>> >   * rexi - An RPC library. Replaces Erlang’s built-in rex application
>> >     to avoid costly safety measures in the interest of performance and
>> >     throughput.
>> >
>> >   * mem3 - The “Dynamo” part of BigCouch responsible for managing
>> >     cluster state
>> >
>> >   * fabric - The internal cluster-aware CouchDB API
>> >
>> >   * ets_lru - A small library application that provides an LRU
>> >     implementation using a couple of ets tables.
>> >
>> >   * ddoc_cache - Caches design documents on each node for use in
>> >     design handler functions. This uses an ets_lru cache with a very
>> >     short TTL.
>> >
>> >   * chttpd - The cluster aware HTTP layer
>> >
>> > Each imported app also had its build system updated to use Autotools
>> > along with the necessary updates noted above for the new application
>> > layouts for existing CouchDB Erlang apps.
>> >
>> > * Merged a large number of updates and fixes to couch_replicator based
>> > on work done internally at Cloudant. Unfortunately, due to an error
>> > when we created our internal clone, we lost a bit of history in some of
>> > the initial merge and have a big commit that affects
>> > couch_replicator_manager mostly. There are a number of other commits
>> > related to couch_replicator that resolve the single node vs. clustered
>> > differences. Some noticeable couch_replicator features:
>> >
>> >   * Optionally disable checkpoints so that replication can work when
>> >     a source is read only.
>> >     This should only be used for smaller databases
>> >     as each replication call has to scan the entire source database on
>> >     each invocation.
>> >
>> >   * A new changes_pending field in the _active_tasks output
>> >
>> >   * A fix to the continuous replication to automatically reconnect to
>> >     a continuous changes feed when it sees a last_seq value. This allows
>> >     for the source to selectively recycle the HTTP connections used which
>> >     can be quite useful for “permanent” replications.
>> >
>> >   * A multitude of smaller bug fixes and stability enhancements.
>> >
>> >
>> > Updates to single node couch:
>> >
>> > * We changed the by_seq tree to store a copy of the #full_doc_info{}
>> > record instead of the #doc_info{} record. This gives significant speed
>> > improvements for compaction and replication and generally anything
>> > that needs to walk the by_seq tree and access document bodies
>> > internally.
>> >
>> > * We rewrote the compactor to be significantly faster and to produce
>> > significantly better compacted databases. The two main parts of the
>> > change are the use of a temp file and the replacement of btrees within
>> > that temp file. The temp file only contains a temporary copy of the
>> > document ids. At the end of a compaction run we then rebuild the by_id
>> > btree in the compaction file from this temp file. The reason this helps
>> > so much is that the compaction is based on the update_seq btree, which
>> > for most cases means that the id tree is updated in roughly random
>> > order, which is very bad for our append only btrees. By using the tmp
>> > file we can stream it in order back into the compacted db file at the
>> > end of compacting, generating a minimum amount of garbage in the
>> > process. The other upgrade was to implement an external merge sort
>> > module (couch_emsort) that is used with this temporary file.
>> >
>> > * Reject updates to design docs that break compilation of the source
>> > code.
>> > Currently we only check map and reduce
>> > calls as the others should provide user visible errors instead of
>> > inexplicably empty views.
>> >
>> > because my OCD kicked in and I was unable to resist.
>> >
>> > * Reverted a change made a long time ago that uses two file
>> > descriptors for each database. See the todo list.
>> >
>> >   * The reason to remove the second fd is so that we can rewrite ref
>> >     counting. Better ref counting makes everyone happy, but the real
>> >     reason is for this next bullet point:
>> >
>> > * Optimize couch_server to not require a round trip message pass for
>> > opening a database that’s in the LRU. This is a significant
>> > performance boost for high concurrency access. We also optimized
>> > couch_server internals to not blow up when it’s under load.
>> >
>> > * Introduce a #leaf{} record into the revision trees. This is never
>> > written to disk but makes internal code a lot cleaner when dealing
>> > with multiple versions of rev tree values.
>> >
>> > * Some changes to couch_changes to enable clustered access. Also some
>> > general cleanup.
>> >
>> > * Internal changes to how CouchDB is booted in Erlang land. Not very
>> > sexy but this removes a lot of complicated un-Erlangy bits. We still
>> > have a bit of work left here.
>> >
>> > * btree chunk sizes are now configurable, which can allow people to
>> > adjust the RAM/speed tradeoffs a bit more.
>> >
>> > * We now load update validation functions on the first write. This is
>> > a cluster-motivated change because the clustered version of this call
>> > is expensive and can lead to race conditions when opening a bunch of
>> > db shards simultaneously. This should be invisible to external
>> > clients.
>> >
>> > * Disabled conflict detection for local docs. They don’t replicate so
>> > there’s no point. This just led to clusters getting stuck and confused
>> > when there were lots of replications happening.
>> >
>> > * Changes to the multipart/mime parsing code.
>> > Necessary for clustered
>> > attachment uploads to split the incoming data stream into N copies.
>> >
>> > * Don’t use init:restart/0 when reloading the ICU driver. I think
>> > this has a bug. But we should rewrite this driver to be a NIF anyway.
>> >
>> > * New couch OS process manager. Significantly faster access to OS
>> > processes under heavy load. This replaces the hard limit with a soft
>> > limit. Processes spawned over the soft limit will be used until they’ve
>> > sat idle for a few minutes and then be closed. We have a todo item to
>> > add the hard ceiling back in (while keeping the soft ceiling).
>> >
>> > * Automatically replace some easily identifiable JS reductions with
>> > their builtin counterparts. Uses a regex to do the detection so it's
>> > not too smart.
>> >
>> > * Improved view updater write batch.
>> >
>> > * Updates to couchjs’ views.js to improve index update speeds
>> >
>> > * Updates to the _stats builtin reduce to allow reduces to work over
>> > emitted stats objects. Sometimes clients have summary data in a doc,
>> > and this allows them to combine stats if they follow the same pattern
>> > as the builtin expects.
>> >
>> > * Added a config:reload() that is accessible by POST’ing to
>> > _config/_reload. Used by the JS tests to reset the config to what's on
>> > disk. This should prevent those test run failures where a test fails
>> > leaving the config in a bad state causing all subsequent tests to
>> > fail. I think. Maybe.
>> >
>> > * Databases are deleted synchronously in the test suite. We may need
>> > to address this on Windows. But it does seem to reduce the number of
>> > “{error, file_exists}” failures.
>> >
>> > * I reimplemented the JS restartServer() function. There’s a new
>> > _restart/token URL that will give a unique value for each instance of
>> > the Erlang VM. To run a restart we grab the current token value, hit
>> > _restart, then wait till we get a successful response with a different
>> > token.
>> > This appears to have made the restart strategy more robust.
>> >
>> >
>> > Things that need doing
>> >
>> > IP Clearance -
>> >
>> > We’ll need to track down whether we have the CCLA as well as look at
>> > each source file added to make sure each one is strictly from Cloudant
>> > or has an amenable license. I’m pretty sure that the only one of
>> > interest is trunc_io.erl but we need to be thorough.
>> >
>> > documentation -
>> >
>> > There shouldn’t be much here since the entire point of this merge was
>> > to not change the visible behavior of single node couch. A few things
>> > to add about the testing endpoints. Maybe an update to the compaction
>> > section to mention the two new file names used.
>> >
>> > Copyright notices -
>> >
>> > We need to strip out copyright notices from individual files and make
>> > sure all files have a standard Apache License v2 header.
>> >
>> > clustered vhosts -
>> >
>> > We’ve never implemented this at Cloudant. We either need to write a
>> > clustered version or go back and tell people to use HAProxy (or
>> > similar) for such things.
>> >
>> > twig -
>> >
>> > We need to add another output type to twig that is configurable in
>> > some manner. Right now we spit out entire rsyslog records, which isn’t
>> > useful for most people. We’ll need to implement the file writer from
>> > couch_log as well as update the _log HTTP handler to know when it can
>> > and can’t expect to find data on disk.
>> >
>> > fabric -
>> >
>> > This is going to need a lot of work. Specifically view access is going
>> > to need to be updated to work with couch_mrview and friends.
>> >
>> > Boot a dev cluster -
>> >
>> > Once we fix up the clustering code we’ll need to write instructions
>> > and scripts for pulling up a dev cluster.
>> >
>> > OTP stuff -
>> >
>> > We’ve updated each app but we still need to pull some parts out of
>> > couchdb into their own application.
>> > Specifically the HTTP layer needs
>> > its own app. We could probably pull out the os process/query_servers
>> > as well as the os daemons and friends. Once done we need to update the
>> > supervision trees so we don’t have things like couch starting and
>> > managing the replication manager process.
>> >
>> > ddoc_cache -
>> >
>> > Wire this up in couch_httpd_db to actually be used. Right now it's only
>> > used in chttpd.
>> >
>> > couch_file upgrade -
>> >
>> > The revert to remove the second updater_fd from each #db{} record
>> > means that we’re back in the original position of files appearing to
>> > slow down significantly under load. Since the initial hammer approach
>> > of just adding a second fd, we’ve discovered that the underlying
>> > bug is due to the way that message passing works combined with
>> > Erlang’s file io. Significant, though, is the fact that the fix is
>> > rather simple to implement. A first draft of this work is on an old
>> > branch of mine here:
>> >
>> > https://github.com/davisp/couchdb/commit/d856878
>> >
>> > finish the size calculating changes -
>> >
>> > The #leaf{} record change is to enable us to add more data size
>> > calculations. CouchDB master calculates a data size that accounts for
>> > all bytes that are active in a .couch file. Cloudant is interested in
>> > the total size of uncompressed docs and attachments minus the internal
>> > overhead of btrees. And there’s a fourth number to calculate based on
>> > the compression level used. Having each of these numbers will be
>> > useful as well as the calculations they’ll enable (i.e., dead bytes in
>> > file, bytes used for overhead, compression ratio achieved, etc).
>> >
>> > couch_proc_manager -
>> >
>> > We need to implement the hard ceiling for capping the number of OS
>> > processes. We’ve started seeing a need for this at Cloudant with some
>> > workloads so motivation to fix this is high.
>> > The only failing etap is
>> > the assertion of this ceiling.
>> >
>> > Synchronous db delete on Windows -
>> >
>> > I did this because running the test suite was driving me bonkers. I
>> > need to ask Dave about how this behaves on Windows (my guess is not
>> > well) but I think we can close things up so that it works better than
>> > the status quo.
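
For readers following along, the token-based restart check described in the notes above (grab the current token, hit _restart, then wait for a successful response with a different token) can be sketched roughly as below. This is only an illustration in Python, not the actual restartServer() from the JavaScript test suite, and it does not talk to CouchDB at all: FakeServer, get_token, restart and restart_server are hypothetical names made up for this example, and the fake simply pretends to be unreachable for a few polls after a restart.

```python
import itertools
import time


class FakeServer:
    """Hypothetical in-memory stand-in for a node (not a real CouchDB API)."""

    def __init__(self, boot_polls=2):
        self._ids = itertools.count(1)
        self._token = f"vm-{next(self._ids)}"  # unique per "VM instance"
        self._boot_polls = boot_polls          # polls that fail after a restart
        self._down_for = 0                     # failing polls still remaining

    def get_token(self):
        """Models a GET of the token URL; raises while the node is restarting."""
        if self._down_for > 0:
            self._down_for -= 1
            raise ConnectionError("node is restarting")
        return self._token

    def restart(self):
        """Models hitting _restart: the new VM instance gets a new token."""
        self._token = f"vm-{next(self._ids)}"
        self._down_for = self._boot_polls


def restart_server(server, attempts=50, delay=0.0):
    """Restart and block until the server answers with a different token."""
    old_token = server.get_token()   # 1. grab the current token value
    server.restart()                 # 2. hit _restart
    for _ in range(attempts):        # 3. poll until the token changes
        try:
            new_token = server.get_token()
        except ConnectionError:
            new_token = None         # not back yet, keep waiting
        if new_token is not None and new_token != old_token:
            return new_token
        time.sleep(delay)
    raise TimeoutError("server did not come back with a new token")
```

Calling restart_server(FakeServer()) returns the new token once the fake node answers again. The point of comparing tokens, rather than just waiting for any successful response, is that a response from the old VM instance that has not yet shut down is not mistaken for a completed restart.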