git help svn
On 16 May 2013 13:13, Robert Newson <[email protected]> wrote: > Righto. Now to remember how subversion works... > > On 15 May 2013 17:09, Noah Slater <[email protected]> wrote: > > Okay. > > > > Start here: > > > > http://incubator.apache.org/ip-clearance/ > > > > Then make a copy of this file: > > > > > http://svn.apache.org/repos/asf/incubator/public/trunk/content/ip-clearance/ip-clearance-template.xml > > > > This file, when rendered to HTML will look like: > > > > http://incubator.apache.org/ip-clearance/ip-clearance-template.html > > > > In your local copy, cut everything from: > > > > <pre>-----8-<---- cut here -------8-<---- cut here > > -------8-<---- cut here-------8-<----</pre> > > > > To: > > > > <pre>-----8-<---- cut here -------8-<---- cut here > > -------8-<---- cut here-------8-<----</pre> > > > > Now, add your copy back to Subversion here: > > > > > http://svn.apache.org/repos/asf/incubator/public/trunk/content/ip-clearance/ > > > > Call it "couchdb-bigcouch.xml". > > > > In a few minutes, this will appear here: > > > > http://incubator.apache.org/ip-clearance/couchdb-bigcouch.html > > > > Now, it should be a simple matter of going through the doc and completing > > the checkpoints/sections. > > > > Here are the two previous ones we've done: > > > > http://incubator.apache.org/ip-clearance/couchdb-docs.html > > > > http://incubator.apache.org/ip-clearance/couchdb-fauxton.html > > > > Let me know if you get stuck on any of the checkpoints. > > > > Once you're done, let me know, and I will use my member karma to push it > > through the Incubator. > > > > Benoit, you may as well start your rcouch stuff at the same time using > this > > instructions. Obviously, you should pick "couchdb-rcouch.xml" instead. > But > > other than that, it's the same process. > > > > On 15 May 2013 16:24, Noah Slater <[email protected]> wrote: > > > >> I can help! 
:) > >> > >> > >> On 15 May 2013 16:23, Robert Newson <[email protected]> wrote: > >> > >>> :) > >>> > >>> Jan, I think you said you'd help start the IP clearance bit? > >>> > >>> On 15 May 2013 15:03, Noah Slater <[email protected]> wrote: > >>> > PARTY TIME 🎉 > >>> > > >>> > > >>> > On 15 May 2013 10:40, Robert Newson <[email protected]> wrote: > >>> > > >>> >> Thanks everyone. > >>> >> > >>> >> The tally is; > >>> >> > >>> >> 13 +1's > >>> >> > >>> >> The vote passes. We'll now move on to IP clearance. Once that's done > >>> >> the work will arrive on a feature branch in our main git repository. > >>> >> > >>> >> B. > >>> >> > >>> >> > >>> >> On 13 May 2013 04:31, Jason Smith <[email protected]> wrote: > >>> >> > Sorry, just catching up. > >>> >> > > >>> >> > +1 > >>> >> > > >>> >> > On Fri, May 10, 2013 at 4:29 PM, Jan Lehnardt <[email protected]> > >>> wrote: > >>> >> >> +1 > >>> >> >> > >>> >> >> Jan > >>> >> >> -- > >>> >> >> > >>> >> >> On May 7, 2013, at 21:34 , Robert Newson <[email protected]> > >>> wrote: > >>> >> >> > >>> >> >>> Hi All, > >>> >> >>> > >>> >> >>> I propose to merge in the following work, > >>> >> >>> > https://github.com/rnewson/couchdb/tree/nebraska-merge-candidateto > >>> >> >>> the official Apache CouchDB repository to a new branch (i.e, > *not* > >>> >> >>> master). Once there, the full CouchDB developer community can > begin > >>> >> >>> the work to incorporate the code here into an official release. > >>> >> >>> > >>> >> >>> You do not need to respond if you are in agreement. If there is > no > >>> >> >>> response in 72 hours, I will assume lazy consensus. If we reach > >>> >> >>> consensus, I will start the IP clearance process and then the > >>> merge. > >>> >> >>> > >>> >> >>> As most of you know, Paul Davis and I recently sequestered > >>> ourselves > >>> >> >>> away from society (in a place called Nebraska) to make this > merge > >>> >> >>> happen. 
I want to clarify that this work is not the BigCouch > code > >>> you > >>> >> >>> can see on github.com/cloudant/bigcouch but the Cloudant > platform > >>> from > >>> >> >>> which BigCouch was made. This means it is bang up to date with > all > >>> the > >>> >> >>> bug fixes and feature enhancements we've made in the last > eighteen > >>> >> >>> months or more. With that clarification made, here are our notes > >>> about > >>> >> >>> what we achieved, what it means to the project and what isn't > yet > >>> >> >>> done; > >>> >> >>> > >>> >> >>> Nebraska Merge Roundup > >>> >> >>> > >>> >> >>> > >>> >> >>> Stats: > >>> >> >>> > >>> >> >>> > >>> >> >>> 1402 - total new commits > >>> >> >>> > >>> >> >>> 312 - commits written during the merge (will be reduced > >>> substantially > >>> >> >>> by squashing) > >>> >> >>> > >>> >> >>> 408 - number of files changed > >>> >> >>> > >>> >> >>> 21,897 - number of lines added > >>> >> >>> > >>> >> >>> 4,277 - number of lines removed > >>> >> >>> > >>> >> >>> A retrospective: > >>> >> >>> > >>> >> >>> Bob Newson and I have come to the end of our merge sprint on > >>> getting > >>> >> >>> BigCouch merged into Apache CouchDB. Its been a productive ten > days > >>> >> >>> here in the midwest. I managed to get Bob out to a bowling alley > >>> and > >>> >> >>> he managed to get me to a sushi restaurant. In between the > cultural > >>> >> >>> exchanges we’ve also managed to get a significant amount of work > >>> done > >>> >> >>> on the merging as well. > >>> >> >>> > >>> >> >>> > >>> >> >>> The current status of the merge is that we’ve managed to resolve > >>> the > >>> >> >>> differences in the single node execution of CouchDB. Both the > >>> >> >>> JavaScript and Erlang test suites run with only one failure in > the > >>> >> >>> Erlang test suite due to a (deliberately) missing constraint on > the > >>> >> >>> number of operating system processes. 
This should be a > relatively > >>> >> >>> straightforward fix but was not prioritized during our limited > >>> time to > >>> >> >>> work on the larger issues. > >>> >> >>> > >>> >> >>> > >>> >> >>> We merged a large number of performance and stability > enhancements > >>> >> >>> back into single node CouchDB as well as a number of pure bug > >>> fixes. > >>> >> >>> The biggest highlight is a brand new compactor that is both > faster > >>> and > >>> >> >>> creates smaller and better organized post-compaction databases. > >>> >> >>> > >>> >> >>> > >>> >> >>> The current status of the merge is that single node operations > >>> should > >>> >> >>> be completely unaffected as demonstrated by the test suite > >>> passing. On > >>> >> >>> the other hand we haven’t yet finished getting the clustered > code > >>> >> >>> merged to use some of the new changes in single node CouchDB. > The > >>> >> >>> single most significant portion of this work involves updates to > >>> the > >>> >> >>> internal cluster API for views to use the recently rewritten > >>> indexer > >>> >> >>> APIs. This should be a relatively straightforward bit of work > that > >>> >> >>> we’ll be finishing over the next few weeks. > >>> >> >>> > >>> >> >>> > >>> >> >>> All in all the merge work done so far has been quite successful. > >>> We’ve > >>> >> >>> met our primary goal of getting the code merged in a fashion > that > >>> does > >>> >> >>> not affect single node operation while providing a starting > point > >>> for > >>> >> >>> the larger community to start reviewing the more significant > >>> changes > >>> >> >>> made. Given the size of the diff between the two code bases we > >>> never > >>> >> >>> expected to have a fully working clustered solution after ten > days > >>> of > >>> >> >>> work but we have succeeded in providing a base of work that will > >>> allow > >>> >> >>> us and new contributors to get up to speed quickly. 
> >>> >> >>> > >>> >> >>> > >>> >> >>> This work, coupled with work by Dave Cottlehuber and Benoît > >>> Chesneau > >>> >> >>> on updating the build system and various other internal updates, > >>> will > >>> >> >>> provide a solid foundation for work going forward. Its an > exciting > >>> >> >>> time for CouchDB and anyone interested should keep an eye on the > >>> next > >>> >> >>> few releases as we ramp up work on various core aspects of the > >>> >> >>> database. > >>> >> >>> > >>> >> >>> > >>> >> >>> We’ve had an exciting few days working to prepare the road for > an > >>> >> >>> exciting next twelve to eighteen months. We hope that everyone > will > >>> >> >>> feel as excited as we do about the next twelve to eighteen > months > >>> for > >>> >> >>> Apache CouchDB. It should be an exciting ride. > >>> >> >>> > >>> >> >>> > >>> >> >>> > >>> >> >>> Things we got done > >>> >> >>> > >>> >> >>> > >>> >> >>> * Large update to the source tree layout for Erlang > applications. > >>> Each > >>> >> >>> application now has a src/appname/(c_src|ebin|priv|src) > structure. > >>> The > >>> >> >>> build system has been updated. > >>> >> >>> > >>> >> >>> * Renamed src/couchdb to src/couch to match the Erlang > convention > >>> of > >>> >> >>> the top directory name matching the Erlang application name. > >>> >> >>> > >>> >> >>> * Imported Cloudant Erlang applications for clustered CouchDB. > >>> These > >>> >> >>> are imported with their history by using git subtree and merging > >>> the > >>> >> >>> top level commit. These are not external deps, development will > >>> happen > >>> >> >>> within the CouchDB tree. The imported apps are: > >>> >> >>> > >>> >> >>> > >>> >> >>> * config - A couch_config replacement (Behavior is mostly > >>> identical > >>> >> >>> to couch_config except how we listen for configuration changes > >>> >> >>> internally to allow for smooth hot code upgrade). > >>> >> >>> > >>> >> >>> * twig - An rsyslog source replacement for couch_log. 
> >>> >> >>> > >>> >> >>> * rexi - An RPC library. Replaces Erlang’s built-in rex > >>> application > >>> >> >>> to avoid costly safety measures in the interest of performance > and > >>> >> >>> throughput. > >>> >> >>> > >>> >> >>> * mem3 - The “Dynamo” part of BigCouch responsible for > managing > >>> >> cluster state > >>> >> >>> > >>> >> >>> * fabric - The internal cluster-aware CouchDB API > >>> >> >>> > >>> >> >>> * ets_lru - A small library application that provides an LRU > >>> >> >>> implementation using a couple of ets tables. > >>> >> >>> > >>> >> >>> * ddoc_cache - Caches design documents on each node for use in > >>> >> >>> design handler functions. This uses an ets_lru cache with a very > >>> short > >>> >> >>> TTL. > >>> >> >>> > >>> >> >>> * chttpd - The cluster aware HTTP layer > >>> >> >>> > >>> >> >>> > >>> >> >>> Each imported app also had its build system updated to use > >>> Autotools > >>> >> >>> along with the necessary updates noted above for the new > >>> application > >>> >> >>> layouts for existing CouchDB Erlang apps. > >>> >> >>> > >>> >> >>> > >>> >> >>> * Merged a large amount of updates and fixes to couch_replicator > >>> based > >>> >> >>> on work done internally at Cloudant. Unfortunately due to an > error > >>> >> >>> when we created our internal clone we lost a bit of history in > >>> some of > >>> >> >>> the initial merge and have a big commit that affects > >>> >> >>> couch_replicator_manager mostly. There are a number of other > >>> commits > >>> >> >>> related to couch_replicator that resolve the single node vs. > >>> clustered > >>> >> >>> differences. Some noticeable couch_replicator features: > >>> >> >>> > >>> >> >>> > >>> >> >>> * Optionally disable checkpoints so that replication can work > >>> when > >>> >> >>> a source is read only. This should only be used for smaller > >>> databases > >>> >> >>> as each replication call has to scan the entire source database > on > >>> >> >>> each invocation. 
> >>> >> >>> > >>> >> >>> * A new changes_pending field in the _active_tasks output > >>> >> >>> > >>> >> >>> * A fix to the continuous replication to automatically > reconnect > >>> to > >>> >> >>> a continuous changes feed when it sees a last_seq value. This > >>> allows > >>> >> >>> for the source to selectively recycle the HTTP connections used > >>> which > >>> >> >>> can be quite useful for “permanent” replications. > >>> >> >>> > >>> >> >>> * A multitude of smaller bug fix and stability enhancements. > >>> >> >>> > >>> >> >>> > >>> >> >>> Updates to single node couch: > >>> >> >>> > >>> >> >>> > >>> >> >>> * We changed the by_seq tree to store a copy of the > >>> #full_doc_info{} > >>> >> >>> record instead of the #doc_info{} record. This gives significant > >>> speed > >>> >> >>> improvements for compaction and replication and generally > anything > >>> >> >>> that needs to walk the by_seq tree and access document bodies > >>> >> >>> internally. > >>> >> >>> > >>> >> >>> * We rewrote the compactor to be significantly faster as well as > >>> >> >>> provides significantly better compacted databases. The two main > >>> halves > >>> >> >>> are to use a temp file and replace the use of btrees in the temp > >>> file. > >>> >> >>> The temp file only contains a temporary copy of the document > ids. > >>> At > >>> >> >>> the end of a compaction run we then rebuild the by_id btree in > the > >>> >> >>> compaction file from this temp file. The reason this helps so > much > >>> is > >>> >> >>> that the compaction is based on the update_seq btree, which for > >>> most > >>> >> >>> cases means that the id tree is updated in roughly random order > >>> which > >>> >> >>> is very bad for our append only btrees. By using the tmp file we > >>> can > >>> >> >>> stream it in order back into the compacted db file at the end of > >>> >> >>> compacting, generating a minimum amount of garbage in the > process. 
> >>> The > >>> >> >>> other upgrade was to implement an external merge sort module > >>> >> >>> (couch_emsort) that is used with this temporary file. > >>> >> >>> > >>> >> >>> * Reject updates to design docs that introduce updates that > break > >>> >> >>> compilation for source code. Currently we only check map and > reduce > >>> >> >>> calls as the other should provide user visible errors instead of > >>> >> >>> inexplicably empty views. > >>> >> >>> > >>> >> >>> because my OCD kicked in and I was unable to resist. > >>> >> >>> > >>> >> >>> * Reverted a change made a long time ago that uses two file > >>> >> >>> descriptors for each database. See the todo list. > >>> >> >>> > >>> >> >>> * The reason to remove the second fd is so that we can rewrite > ref > >>> >> >>> counting. Better ref counting makes everyone happy, but the real > >>> >> >>> reason is for this next bullet point: > >>> >> >>> > >>> >> >>> * Optimize couch_server to not require a round trip message pass > >>> for > >>> >> >>> opening a database that’s in the LRU. This is a significant > >>> >> >>> performance boost for high concurrency access. We also optimized > >>> >> >>> couch_server internals to not blow up when it’s under load. > >>> >> >>> > >>> >> >>> * Introduce a #leaf{} record into the revision trees. This is > never > >>> >> >>> written to disk but makes internal code a lot cleaner when > dealing > >>> >> >>> with multiple versions of rev tree values. > >>> >> >>> > >>> >> >>> * Some changes to couch_changes to enable clustered access. Also > >>> some > >>> >> >>> general cleanup > >>> >> >>> > >>> >> >>> * Internal changes to how CouchDB is booted in Erlang land. Not > >>> very > >>> >> >>> sexy but this removes a lot of complicated un-Erlangy bits. We > >>> still > >>> >> >>> have a bit of work left here. > >>> >> >>> > >>> >> >>> * btree chunk sizes are now configurable which can allow people > to > >>> >> >>> adjust the RAM/speed tradeoffs a bit more. 
> >>> >> >>> > >>> >> >>> * We now load update validation functions on the first write. > This > >>> is > >>> >> >>> a cluster-motivated change because the clustered version of this > >>> call > >>> >> >>> is expensive and can lead to race conditions when opening a > bunch > >>> of > >>> >> >>> db shards simultaneously. This should be invisible to external > >>> >> >>> clients. > >>> >> >>> > >>> >> >>> * Disabled conflict detection for local docs. They don’t > replicate > >>> so > >>> >> >>> there’s no point. This just led to clusters getting stuck and > >>> confused > >>> >> >>> when there were lots of replications happening. > >>> >> >>> > >>> >> >>> * Changes to the multipart/mime parsing code. Necessary for > >>> clustered > >>> >> >>> attachment uploads to split the incoming data stream into N > >>> copies. > >>> >> >>> > >>> >> >>> * Don’t use init:restart/0 when reloading the ICU driver. I > think > >>> >> >>> this has a bug. But we should rewrite this driver to be a NIF > >>> anyway. > >>> >> >>> > >>> >> >>> * New couch OS process manager. Significantly faster access to > OS > >>> >> >>> processes under heavy load. This replaces the hard limit with a > >>> soft > >>> >> >>> limit. Processes spawned over the soft limit will be used until > >>> they’ve > >>> >> >>> sat idle for a few minutes and then be closed. We have a todo > item > >>> to > >>> >> >>> add the hard ceiling back in (while keeping the soft ceiling). > >>> >> >>> > >>> >> >>> * Automatically replace some easily identifiable JS reductions > with > >>> >> >>> their builtin counterparts. Uses a regex to do the detection so > it's > >>> >> >>> not too smart. > >>> >> >>> > >>> >> >>> * Improved view updater write batch. > >>> >> >>> > >>> >> >>> * Updates to couchjs’ views.js to improve index update speeds > >>> >> >>> > >>> >> >>> * Updates to the _stats builtin reduce to allow reduces to work > over > >>> >> >>> emitted stats objects. 
Sometimes clients have summary data in a > >>> doc, > >>> >> >>> and this allows them to combine stats if they follow the same > >>> pattern > >>> >> >>> as the builtin expects. > >>> >> >>> > >>> >> >>> * Added a config:reload() that is accessible by POST’ing to > >>> >> >>> _config/_reload. Used by the JS tests to reset the config to > >>> what's on > >>> >> >>> disk. This should prevent those test run failures where a test > >>> fails > >>> >> >>> leaving the config in a bad state causing all subsequent tests > to > >>> >> >>> fail. I think. Maybe. > >>> >> >>> > >>> >> >>> * Databases are deleted synchronously in the test suite. We may > >>> need > >>> >> >>> to address this on Windows. But it does seem to reduce the > number > >>> of > >>> >> >>> “{error, file_exists}” failures. > >>> >> >>> > >>> >> >>> * I reimplemented the JS restartServer() function. There’s a new > >>> >> >>> _restart/token URL that will given a unique value for each > >>> instance of > >>> >> >>> the Erlang VM. To run a restart we grab the current token value, > >>> hit > >>> >> >>> _restart, then wait till we get a successful response with a > >>> different > >>> >> >>> token. This appears to have made the restart strategy more > robust. > >>> >> >>> > >>> >> >>> > >>> >> >>> > >>> >> >>> Things that need doing > >>> >> >>> > >>> >> >>> > >>> >> >>> IP Clearance - > >>> >> >>> > >>> >> >>> > >>> >> >>> We’ll need to track down if we have the CCLA as well as look at > >>> each > >>> >> >>> source file added to make sure each one is strictly from > Cloudant > >>> or > >>> >> >>> has an amenable license. I’m pretty sure that the only one of > >>> interest > >>> >> >>> is trunc_io.erl but we need to be thorough. > >>> >> >>> > >>> >> >>> documentation - > >>> >> >>> > >>> >> >>> > >>> >> >>> There shouldn’t be much here since the entire point of this > merge > >>> was > >>> >> >>> to not change the visible behavior of single node couch. 
A few > >>> things > >>> >> >>> to add about the testing endpoints. Maybe an update to the > >>> compaction > >>> >> >>> section mention the two new file names used. > >>> >> >>> > >>> >> >>> > >>> >> >>> Copyright notices - > >>> >> >>> > >>> >> >>> > >>> >> >>> We need to strip out copyright notices from individual files and > >>> make > >>> >> >>> sure all files have a standard Apache License v2 header. > >>> >> >>> > >>> >> >>> > >>> >> >>> clustered vhosts - > >>> >> >>> > >>> >> >>> > >>> >> >>> We’ve never implemented this at Cloudant. We either need to > write a > >>> >> >>> cluster or go back and tell people to use HAProxy (or similar) > for > >>> >> >>> such things. > >>> >> >>> > >>> >> >>> > >>> >> >>> twig - > >>> >> >>> > >>> >> >>> > >>> >> >>> We need to add another output type to twig that is configurable > in > >>> >> >>> some manner. Right now we spit out entire rsyslog records which > >>> isn’t > >>> >> >>> useful for most people. We’ll need to implement the file writer > >>> from > >>> >> >>> couch_log as well as update the _log HTTP handler to know when > it > >>> can > >>> >> >>> and can’t expect to find data on disk. > >>> >> >>> > >>> >> >>> > >>> >> >>> fabric - > >>> >> >>> > >>> >> >>> > >>> >> >>> This is going to need a lot of work. Specifically view access is > >>> going > >>> >> >>> to need to be updated to work with couch_mrview and friends. > >>> >> >>> > >>> >> >>> > >>> >> >>> Boot a dev cluster - > >>> >> >>> > >>> >> >>> > >>> >> >>> Once we fix up the clustering code we’ll need to write > instructions > >>> >> >>> and scripts for pulling up a dev cluster. > >>> >> >>> > >>> >> >>> > >>> >> >>> OTP stuff - > >>> >> >>> > >>> >> >>> > >>> >> >>> We’ve updated each app but we still need to pull some parts out > of > >>> >> >>> couchdb into their own application. Specifically the HTTP layer > >>> needs > >>> >> >>> its own app. 
We could probably pull out the os > >>> process/query_servers > >>> >> >>> as well as the os daemons and friends. Once done we need to > update > >>> the > >>> >> >>> supervision trees so we don’t have things like couch starting > and > >>> >> >>> managing the replication manager process. > >>> >> >>> > >>> >> >>> > >>> >> >>> ddoc_cache - > >>> >> >>> > >>> >> >>> > >>> >> >>> Wire this up in couch_httpd_db to actually be used. Right now > its > >>> only > >>> >> >>> used in chttpd. > >>> >> >>> > >>> >> >>> > >>> >> >>> couch_file upgrade - > >>> >> >>> > >>> >> >>> > >>> >> >>> The revert to remove the second updater_fd from each #db{} > record > >>> >> >>> means that we’re back in the original position of files > appearing > >>> to > >>> >> >>> slow down significantly under load. Since the initial hammer > >>> approach > >>> >> >>> of just adding a second fd we’ve since discovered that the > >>> underlying > >>> >> >>> bug is due to the way that message passing works combined with > >>> >> >>> Erlang’s file io. Significantly though is the fact that the fix > is > >>> >> >>> rather simple to implement. A first draft of this work is on an > old > >>> >> >>> branch of mine here: > >>> >> >>> > >>> >> >>> > >>> >> >>> https://github.com/davisp/couchdb/commit/d856878 > >>> >> >>> > >>> >> >>> > >>> >> >>> finish the size calculating changes - > >>> >> >>> > >>> >> >>> > >>> >> >>> The #leaf{} record change is to enable us to add more data size > >>> >> >>> calculations. CouchDB master calculates a data size that account > >>> for > >>> >> >>> all bytes that are active in a .couch file. Cloudant is > interested > >>> in > >>> >> >>> the total size of uncompressed docs and attachments minus the > >>> internal > >>> >> >>> overhead of btrees. And there’s a fourth number to calculate > based > >>> on > >>> >> >>> the compression level used. 
Having each of these numbers will be > >>> >> >>> useful as well as the calculations they’ll enable (ie, dead > bytes > >>> in > >>> >> >>> file, bytes used for overhead, compression ratio achieved, etc). > >>> >> >>> > >>> >> >>> > >>> >> >>> couch_proc_manager - > >>> >> >>> > >>> >> >>> > >>> >> >>> We need to implement the hard ceiling for capping the number of > OS > >>> >> >>> processes. We’ve started seeing a need for this at Cloudant with > >>> some > >>> >> >>> work loads so motivation to fix this is high. The only failing > >>> etap is > >>> >> >>> the assertion of this ceiling. > >>> >> >>> > >>> >> >>> > >>> >> >>> Synchronous db delete on Windows - > >>> >> >>> > >>> >> >>> > >>> >> >>> I did this because running the test suite was driving me > bonkers. I > >>> >> >>> need to ask Dave about how this behaves on Windows (my guess is > not > >>> >> >>> well) but I think we can close things up so that it works better > >>> than > >>> >> >>> the status quo. > >>> >> >> > >>> >> > > >>> >> > > >>> >> > > >>> >> > -- > >>> >> > Iris Couch > >>> >> > >>> > > >>> > > >>> > > >>> > -- > >>> > NS > >>> > >> > >> > >> > >> -- > >> NS > >> > > > > > > > > -- > > NS > -- NS
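
For anyone following along, the restart strategy described in the quoted notes (grab the current token, hit _restart, then wait for a successful response carrying a different token) can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the merge branch: `restart_and_wait`, `get_token` and `trigger_restart` are hypothetical names, and the HTTP calls are left to the caller because the thread does not say what the _restart/token response looks like.

```python
import time


def restart_and_wait(get_token, trigger_restart, timeout=30.0, interval=0.5,
                     sleep=time.sleep, clock=time.monotonic):
    """Restart a server and block until it reports a new instance token.

    get_token and trigger_restart are hypothetical callables supplied by the
    caller (for example thin wrappers around HTTP requests). get_token may
    raise OSError while the server is still down.
    """
    # Step 1: remember the token of the currently running instance.
    old_token = get_token()

    # Step 2: ask the server to restart.
    trigger_restart()

    # Step 3: poll until a successful response carries a different token.
    deadline = clock() + timeout
    while clock() < deadline:
        try:
            new_token = get_token()
        except OSError:
            # Server not reachable yet; keep waiting.
            new_token = None
        if new_token is not None and new_token != old_token:
            return new_token
        sleep(interval)

    raise TimeoutError("server did not come back with a new token")
```

Comparing tokens rather than just waiting for any successful response avoids mistaking a reply from the old, not yet stopped instance for a completed restart.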
