The "ReleaseNotices" page has been changed by JanLehnardt:
http://wiki.apache.org/couchdb/ReleaseNotices

Comment:
new release notices page

New page:
<<Include(EditTheWiki)>>

= Release Notices =

Sometimes, after we make a release, we might find out that something is wrong 
with it that is so severe that we need to tell everyone who runs that release. 
This page collects these notices.

<<TableOfContents(2)>>

== 1.0.0 ==

**A 1.0.0 RECOVERY TOOL IS NOW AVAILABLE**

Download the [[http://wiki.couchone.com/page/repair-tool#/|CouchDB 1.0.0 Repair 
Tool]] to recover data.


=== Notes on a Nasty Bug ===

Developers should use only the 1.0.1 release at this point, not the 1.0.0 
version. Read on to find out why.

On the weekend of August 7th–8th, 2010 we discovered and fixed a bug in CouchDB 
1.0.0. The problem was subtle (cancelling a timer, without deleting the 
reference to it) but the ramifications were not: there was potential data loss 
for users of 1.0.0. The 1.0.1 release contains a permanent fix, and [is 
available now on the download page](../downloads.html).

We are proud of how quickly the CouchDB community recovered from this bug and 
went the extra mile to make sure everyone's data was safe. It is clear we have 
a group of developers who care enough about all users' data that they 
aggressively pursued an "edge case" bug so no one would be caught off guard. 
Further, the team worked for the next week to create a repair tool to recover 
access to data that was affected by the bug. As a result, no users lost data 
permanently. Kudos!

=== The Remedy ===

For current users, these instructions will ensure your data is safe. First: 
**do not restart your CouchDB!** The hot fix involves changing configuration on 
the running server, so have your admin credentials handy. (If your CouchDB is 
in Admin Party mode with no admins defined, you won't need them.) If you do not 
have admin credentials but can restart the server, you can still prevent data 
loss; read on.

==== If you have admin credentials (or if your CouchDB is in Admin Party mode) ====

Visit the Futon admin console at http://yourserver:5984/_utils/, and click 
"Login" in the lower right-hand corner. Log in as an administrator, and visit 
the "Configuration" page linked in the sidebar: 
http://yourserver:5984/_utils/config.html

Now that you are on the configuration page, set `delayed_commits` (in the 
`couchdb` section) to `false`. You can do this by clicking on the word `true`, 
replacing it with `false`, and pressing Enter.

The next time you write a document to each database, it will commit the header 
to disk, and your data will be secure. For safety, please continue with the 
next set of instructions.
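
If you would rather not click through Futon, the same change can be made over HTTP. The following is a minimal sketch, not an official tool: it assumes a CouchDB 1.0.x server that exposes its configuration at `/_config/<section>/<key>`, a JavaScript runtime with a global `fetch` (a current browser or Node.js 18+), and Admin Party mode (otherwise you would have to add your admin credentials to the request). The function names `delayedCommitsRequest` and `disableDelayedCommits` are made up for this example.

```javascript
// Build the request that turns off delayed commits on a running server.
// Assumption: configuration lives at /_config/<section>/<key> and the new
// value is sent as a JSON string in the body of a PUT.
function delayedCommitsRequest(baseUrl) {
  return {
    method: "PUT",
    url: baseUrl.replace(/\/+$/, "") + "/_config/couchdb/delayed_commits",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify("false"), // the JSON string "false", quotes included
  };
}

// Send the request. Assumes a global fetch and no admin password.
async function disableDelayedCommits(baseUrl) {
  const req = delayedCommitsRequest(baseUrl);
  const res = await fetch(req.url, {
    method: req.method,
    headers: req.headers,
    body: req.body,
  });
  if (!res.ok) {
    throw new Error("config update failed with HTTP " + res.status);
  }
  return res.json(); // whatever JSON body the server sends back
}
```

Calling `disableDelayedCommits("http://localhost:5984")` changes the running configuration without a restart, which is the whole point of this step.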

==== For everyone ====

To ensure that each database is committed, you can use the 
`_ensure_full_commit` command. There are a few ways to do this.

The simplest method is to right click the following link and add it to your 
bookmarks.

Bookmarklet: 
[[javascript:%24.couch.allDbs%28%7Bsuccess%3Afunction%28dbs%29%7Bfunction%20commitDbs%28list%29%7Bvar%20db%3Dlist.pop%28%29%3B%24.ajax%28%7Btype%3A%22POST%22%2Curl%3A%22%2F%22%2BencodeURIComponent%28db%29%2B%22%2F_ensure_full_commit%22%2CcontentType%3A%22application%2Fjson%22%2CdataType%3A%22json%22%2Ccomplete%3Afunction%28r%29%7B%24%28%22%23content%22%29.prepend%28%27%3Cul%20id%3D%22commit_all%22%3E%3C%2Ful%3E%27%29%3Bif%28r.status%3D%3D201%29%7B%24%28%22%23commit_all%22%29.append%28%27%3Cli%3Ecommitted%3A%20%27%2Bdb%2B%27%3C%2Fli%3E%27%29%3B%7Delse%7B%24%28%22%23commit_all%22%29.append%28%27%3Cli%20style%3D%22color%3Ared%3B%22%3Eerror%3A%20%27%2Bdb%2B%27%3C%2Fli%3E%27%29%3B%7Dif%28list.length%3E0%29%7BcommitDbs%28list%29%3B%7D%7D%7D%29%3B%7DcommitDbs%28dbs%29%3B%7D%7D%29%3B|Commit All Databases]]

Now visit Futon on your CouchDB instance at http://localhost:5984/_utils/, and 
select the bookmark. It will use the !JavaScript libraries included with Futon 
to ensure all your databases are fully committed.

Alternatively, here is a simple HTML file that you can upload to your CouchDB 
using Futon. When you visit it, it will make sure your data is all safely 
committed. If you prefer a shell script, skip below this file.

Save this HTML to a file on your machine called `commit_all.html`

{{{
    <!DOCTYPE html>
    <html>
      <head><title>Commit All Databases</title></head>
      <body>
        <h1>Commit All Databases</h1>
        <p>This script will trigger <tt>_ensure_full_commit</tt> on all databases.</p>
        <ul id="databases"></ul>
        <!-- jQuery and the CouchDB jQuery plugin ship with Futon. -->
        <script src="/_utils/script/jquery.js"></script>
        <script src="/_utils/script/jquery.couch.js"></script>
        <script>
          // List every database, then POST to its _ensure_full_commit URL.
          $.couch.allDbs({
            success : function(dbs) {
              dbs.forEach(function(db) {
                $.ajax({
                  type: "POST",
                  url: "/" + encodeURIComponent(db) + "/_ensure_full_commit",
                  contentType: "application/json", dataType: "json",
                  complete : function(r) {
                    // A 201 response means the commit succeeded.
                    if (r.status == 201) {
                      $("#databases").append('<li>committed: '+db+'</li>');
                    } else {
                      $("#databases").append('<li style="color:red;">error: '+db+'</li>');
                    }
                  }
                });
              });
            }
          });
        </script>
      </body>
    </html>
}}}

Browse to your CouchDB's Futon at http://localhost:5984/_utils/ and create 
a database. Visit that database, create a document, and save it. Then click 
the button labeled "Upload Attachment", choose the `commit_all.html` file you 
just created, and upload it. A link to that HTML file will appear in Futon.

Now click the link in Futon for `commit_all.html`, and it will run 
`_ensure_full_commit` on all of your databases.

If you prefer a shell script, 
[[http://wiki.couchone.com/page/ensure-full_commit-sh|this will also commit all 
your databases]].
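
If you would rather script the same sequence outside the browser, it can be sketched in a few lines of JavaScript. This is an illustration under assumptions, not the linked shell script: it assumes a runtime with a global `fetch` (for example Node.js 18+), a server that needs no credentials for these calls, and the same `201` success status that the HTML page above checks for. The names `ensureFullCommitUrl` and `commitAllDatabases` are hypothetical.

```javascript
// Build the URL of one database's _ensure_full_commit endpoint.
function ensureFullCommitUrl(baseUrl, db) {
  return (
    baseUrl.replace(/\/+$/, "") +
    "/" + encodeURIComponent(db) + "/_ensure_full_commit"
  );
}

// Ask every database on the server to commit, one after another.
// Resolves to [{ db, committed }], where committed is true on a 201 reply.
// fetchImpl can be swapped out for testing; it defaults to the global fetch.
async function commitAllDatabases(baseUrl, fetchImpl = fetch) {
  const root = baseUrl.replace(/\/+$/, "");
  const listing = await fetchImpl(root + "/_all_dbs");
  const dbs = await listing.json();
  const results = [];
  for (const db of dbs) {
    const res = await fetchImpl(ensureFullCommitUrl(root, db), {
      method: "POST",
      headers: { "Content-Type": "application/json" },
    });
    results.push({ db: db, committed: res.status === 201 });
  }
  return results;
}
```

Run it with something like `commitAllDatabases("http://localhost:5984").then(console.log)` and check that every entry reports `committed: true` before moving on.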

At this point your data is safe.

==== If you don't have admin credentials ====

**Warning:** make sure you have followed the instructions in the section "For 
everyone" above before you do the rest of these steps. If you were able to log 
into CouchDB as an administrator (and completed the first section, before "For 
everyone"), then you can skip this section.

In this step we will configure your CouchDB so that future updates will be 
durable.

Did you run `_ensure_full_commit` on all of your databases using one of the 
methods above? Do that now, or the next action may destroy data.

Now, find CouchDB's configuration file. It will be called `local.ini` and it is 
probably in a location like: `/usr/local/etc/couchdb/local.ini`

Open the file, and add the following lines to it:

{{{
    [couchdb]
    delayed_commits = false
}}}

Now, restart your CouchDB. This will be different on different operating 
systems. If you have your CouchDB configured as a system service, restarting 
the computer will do the trick, but if you don't want to do that, you can 
probably find the pid of CouchDB, by running `ps ax | grep couchdb`. Once you 
have the pid, you can kill CouchDB by running `kill <pid>`. If you are a fan of 
magic, you can do all that in one ninja move by running:

{{{
      # [c]ouchdb keeps grep from matching its own entry in the process list
      kill `ps ax | grep '[c]ouchdb' | head -n1 | awk '{print $1}'`
}}}

Note: you might need to sudo.

Once CouchDB is killed, the system should bring it back up. When it boots, it 
will load the config for `delayed_commits = false` so updates from that point 
forward will be durable.

=== The Bug ===

Now that we have you fixed up, you might enjoy a look at the technicalities of 
what got broken in CouchDB.

A commit is what causes writes to become durably flushed to storage. It is an 
expensive operation. During a commit, recent writes are flushed to disk and a 
new database header is written. Finally, the new header is also flushed to 
disk. At the operating system level this involves multiple fsync() calls to 
ensure data has been fully written.
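
To make the ordering concrete, here is a toy commit written against Node.js's `fs` module. It is only a sketch of the sequence described above (flush the writes, write the header, flush the header); CouchDB's real file format and header layout are different, and the line-oriented JSON format, the `commit` function, and its parameters are all invented for this example.

```javascript
const fs = require("fs");

// Append `records` and then a header line to `path`, flushing after each phase.
// The header is written only after the records it covers have been flushed.
function commit(path, records, header) {
  const fd = fs.openSync(path, "a");
  try {
    for (const record of records) {
      fs.writeSync(fd, JSON.stringify(record) + "\n");
    }
    fs.fsyncSync(fd); // 1. recent writes are flushed to disk
    fs.writeSync(fd, JSON.stringify({ header: header }) + "\n"); // 2. new header
    fs.fsyncSync(fd); // 3. the header itself is flushed to disk
  } finally {
    fs.closeSync(fd);
  }
}
```

Each `fsyncSync` call blocks until the operating system reports the data as written, which is why a commit is expensive and why batching several writes behind one commit is attractive.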

Delayed commits are a feature of CouchDB that allows it to achieve better write 
performance for some workloads while sacrificing a small amount of durability. 
The setting causes CouchDB to wait up to a full second before committing new 
data after an update. If the server crashes before the header is written then 
any writes since the last commit are lost. The choice of delayed commits as a 
default has been discussed many times and the consensus was that they should 
remain on for the 1.0 release.

For each open database in CouchDB there is an Erlang process referred to as the 
update process, the source for which is in a file called 
`couch_db_updater.erl`. All writes to a given database pass through the 
corresponding update process. This process is in charge of preparing, writing 
and committing batches of updates. In order to provide delayed commits, the 
update process sets a timer for one second in the future. When the timer 
expires a commit message is sent back to the updater. A reference to this timer 
is kept in the updater state. This reference prevents the updater from 
scheduling excessive commit messages when one is already pending.

In the updater code that shipped with 1.0 a delayed commit message that arrived 
when there were no pending writes never cleared the timer reference. As a 
result, the updater state erroneously indicated that there was a future commit 
scheduled. Once in this bad state the updater would never schedule another 
commit. In practice, this problem occurred when a write conflict was followed 
by a period of inactivity. The conflicting write triggered the delayed commit, 
but when the commit message arrived no new data needed to be written and the 
timer reference was not cleared. This scenario is thankfully unlikely to occur 
in a busy database.
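
The sequence is easier to follow in a small model. The JavaScript below is a simplified stand-in for the updater's bookkeeping, not the Erlang code from `couch_db_updater.erl`; every name in it (`makeUpdater`, `write`, `tick`, `replay`) is hypothetical, and the one-second timer is replaced by an explicit `tick()` call.

```javascript
// Toy model of the delayed-commit bookkeeping described above.
// buggy = true reproduces the 1.0.0 behaviour; buggy = false models the fix.
function makeUpdater(buggy) {
  const state = {
    timerRef: null, // non-null while the updater believes a commit is pending
    scheduled: 0,   // commit messages that really are on their way
    uncommitted: 0, // writes not yet covered by a committed header
    committed: 0,   // writes that would survive a restart
    nextRef: 1,
  };

  // Every update attempt passes through here. A conflicting write stores
  // nothing, but it still triggers the delayed-commit scheduling.
  function write(conflict) {
    if (!conflict) state.uncommitted += 1;
    if (state.timerRef === null) {
      state.timerRef = state.nextRef++;
      state.scheduled += 1; // a real timer would fire one second later
    }
  }

  // The commit message arrives at the updater.
  function onCommitMessage() {
    if (state.uncommitted === 0) {
      if (!buggy) state.timerRef = null; // the fix: drop the stale reference
      return; // 1.0.0 returned here with timerRef still set
    }
    state.committed += state.uncommitted;
    state.uncommitted = 0;
    state.timerRef = null;
  }

  // One second passes: deliver every scheduled commit message.
  function tick() {
    while (state.scheduled > 0) {
      state.scheduled -= 1;
      onCommitMessage();
    }
  }

  return { state: state, write: write, tick: tick };
}

// Replay the failure scenario: conflict, quiet period, then a real write.
function replay(buggy) {
  const updater = makeUpdater(buggy);
  updater.write(true);  // write conflict schedules a delayed commit
  updater.tick();       // timer fires with nothing to commit
  updater.write(false); // a later, successful write
  updater.tick();       // wait another second
  return updater.state;
}
```

In this model, `replay(true)` ends with one write that is never committed because the stale `timerRef` blocks any further scheduling, while `replay(false)` commits it on the second tick.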

=== Mixups and Fixes ===

One can never say exactly what led to a particular bug. In this case, there 
were some contributing factors.

==== Release procedure ====

In the run-up to 1.0, there was some confusion about which branch would 
ultimately become 1.0. Originally we'd discussed branching 1.0 from the 0.11.x 
line, as 0.11 was a feature freeze release, so that we could concentrate on 
bugs and performance for 1.0. However, as we approached 1.0's release, there 
was very little work in trunk that involved new features. And the few features 
added to trunk were really just refinements of existing functionality, to make 
it more user friendly, etc.

So in the final weeks before 1.0's release, we decided to cut it from trunk (as 
opposed to from the 0.11.x branch) as that would make for more straightforward 
code management in the future. It has also been our release policy since the 
early days of the project.

As a result, the commit that introduced the bug went into trunk while 0.11.x 
was still designated to become the 1.0 release; the intention was for the 
change to prove its stability in trunk before a future 1.1 release. After we 
decided to cut 1.0 from trunk, this commit didn't get the review it needed to 
stay in the 1.0 release branch.

The fix here is that we are now crystal clear that future releases will always 
be cut from trunk. So if people are committing work that they feel is not 
baked enough for trunk, those commits are more likely to be made in a feature 
branch. Staying clear about this is one way we can avoid similar issues in the 
future.

==== Code review ====

In the run-up to 1.0, there were mailing list messages about which commits were 
trivial and which needed review. For the commits that weren't trivial, the 
original committer was the one who said he thought they were fine. 
In the future, for any commits to the deepest parts of the storage engine, we 
will be careful to have review from multiple parties. Many eyes make bugs 
shallow, but for code like the core CouchDB storage engine, there aren't a lot 
of folks who are ready to review and understand a particular patch.

==== Testing ====

CouchDB currently has a suite of unit and integration tests, which guide 
development and provide the first line of documentation. We also have a few 
independent benchmark suites, which we can use to track performance 
improvements and regressions.

What we don't have is a set of correctness stress tests. In this case, a 
fuzzing test would have caught the error: one that applies a random set of 
operations to a constrained keyspace while tracking the expected database 
state, then restarts the server and checks that the state is as expected.
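
A test of that shape can be sketched in JavaScript against a toy in-memory store. Everything here is an assumption made for illustration: `makeToyStore` is a stand-in with delayed-commit semantics rather than CouchDB itself, `tick()` replaces the one-second timer, `restart()` replaces a server restart, and a real test would drive an actual server over HTTP instead.

```javascript
// Deterministic pseudo-random numbers in [0, 1), so a failing seed can be replayed.
function makeRng(seed) {
  let s = seed >>> 0;
  return function () {
    s = (Math.imul(s, 1664525) + 1013904223) >>> 0;
    return s / 4294967296;
  };
}

// Hypothetical in-memory store; buggy = true keeps the stale timer reference.
function makeToyStore(buggy) {
  let disk = {};         // contents as of the last committed header
  let memory = {};       // contents including uncommitted writes
  let dirty = false;     // memory holds writes that are not on "disk" yet
  let timerRef = null;   // non-null while the store believes a commit is pending
  let scheduled = false; // a commit message really is on its way

  function schedule() {
    if (timerRef === null) { timerRef = {}; scheduled = true; }
  }
  return {
    put: function (key, value) { memory[key] = value; dirty = true; schedule(); },
    conflict: function () { schedule(); }, // rejected write: stores nothing
    tick: function () {                    // one second passes
      if (!scheduled) return;
      scheduled = false;
      if (!dirty) { if (!buggy) timerRef = null; return; }
      disk = Object.assign({}, memory);
      dirty = false;
      timerRef = null;
    },
    restart: function () {                 // uncommitted writes are lost
      memory = Object.assign({}, disk);
      dirty = false; timerRef = null; scheduled = false;
    },
    snapshot: function () { return Object.assign({}, memory); },
  };
}

function sameContents(a, b) {
  const ka = Object.keys(a).sort();
  const kb = Object.keys(b).sort();
  return ka.length === kb.length &&
    ka.every(function (k, i) { return k === kb[i] && a[k] === b[k]; });
}

// Apply random operations, then restart and compare with the expected state.
function fuzz(seed, steps, buggy) {
  const rng = makeRng(seed);
  const store = makeToyStore(buggy);
  const expected = {};
  const keyCount = 4; // constrained keyspace
  for (let i = 0; i < steps; i++) {
    const op = Math.floor(rng() * 3);
    const key = "k" + Math.floor(rng() * keyCount);
    if (op === 0) { store.put(key, i); expected[key] = i; }
    else if (op === 1) { store.conflict(); }
    else { store.tick(); }
  }
  store.tick();    // give any pending delayed commit time to fire
  store.restart();
  const actual = store.snapshot();
  return { ok: sameContents(actual, expected), seed: seed, expected: expected, actual: actual };
}
```

Running `fuzz(seed, 200, false)` over a range of seeds should always report `ok: true`, whereas the buggy store is expected to fail as soon as a run contains a conflict, a quiet tick, and a later write. Returning the seed with the result keeps a failing run reproducible.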

We could learn a lot from the [[http://www.sqlite.org/testing.html|SQLite 
testing methodology]]. Expect to see more stress and correctness tests in 
CouchDB's future.
