Re: Detailed info on the B-tree store? Native implementations thereof?

Jan Lehnardt Tue, 11 Aug 2009 16:15:56 -0700


On 12 Aug 2009, at 01:08, Robert Newson wrote:

The worst problem is that the disk controller will reorder sector writes to reduce seek time, which in effect means that if power is lost, some random subset of the last writes may not happen. So you won't just end up with a truncated file
But what about that issue? I think it's tolerated because CouchDB
searches backward for the last non-corrupt header, right?

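%% Walk backwards one block at a time until a block loads as a valid header.
find_header(_Fd, -1) ->
    no_valid_header;   % reached the start of the file without finding one
find_header(Fd, Block) ->
    case (catch load_header(Fd, Block)) of
        {ok, Bin} ->
            {ok, Bin};
        _Error ->
            %% corrupt or non-header block: try the previous one
            find_header(Fd, Block - 1)
    end.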
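For readers without Erlang experience, here is a self-contained sketch of why that backward scan tolerates a torn header: if the header carries a checksum, a partly written block fails the check, load_header raises, and the catch sends the scan one block further back. The header layout (MD5 of the payload, a 32-bit length, the payload, zero padding to one block), the 64-byte block size and the module name hdr_scan are assumptions for illustration only. It works on an in-memory binary instead of a file descriptor, and CouchDB's real on-disk format may differ.

```erlang
-module(hdr_scan).
-export([encode_header/1, load_header/2, find_header/2]).

%% Hypothetical block size, kept tiny for the example.
-define(BLOCK, 64).

%% Hypothetical header layout: MD5 of the payload, a 32-bit payload
%% length, the payload itself, then zero padding up to one block.
%% Assumes the encoded header fits in a single block.
encode_header(Term) ->
    Payload = term_to_binary(Term),
    Body = <<(erlang:md5(Payload))/binary, (byte_size(Payload)):32,
             Payload/binary>>,
    PadBits = (?BLOCK - byte_size(Body)) * 8,
    <<Body/binary, 0:PadBits>>.

%% Raises (badmatch or badarg) unless block Block holds a header whose
%% checksum matches, e.g. when only part of it reached the disk.
load_header(File, Block) ->
    <<Md5:16/binary, Len:32, Rest/binary>> =
        binary:part(File, Block * ?BLOCK, ?BLOCK),
    <<Payload:Len/binary, _Pad/binary>> = Rest,
    Md5 = erlang:md5(Payload),   % checksum must match the stored one
    {ok, binary_to_term(Payload)}.

%% The same backward scan as the snippet quoted above, over a binary.
find_header(_File, -1) ->
    no_valid_header;
find_header(File, Block) ->
    case (catch load_header(File, Block)) of
        {ok, Term} -> {ok, Term};
        _Error -> find_header(File, Block - 1)
    end.
```

Note that this only rejects a torn header block. It cannot detect garbage in the data blocks written before an intact header, which is the case Jens raises next.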


Jens' next sentence was:

So you won't just end up with a truncated file: you could have a file that seems intact and has a correct header at the end, but has 4k bytes of garbage somewhere within the last transaction. Does CouchDB's file structure guard against that?


As far as I understand the file format, we're not safe against that.
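One way a file format could guard against that case, sketched here under stated assumptions (per Jan's reply, this is not what CouchDB does): each commit header also stores a digest of the data bytes written since the previous header, and recovery falls back to the previous header when that digest does not match. The file is modeled as an Erlang list of {data, Bin} and {header, Root, Digest} entries in append order; the module name txn_check and this layout are hypothetical.

```erlang
-module(txn_check).
-export([commit/2, recover/1]).

%% Hypothetical model of the file: a list of entries in append order,
%% each {data, Bin} or {header, Root, Digest}, where Digest is the MD5
%% of the data bytes appended since the previous header.

%% Append one transaction: its data entries, then its header.
commit(Log, {Root, DataBins}) ->
    Digest = erlang:md5(DataBins),   % erlang:md5/1 accepts iodata
    Log ++ [{data, B} || B <- DataBins] ++ [{header, Root, Digest}].

%% Return the newest header whose transaction data checks out.
recover(Log) ->
    scan(lists:reverse(Log)).

scan([]) ->
    no_valid_header;
scan([{data, _} | Rest]) ->
    %% Data after the last header belongs to an uncommitted transaction.
    scan(Rest);
scan([{header, Root, Digest} | Rest]) ->
    IsData = fun({data, _}) -> true; (_) -> false end,
    {NewestFirst, Older} = lists:splitwith(IsData, Rest),
    Bytes = [B || {data, B} <- lists:reverse(NewestFirst)],
    case erlang:md5(Bytes) of
        Digest -> {ok, Root};
        _Mismatch -> scan(Older)   % garbage inside this transaction
    end.
```

With a header-only check, a log whose last transaction contains a garbage sector would still recover the newest root and point at the garbage; with the digest it falls back to the previous root, at the cost of re-reading the whole last transaction during recovery.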

Cheers
Jan
--

On Tue, Aug 11, 2009 at 6:37 PM, Chris Anderson wrote:
On Tue, Aug 11, 2009 at 8:43 AM, Jens Alfke wrote:
I'm interested in the underpinnings of the CouchDB server: the crash-proof concurrent B-tree store. There's a blog post linked to in the wiki that describes the basic concepts (leaves and updated intermediate nodes are appended to the file; the start of the file stores two links to the root node) but is there any more detailed description?[1] And is there any similar technology available that's implemented in native code (C/C++)?[2]
Jens,

Glad to have you interested. There are a few posts about the B-tree
store. This is probably the best, but slightly out of date:

http://horicky.blogspot.com/2008/10/couchdb-implementation.html
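The core idea in Jens's description (leaves and updated intermediate nodes are appended, never overwritten) can be sketched as a copy-on-write tree. This is a simplified illustration, not CouchDB's code: a plain binary search tree stands in for the B+tree, the "file" is an Erlang map from offset to node plus the next free offset, and the module name cow_tree is hypothetical. An insert appends a new leaf and a new copy of every node on the path up to the root, then returns the new root offset, so an older root offset still reads a consistent snapshot.

```erlang
-module(cow_tree).
-export([new/0, insert/4, lookup/3]).

%% Hypothetical append-only "file": {OffsetToNode, NextFreeOffset}.
%% Existing nodes are never modified.
new() -> {#{}, 0}.

%% Append a node and return {ItsOffset, NewFile}.
append(Node, {Nodes, Next}) ->
    {Next, {maps:put(Next, Node, Nodes), Next + 1}}.

%% Returns {NewRootOffset, NewFile}; every node on the path is copied.
insert(File, nil, Key, Val) ->
    append({node, nil, Key, Val, nil}, File);
insert(File = {Nodes, _}, Ptr, Key, Val) ->
    {node, L, K, V, R} = maps:get(Ptr, Nodes),
    if
        Key < K ->
            {NewL, File1} = insert(File, L, Key, Val),
            append({node, NewL, K, V, R}, File1);
        Key > K ->
            {NewR, File1} = insert(File, R, Key, Val),
            append({node, L, K, V, NewR}, File1);
        true ->
            append({node, L, K, Val, R}, File)
    end.

%% Read the tree as of the given root offset.
lookup(_File, nil, _Key) ->
    not_found;
lookup(File = {Nodes, _}, Ptr, Key) ->
    {node, L, K, V, R} = maps:get(Ptr, Nodes),
    if
        Key < K -> lookup(File, L, Key);
        Key > K -> lookup(File, R, Key);
        true -> {ok, V}
    end.
```

Because readers only follow offsets from the root they started with, a writer appending new nodes never disturbs them; that is what makes the store concurrency-friendly without locking readers.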

Since this article, we've changed the header handling so that we
don't keep it at the top of the file, but instead append the header at
the end of the file at every commit. The strict append-only nature of
the storage engine is the source of its robustness. Even an extreme
action, like truncating the file, will not result in an inconsistent
state.
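A minimal sketch of why truncation is harmless, assuming a hypothetical record layout <<Type:8, Len:32, Payload/binary>> (type 0 for document data, type 1 for a commit header holding the state) and a hypothetical module append_log. A commit only appends, and recovery keeps the last complete header it can parse, so cutting the file at any byte leaves an earlier committed state (or none). For brevity it scans forward from the start rather than backward from the end as the find_header snippet does, and it does not address the reordered-write problem discussed earlier in this thread.

```erlang
-module(append_log).
-export([commit/2, recover/1]).

%% Hypothetical record layout: <<Type:8, Len:32, Payload:Len/binary>>,
%% Type 0 = document data, Type 1 = commit header holding the state.

%% Never rewrite existing bytes: data and the new header go after them.
commit(File, {Docs, State}) ->
    DataRecs = [record(0, D) || D <- Docs],
    Header = record(1, term_to_binary(State)),
    iolist_to_binary([File, DataRecs, Header]).

record(Type, Payload) ->
    <<Type:8, (byte_size(Payload)):32, Payload/binary>>.

%% Return the state from the last complete header, if any.
recover(File) ->
    scan(File, no_valid_header).

scan(<<Type:8, Len:32, Payload:Len/binary, Rest/binary>>, Last) ->
    case Type of
        1 -> scan(Rest, {ok, binary_to_term(Payload)});
        _ -> scan(Rest, Last)
    end;
scan(_PartialOrEmpty, Last) ->
    %% A truncated tail cannot parse as a full record, so it is ignored.
    Last.
```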

The other aspect of our API that web storage will need in order to be
concurrency-friendly is MVCC. Without MVCC you end up needing long
transactions between page-loads, like localStorage currently has,
which makes it useless for sharing state between windows. As I
analyzed in that blog post, once you have CouchDB-style MVCC tokens,
you pretty much need to start dealing in documents to manage the {id,
rev} tuple.
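A minimal sketch of the optimistic concurrency that the {id, rev} tuple enables: a write must name the revision it was based on, and a stale revision yields a conflict instead of silently overwriting another window's change. The module and function names (mvcc_store, read/2, write/4) are hypothetical, and revisions are plain integers here, whereas CouchDB's real revision ids are strings.

```erlang
-module(mvcc_store).
-export([new/0, read/2, write/4]).

%% Hypothetical in-memory store mapping Id -> {Rev, Body}.
new() -> #{}.

read(Store, Id) ->
    case maps:find(Id, Store) of
        {ok, {Rev, Body}} -> {ok, Rev, Body};
        error -> not_found
    end.

%% A write must name the revision it was based on (undefined for a new
%% document). If someone else wrote in the meantime, report a conflict.
write(Store, Id, BaseRev, Body) ->
    Current = case maps:find(Id, Store) of
                  {ok, {Rev, _}} -> Rev;
                  error -> undefined
              end,
    if
        BaseRev =:= Current ->
            NewRev = case Current of
                         undefined -> 1;
                         N -> N + 1
                     end,
            {ok, NewRev, maps:put(Id, {NewRev, Body}, Store)};
        true ->
            {conflict, Current}
    end.
```

Two windows that both read revision 1 can both attempt a write; the first succeeds and the second gets {conflict, 2}, so it can re-read and merge instead of holding a long transaction open between page loads.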

Maybe the easiest thing would be to just start bundling CouchDB with
your browser. :)

I'll be living in Berkeley starting next month, so if you'd like to
get together, perhaps I can help get you oriented in the source code so
you can see this stuff in action, yourself. Erlang is surprisingly
simple once you get started.

Chris
Basically I'm interested in whether it's feasible to build a simple storage system (for use in an HTML5 Web browser) that a CouchDB-compatible client
library could be built on top of. JChris has posted about this topic
recently[3], and pointed out that the hashtable-oriented key-value store currently specced in HTML5 is a poor match for CouchDB. Moreover, the SQLite database engine underneath it doesn't guarantee data integrity after a hard system crash (as I know from painful experience). So: could we build a fault-tolerant B-tree based API into the browser? (This isn't just academic curiosity: I recently started work on the Chrome team at Google, and HTML5
local storage is one of my group's responsibilities.)

Thanks!

—Jens
[1] Alas, I cannot Use The Source, Luke, as I do not have Erlang skillz. :(
[2] I know of many, many B-tree libraries (Berkeley DB, Tokyo Cabinet...) but none that are fault-tolerant.
[3] http://jchrisa.net/drl/_design/sofa/_show/post/Fixing-HTML-5-Storage
--
Chris Anderson
http://jchrisa.net
http://couch.io
