Re: Detailed info on the B-tree store? Native implementations thereof?

Jens Alfke Tue, 11 Aug 2009 12:08:31 -0700


On Aug 11, 2009, at 10:37 AM, Chris Anderson wrote:

Since this article, we've changed the header handling, so that we
don't keep it at the top of the file, but instead append the header at
the end of the file at every commit. The strict append-only nature of
the storage engine is the source of its robustness. Even an extreme
action, like truncating the file, will not result in an inconsistent
state.

Interesting. Does this really guarantee file integrity even in the case of power failure? (I have some experience dealing with file corruption, from working on Mac OS X components that use sqlite.) The worst problem is that the disk controller will reorder sector writes to reduce seek time, which in effect means that if power is lost, some random subset of the last writes may not happen. So you won't just end up with a truncated file: you could have a file that seems intact and has a correct header at the end, but has 4k bytes of garbage somewhere within the last transaction. Does CouchDB's file structure guard against that?
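
To make the question concrete, here is a minimal sketch of one way an append-only file could detect that kind of damage: each commit appends its data followed by a trailer carrying a checksum of that data, and recovery scans backward for the newest trailer whose checksum still matches. This is an illustration under assumptions, not CouchDB's actual on-disk format; the `CMT1` marker, the trailer layout, and the function names are all hypothetical.

```python
import os
import struct
import zlib

MAGIC = b"CMT1"                      # hypothetical commit marker
# Trailer layout (an assumption): magic, body start offset, body length, CRC32 of body
TRAILER = struct.Struct(">4sQII")


def append_commit(path, body):
    """Append one transaction body followed by a checksummed trailer."""
    with open(path, "ab") as f:
        start = f.seek(0, os.SEEK_END)
        f.write(body)
        f.flush()
        os.fsync(f.fileno())         # ask the OS to flush the body before the trailer
        f.write(TRAILER.pack(MAGIC, start, len(body), zlib.crc32(body)))
        f.flush()
        os.fsync(f.fileno())


def last_valid_commit(path):
    """Scan backward for the newest trailer whose checksum matches its body."""
    with open(path, "rb") as f:
        data = f.read()
    pos = len(data) - TRAILER.size
    while pos >= 0:
        magic, start, length, crc = TRAILER.unpack_from(data, pos)
        # A real trailer sits immediately after the body it describes.
        if magic == MAGIC and start + length == pos:
            body = data[start:pos]
            if zlib.crc32(body) == crc:
                return body
        pos -= 1                     # byte-wise scan: simple, not fast
    return None
```

With a scheme like this, a block of garbage inside the last transaction makes its checksum fail, and the reader falls back to the previous commit instead of trusting a trailer that merely looks intact. Whether the fsync calls really constrain the order of writes depends on the OS and the drive, which is why the checksum, not the ordering, is what the sketch relies on.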

My concern with HTML5 local storage is that it's going to be used for important user data that cannot be lost, just the way native apps put irreplaceable data in local files. But the data stores being used to implement local storage are much less resilient than the filesystem itself. My experience with sqlite is that heavily-used databases on consumer machines get corrupted and lost every few months. (This isn't directly related to CouchDB itself; but it's why I'm interested in the fault-tolerant data store it uses.)

The other aspect of our API that web storage will need to be
concurrency-friendly is MVCC. Without MVCC you end up needing long
transactions between page-loads, like localStorage currently has,
which makes it useless for sharing state between windows.
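
As an illustration of the MVCC point above, the following is a minimal sketch of revision-checked (optimistic) writes, which is one common way MVCC is exposed to clients. It is a simplification for discussion, not CouchDB's API; `VersionedStore`, `ConflictError`, and the integer revisions are hypothetical names and choices.

```python
class ConflictError(Exception):
    """Raised when a write is based on a stale revision."""


class VersionedStore:
    def __init__(self):
        self._docs = {}                      # key -> (revision, value)

    def get(self, key):
        """Return (revision, value); revision 0 means the key does not exist yet."""
        return self._docs.get(key, (0, None))

    def put(self, key, value, expected_rev):
        """Write only if the caller saw the current revision; no lock is held between calls."""
        current_rev, _ = self._docs.get(key, (0, None))
        if expected_rev != current_rev:
            raise ConflictError(f"{key!r}: expected rev {expected_rev}, found {current_rev}")
        new_rev = current_rev + 1
        self._docs[key] = (new_rev, value)
        return new_rev


# Two windows read the same revision; only the first write succeeds.
store = VersionedStore()
rev_a, _ = store.get("cart")
rev_b, _ = store.get("cart")
store.put("cart", ["book"], rev_a)
try:
    store.put("cart", ["pen"], rev_b)
    second_write_ok = True
except ConflictError:
    second_write_ok = False                  # window B must re-read and retry
```

Neither window blocks the other while it is thinking; the cost is that the loser has to notice the conflict and retry.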

I'm still not 100% convinced by your analysis in that blog post. A script running in a web page will implicitly acquire a lock when it accesses local storage, and release the lock at the end of the current event that it's handling (i.e. a user action or XHR response.) This is sufficiently fine-grained as to not pose a problem, I think.

But Jeremy Orlow pointed out a more problematic case to me: the HTML5 worker-thread API. Worker threads should be able to access local storage, and they don't have an event-based model; so a worker thread will probably be within some internal 'while' loop during its entire lifespan. There is thus no way to automatically handle transactions for it, so it will have to manually acquire and release locks. That means that a buggy or blocked worker thread could starve web pages in the same domain from accessing local storage. That's bad.
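
The starvation scenario can be sketched with ordinary threads. This is only an analogy in Python, not browser code: `storage_lock` stands in for the per-domain storage lock, `worker` for a worker thread that never releases it, and `page_event_handler` for a page script that needs the lock for one short event. All of these names are hypothetical.

```python
import threading

storage_lock = threading.Lock()      # stands in for the per-domain storage lock


def worker(started, stop):
    """A worker that takes the lock once and keeps it for its whole lifetime."""
    with storage_lock:
        started.set()
        stop.wait()                  # stands in for the worker's long-running 'while' loop


def page_event_handler(timeout):
    """A page event that needs the lock briefly; returns False if it was starved."""
    if not storage_lock.acquire(timeout=timeout):
        return False
    try:
        return True                  # storage access would happen here
    finally:
        storage_lock.release()


started, stop = threading.Event(), threading.Event()
t = threading.Thread(target=worker, args=(started, stop))
t.start()
started.wait()
starved = not page_event_handler(0.05)   # the worker still holds the lock
stop.set()
t.join()
recovered = page_event_handler(0.05)     # succeeds once the worker has exited
```

The page only gets its turn after the worker exits, which is the problem with handing manual lock control to a thread that has no natural event boundary.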

Maybe the easiest thing would be to just start bundling CouchDB with
your browser. :)

In a lot of ways that would be really awesome. However, it would have a terrible effect on the download size of the browser, which is an important consideration. (IIRC, the all-in-one double-clickable Mac CouchDB package is something like 15MB.)

I like the idea, which I think you proposed, of putting a basic b-tree API into the browser, and being able to implement a lite storage system compatible with CouchDB on top of it in JS.


—Jens
