On Aug 11, 2009, at 3:07 PM, Jens Alfke wrote:
On Aug 11, 2009, at 10:37 AM, Chris Anderson wrote:
Since this article, we've changed the header handling, so that we
don't keep it at the top of the file, but instead append the header at
the end of the file at every commit. The strict append-only nature of
the storage engine is the source of its robustness. Even an extreme
action, like truncating the file, will not result in an inconsistent
state.
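
A minimal sketch of the idea, not CouchDB's actual on-disk format: every commit appends its data and then a small header, and recovery scans backward for the last complete header. The `MAGIC` marker, the header layout, and the function names are all hypothetical, chosen only for illustration.

```python
import struct

MAGIC = b"HDR1"                 # hypothetical marker, not CouchDB's real one
HEADER = struct.Struct(">4sQ")  # magic + the offset at which this header starts

def commit(buf: bytearray, data: bytes) -> None:
    """Append the data, then a header that records its own offset."""
    buf += data
    buf += HEADER.pack(MAGIC, len(buf))

def recover(buf: bytes) -> int:
    """Return how many leading bytes end in a complete header (0 if none).

    Scans backward one byte at a time. A real store would also have to rule
    out payload bytes that merely look like a header; this sketch does not.
    """
    pos = len(buf) - HEADER.size
    while pos >= 0:
        magic, offset = HEADER.unpack_from(buf, pos)
        if magic == MAGIC and offset == pos:
            return pos + HEADER.size
        pos -= 1
    return 0
```

Truncating the buffer in the middle of the second commit makes `recover` fall back to the end of the first commit, which is the "no inconsistent state" property described above.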
Interesting. Does this really guarantee file integrity even in the
case of power failure? (I have some experience dealing with file
corruption, from working on Mac OS X components that use sqlite.)
The worst problem is that the disk controller will reorder sector
writes to reduce seek time, which in effect means that if power is
lost, some random subset of the last writes may not happen. So you
won't just end up with a truncated file — you could have a file that
seems intact and has a correct header at the end, but has 4k bytes
of garbage somewhere within the last transaction. Does CouchDB's
file structure guard against that?
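
One common guard against this failure mode, shown here only as a sketch and not as a description of what CouchDB does, is to store a checksum with every record so that garbage in the middle of a transaction is detected on read. The record layout (length plus CRC-32 prefix) and the function names are assumptions made for the example.

```python
import struct
import zlib

PREFIX = struct.Struct(">II")  # payload length, CRC-32 of the payload

def append_record(buf: bytearray, payload: bytes) -> None:
    """Append one record: length and checksum first, then the payload."""
    buf += PREFIX.pack(len(payload), zlib.crc32(payload))
    buf += payload

def valid_prefix(buf: bytes) -> int:
    """Return how many leading bytes consist of intact records."""
    pos = 0
    while pos + PREFIX.size <= len(buf):
        length, crc = PREFIX.unpack_from(buf, pos)
        start = pos + PREFIX.size
        payload = buf[start:start + length]
        # Stop at the first record that is cut short or fails its checksum.
        if len(payload) != length or zlib.crc32(payload) != crc:
            break
        pos = start + length
    return pos
```

If a 512-byte sector inside the last record is overwritten with zeros, `valid_prefix` stops at the end of the previous record, so the damaged transaction is discarded rather than trusted.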
My concern with HTML5 local storage is that it's going to be used
for important user data that cannot be lost, just the way native
apps put irreplaceable data in local files. But the data stores being
used to implement local storage are much less resilient than the
filesystem itself. My experience with sqlite is that heavily-used
databases on consumer machines get corrupted and lost every few
months. (This isn't directly related to CouchDB itself; but it's why
I'm interested in the fault-tolerant data store it uses.)
The other aspect of our API that web storage will need in order to be
concurrency-friendly is MVCC. Without MVCC you end up needing long
transactions between page-loads, like localStorage currently has,
which makes it useless for sharing state between windows.
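
To illustrate the point, here is a heavily simplified sketch of revision-checked writes in the spirit of MVCC. It is not CouchDB's implementation: the `MVCCStore` class, the `Conflict` exception, and the integer revisions are hypothetical. Readers never take a lock; a writer must name the revision it read, and a stale write is rejected instead of blocking anyone.

```python
class Conflict(Exception):
    """Raised when a write is based on a stale revision."""

class MVCCStore:
    def __init__(self):
        self._docs = {}  # key -> (revision number, value)

    def get(self, key):
        """Return (rev, value); rev 0 and value None mean the key is absent."""
        return self._docs.get(key, (0, None))

    def put(self, key, value, rev):
        """Write only if rev matches the current revision; return the new one."""
        current, _ = self.get(key)
        if rev != current:
            raise Conflict(f"{key}: expected rev {current}, got {rev}")
        self._docs[key] = (current + 1, value)
        return current + 1
```

Two windows that both read revision 0 can each attempt a write: the first succeeds, the second gets a `Conflict` and has to re-read and retry. No lock is held between page loads.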
I'm still not 100% convinced by your analysis in that blog post. A
script running in a web page will implicitly acquire a lock when it
accesses local storage, and release the lock at the end of the
current event that it's handling (i.e. a user action or XHR
response.) This is sufficiently fine-grained as to not pose a
problem, I think.
But Jeremy Orlow pointed out a more problematic case to me — the
HTML5 worker-thread API. Worker threads should be able to access
local storage, and they don't have an event-based model; so a worker
thread will probably be within some internal 'while' loop during its
entire lifespan. There is thus no way to automatically handle
transactions for it, so it will have to manually acquire and release
locks. That means that a buggy or blocked worker thread could starve
web pages in the same domain from accessing local storage. That's bad.
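
The starvation scenario can be imitated with an ordinary lock. This is only an analogy using Python's `threading` module, not any browser API; `storage_lock` and `page_event_handler` are hypothetical names standing in for the per-domain storage lock and a page's event handler.

```python
import threading

storage_lock = threading.Lock()  # stands in for the per-domain storage lock

def page_event_handler(store: dict, key: str, value: str) -> bool:
    """Take the lock for the duration of one event; give up rather than hang."""
    if not storage_lock.acquire(timeout=0.1):
        return False  # starved: someone else still holds the lock
    try:
        store[key] = value
        return True
    finally:
        storage_lock.release()
```

As long as nothing else holds `storage_lock`, the handler succeeds. Once a long-running worker acquires it and never releases it, every call returns `False`, which is the situation a buggy or blocked worker would create for pages in the same domain.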
Maybe the easiest thing would be to just start bundling CouchDB with
your browser. :)
In a lot of ways that would be really awesome. However, it would
have a terrible effect on the download size of the browser, which is
an important consideration. (IIRC, the all-in-one double-clickable
Mac CouchDB package is something like 15MB.)
Things can be made much, much smaller. For example, the package brings
in SpiderMonkey, all the Erlang libraries, plus most of the ICU
library for collation. If we reused the browser's UTF and JavaScript
support, I think we could get CouchDB + dependencies under a meg
compressed.
-Damien
I like the idea, which I think you proposed, of putting a basic
b-tree API into the browser, and being able to implement a lite
storage system compatible with CouchDB on top of it in JS.
—Jens