On Tue, Aug 11, 2009 at 12:26 PM, Damien Katz <[email protected]> wrote:
>
> On Aug 11, 2009, at 3:07 PM, Jens Alfke wrote:
>
>>
>> On Aug 11, 2009, at 10:37 AM, Chris Anderson wrote:
>>
>>> Since this article, we've changed the header handling, so that we
>>> don't keep it at the top of the file, but instead append the header at
>>> the end of the file at every commit. The strict append-only nature of
>>> the storage engine is the source of its robustness. Even an extreme
>>> action, like truncating the file, will not result in an inconsistent
>>> state.
>>
>> Interesting. Does this really guarantee file integrity even in the case of
>> power failure? (I have some experience dealing with file corruption, from
>> working on Mac OS X components that use sqlite.) The worst problem is that
>> the disk controller will reorder sector writes to reduce seek time, which in
>> effect means that if power is lost, some random subset of the last writes
>> may not happen. So you won't just end up with a truncated file; you could
>> have a file that seems intact and has a correct header at the end, but has
>> 4k bytes of garbage somewhere within the last transaction. Does CouchDB's
>> file structure guard against that?
>>
>> My concern with HTML5 local storage is that it's going to be used for
>> important user data that cannot be lost, just the way native apps put
>> irreplaceable data in local files. But the data stores being used to
>> implement local storage are much less resilient than the filesystem itself.
>> My experience with sqlite is that heavily-used databases on consumer
>> machines get corrupted and lost every few months. (This isn't directly
>> related to CouchDB itself; but it's why I'm interested in the fault-tolerant
>> data store it uses.)
>>
>>> The other aspect of our API that web storage will need to be
>>> concurrency-friendly is MVCC. Without MVCC you end up needing long
>>> transactions between page-loads, like localStorage currently has,
>>> which makes it useless for sharing state between windows.
>>
>> I'm still not 100% convinced by your analysis in that blog post. A script
>> running in a web page will implicitly acquire a lock when it accesses local
>> storage, and release the lock at the end of the current event that it's
>> handling (i.e. a user action or XHR response.) This is sufficiently
>> fine-grained as to not pose a problem, I think.
>>
>> But Jeremy Orlow pointed out a more problematic case to me: the HTML5
>> worker-thread API. Worker threads should be able to access local storage,
>> and they don't have an event-based model; so a worker thread will probably
>> be within some internal 'while' loop during its entire lifespan. There is
>> thus no way to automatically handle transactions for it, so it will have to
>> manually acquire and release locks. That means that a buggy or blocked
>> worker thread could starve web pages in the same domain from accessing local
>> storage. That's bad.
>>
>>> Maybe the easiest thing would be to just start bundling CouchDB with
>>> your browser. :)
>>
>> In a lot of ways that would be really awesome. However, it would have a
>> terrible effect on the download size of the browser, which is an important
>> consideration. (IIRC, the all-in-one double-clickable Mac CouchDB package is
>> something like 15MB.)
>
> Things can be made much, much smaller. For example, it brings in SpiderMonkey,
> all the Erlang libraries, plus most of the ICU library for collation. If
> we reused the browser's UTF and JavaScript support, I think we could get
> CouchDB + dependencies under a meg compressed.
>
> -Damien

Well, that's changed my mind. I think the pragmatic thing to do would be
to run CouchDB as a browser subprocess and see what happens.

>
>>
>> I like the idea, which I think you proposed, of putting a basic b-tree API
>> into the browser, and being able to implement a lite storage system
>> compatible with CouchDB on top of it in JS.
>>
>> -Jens
>
>

--
Chris Anderson
http://jchrisa.net
http://couch.io
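
For anyone following the append-only discussion in this thread, here is a rough sketch of the general idea: each commit appends its data followed by a small trailer, and recovery scans backward for the newest trailer that checks out. This is a toy illustration only, not CouchDB's actual on-disk format. The `MAGIC` marker, the trailer layout, the CRC32 checksum, and the function names `commit` and `last_good_commit` are all assumptions made up for the example.

```python
import os
import struct
import zlib

# Hypothetical trailer layout (NOT CouchDB's real format):
# 4-byte marker, 4-byte payload length, 4-byte CRC32 of the payload.
MAGIC = b"HDR1"
TRAILER = struct.Struct(">4sII")


def commit(path, payload: bytes) -> None:
    """Append one transaction and its trailer; existing bytes are never rewritten."""
    with open(path, "ab") as f:
        f.write(payload)
        f.write(TRAILER.pack(MAGIC, len(payload), zlib.crc32(payload)))
        f.flush()
        # Asks the OS to flush; whether the drive's own cache honors this
        # is outside the control of this sketch.
        os.fsync(f.fileno())


def last_good_commit(path):
    """Scan backward from end of file for the newest trailer whose checksum matches."""
    with open(path, "rb") as f:
        data = f.read()
    end = len(data)
    while end >= TRAILER.size:
        magic, length, crc = TRAILER.unpack_from(data, end - TRAILER.size)
        start = end - TRAILER.size - length
        if magic == MAGIC and start >= 0:
            payload = data[start:end - TRAILER.size]
            if zlib.crc32(payload) == crc:
                return payload
        end -= 1  # torn or damaged tail: slide back one byte and retry
    return None
```

With a checksum in the trailer, both cases raised above fall back to the previous commit: a truncated tail has no valid trailer, and garbage inside the last transaction fails the checksum even if its trailer survived. Without a checksum, the trailer alone would only protect against truncation.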
