On Mon, Nov 24, 2008 at 9:24 AM, Peter Herndon <[EMAIL PROTECTED]> wrote:
> Anyway, that's my current use case, and my next use case. I know that
> CouchDB isn't finished yet, and hasn't been optimized yet, but does
> anyone have any opinions on whether CouchDB would be a reasonable fit
> for managing the metadata associated with each object?

I think CouchDB is pretty much designed with this use case in mind. If you were lucky enough to convince the organization to switch from XML to JSON, the software would pretty much write itself. And CouchDB does a fairly decent job of dealing with XML as well (using Spidermonkey's E4X engine), so that's not even required.

> And, likewise,
> would CouchDB be a reasonable fit for managing the binary datastreams?
> Would it be practical to store the datastreams in CouchDB itself, and
> up to what size limit/throughput limit?

CouchDB's attachment support is pretty much designed for this use case (attachments can be multi-GB files, and aren't sent to view servers). From your description, it sounds like you are maxing out IO at the network level, so it's hard to say how CouchDB would interact with such a stream without seeing it in action. However, CouchDB's replication and distribution capabilities should make managing multi-site projects as simple as one can hope for. If you shard projects as databases, then you can use replication to make them available on the local network for the various sites, which should make it easier to avoid load bottlenecks at a central repository.

> Would it be better to store
> the datastreams externally and use CouchDB to manage the metadata and
> access control?

It's not clear. Obviously, importing TBs of data from a filesystem into CouchDB will cost time and money, even if CouchDB handles it swimmingly. The nice thing about the schemaless documents is that you can be flexible going forward, maybe referencing some assets via URIs and storing others as attachments.
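To make the last two points a bit more concrete, here is a minimal sketch in Python of what such documents and a per-project replication request might look like. The `datastream` and `uri` field names, the document ids, and the example hosts are hypothetical, made up for illustration. The `_attachments` layout (`content_type` plus base64 `data`) and the `source`/`target` body for a POST to `/_replicate` are how I understand CouchDB's HTTP API, so check them against the version you run. Nothing here talks to a server; it only builds the JSON.

```python
import base64
import json


def asset_by_reference(doc_id, title, uri):
    """Metadata document that points at a datastream stored elsewhere.

    The "datastream"/"uri" fields are a hypothetical convention, not
    something CouchDB itself interprets.
    """
    return {"_id": doc_id, "title": title, "datastream": {"uri": uri}}


def asset_with_attachment(doc_id, title, filename, content_type, payload):
    """Metadata document that carries the datastream as an inline attachment."""
    return {
        "_id": doc_id,
        "title": title,
        "_attachments": {
            filename: {
                "content_type": content_type,
                # Inline attachments travel base64-encoded inside the JSON body.
                "data": base64.b64encode(payload).decode("ascii"),
            }
        },
    }


def replication_request(source_db, target_db):
    """Body for a POST to /_replicate that copies one project database."""
    return {"source": source_db, "target": target_db}


docs = [
    # Large master file kept on external storage, referenced by URI.
    asset_by_reference(
        "img-0001", "Survey map", "http://media.example.org/maps/0001.tif"
    ),
    # Small derivative stored directly in the database.
    asset_with_attachment(
        "img-0002", "Thumbnail", "thumb.png", "image/png", b"\x89PNG"
    ),
]

print(json.dumps(docs, indent=2))
print(json.dumps(
    replication_request("project_a", "http://site-b.example.org:5984/project_a")
))
```

Both kinds of document can live side by side in the same database, so the decision between external storage and attachments can be made per asset rather than up front. For very large files you would likely upload the attachment in a separate request rather than inline it as base64.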
> Also, looking down the road, are there plans for
> CouchDB's development that would improve its fitness for this purpose
> in the future?

Your project sounds like a good fit for CouchDB. Of course, you are talking about working on the high end of the performance / scalability curve, and CouchDB is relatively new, so you'll have to be comfortable as a trail-blazer (not that you'd be the only one, but with a new technology, you'll be in a smaller crowd than if you used something that's been around longer).

I think the biggest positive reason to use CouchDB for your project is the ease of federation / distribution / offline work. Once you've built the business rules and document format around your project and CouchDB, booting up other instances of the project for more media collections should be straightforward. Because the documents will be more self-contained than what you'd have with a SQL store, for instance, you could build something amenable to merging multiple repositories, or splitting off just a portion of a repository for a particular purpose. This flexibility seems like a big win, as it would allow you to respond to things like datacenter-level bottlenecks with changes that users will understand, such as moving just the necessary sub-collections to a more local server.

Good luck and keep us up to date with your progress.

Chris

--
Chris Anderson
http://jchris.mfdz.com