Another thing: reliable versioning (we talked about this on IRC).
Basically, at the moment, old revisions might get lost during
compaction or whatever.
You mentioned you were planning a relatively trivial change where on
an update, the old document becomes an attachment, so you'd
essentially have a list of old document revisions.
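To make the idea concrete, here is a minimal sketch in Python (not Erlang, and not CouchDB code). A plain dict stands in for a document; the `_old_revisions` field and the `update_with_history` helper are names made up for illustration, standing in for the attachment mechanism:

```python
import copy

def update_with_history(doc, new_fields):
    """Return an updated copy of doc that carries its previous bodies along.

    Hypothetical helper: `_old_revisions` is an invented field name that
    plays the role of the "old document as attachment" idea.
    """
    history = list(doc.get("_old_revisions", []))
    # Snapshot the current body without its own history, so each old
    # revision is stored once instead of nesting inside the next one.
    snapshot = {k: copy.deepcopy(v) for k, v in doc.items()
                if k != "_old_revisions"}
    history.append(snapshot)
    updated = copy.deepcopy(snapshot)
    updated.update(new_fields)
    updated["_old_revisions"] = history
    return updated

doc = {"title": "draft"}
doc = update_with_history(doc, {"title": "second draft"})
doc = update_with_history(doc, {"title": "final"})
```

Because the history travels inside the document itself, it would survive anything that keeps the latest revision, at the cost of the document growing with every update.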
Just thought I'd bring that up ;)
Stay strong, great work so far,
David
P.S: any plans on a timeline for 1.0? 2008? 2009? :)
On 28.04.2008 at 18:27, Damien Katz wrote:
Here are my thoughts on what we need before we can get to
CouchDB 1.0. Feedback please.
Must have:
Incremental reduce: Maybe the single biggest outstanding work item.
Probably 2 weeks of development to get to a testable state.
Security/Document validation: We need a way to control who can
update what documents and to validate the updates are correct. This
is absolutely necessary for offline replication, where replicated
updates to the database do not come through the application layer.
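As a rough illustration of the kind of check meant here, the sketch below runs one validation hook on every write, whether it comes from the application or from replication. It is Python rather than Erlang, and `validate_update`, `apply_update`, `Forbidden` and the ownership rule are all hypothetical names and rules, not an existing CouchDB API:

```python
class Forbidden(Exception):
    """Raised when an update is rejected."""

def validate_update(new_doc, old_doc, user):
    # Hypothetical rules, for illustration only.
    if old_doc is not None and old_doc.get("owner") != user:
        raise Forbidden("only the owner may update this document")
    if not isinstance(new_doc.get("title"), str):
        raise Forbidden("title must be a string")

def apply_update(db, doc_id, new_doc, user):
    """Store new_doc in db (a dict) only if it passes validation."""
    validate_update(new_doc, db.get(doc_id), user)
    db[doc_id] = new_doc
```

The point of putting the check at the storage layer is that a replicated update goes through `apply_update` just like a local one, so it cannot bypass the rules.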
View index compaction/management: View indexes currently just grow;
we need a compaction similar to storage compaction. Also, there is no
way to purge old unused indexes, except via the OS.
File sync problem: file:sync(), a call that flushes all uncommitted
writes to disk before returning, doesn't work fully or at all on
some platforms (usually we just lack the flags to tell the OS to
write to disk). Should be fixable by either patching the existing
Erlang driver source, or using a replacement file driver.
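For readers unfamiliar with the flush-then-sync pattern, here is a minimal sketch in Python (not the Erlang driver itself). `durable_append` is a hypothetical helper name; `os.fsync` is the standard library call that asks the OS to write its cached data for the file to the device:

```python
import os

def durable_append(path, data):
    """Append bytes to path and ask the OS to push them to stable storage."""
    with open(path, "ab") as f:
        f.write(data)
        f.flush()              # move Python's buffer into the OS cache
        os.fsync(f.fileno())   # ask the OS to write its cache to the device
```

On some platforms a plain fsync may not force the drive's own write cache to flush (macOS, for example, has a separate F_FULLFSYNC fcntl for that), which is possibly the kind of missing flag described above.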
Optimizations: Right now HTTP overhead is huge, with HTTP latency/
overhead at about 80% of our document read time when loaded from a
local client (same machine). Once we can get this down to below 50%,
we'll focus on optimizing the database and other components. Most
core database operations (document reads, updates and view indexing)
are completely unoptimized so far, with update speed being the
biggest complaint.
Testing: We need lots more tests. By the time we ship 1.0, we should
have far more test suite code than production code. And we need to
do load testing. Can the current browser-based test suite scale
for this kind of heavy testing?
Nice to have:
Plug-ins: Erlang module plug-in architecture, to make adding new
server side code easy. Right now the code that maps special urls
(_view, _compact, _search, etc) to the appropriate Erlang call is
messy and convoluted, and getting worse as we go. We need a standard
way to map the special urls to the appropriate Erlang call.
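One possible shape for such a standard mapping is a registry that each module adds itself to, sketched here in Python rather than Erlang. The `handler` decorator, `HANDLERS` table, `dispatch` function and the returned tuples are all hypothetical, chosen only to show the idea:

```python
HANDLERS = {}

def handler(name):
    """Register a function as the handler for one special URL segment."""
    def register(func):
        HANDLERS[name] = func
        return func
    return register

@handler("_view")
def handle_view(db, rest):
    return ("view", db, rest)

@handler("_compact")
def handle_compact(db, rest):
    return ("compact", db, rest)

def dispatch(path):
    """Route /db/_special/rest to its handler; anything else is a document."""
    db, _, tail = path.strip("/").partition("/")
    name, _, rest = tail.partition("/")
    func = HANDLERS.get(name)
    if func is None:
        return ("document", db, tail)
    return func(db, rest)
```

With a table like this, adding a new special URL means registering one more function instead of editing the central routing code.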
Tail committed database headers: To optimize database updates
by reducing the number and length of seeks required, the
file header should be written to the end of the file, rather than
the beginning. Depending on platform this can remove a full head seek
and in the best case scenario a document insert/update can require
zero head seeks (if the head is already positioned at the end of the
file). But this can slow file opening speed as it may need to do a
search in the file for the most recent valid header. In the event
of a crash, the header scan/search cost at database open can be
linear or logarithmic, depending on the exact implementation.
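The trade-off can be sketched with a toy append-only buffer in Python. This is not the CouchDB file format: the `HDR!` marker, the 16-byte layout and the CRC32 check are assumptions made for the example, and the backwards scan shown is the simple linear variant:

```python
import struct
import zlib

MAGIC = b"HDR!"                   # hypothetical marker
HEADER = struct.Struct(">4sQI")   # magic, root offset, CRC32 of the offset

def append_header(buf, root_offset):
    """Commit by appending a header after the data already written."""
    body = struct.pack(">Q", root_offset)
    buf.extend(HEADER.pack(MAGIC, root_offset, zlib.crc32(body)))

def find_last_header(buf):
    """Scan backwards for the most recent header whose checksum matches.

    After a clean shutdown the header is found on the first try; after a
    crash the scan walks back over whatever was written but not committed.
    """
    pos = len(buf) - HEADER.size
    while pos >= 0:
        magic, root_offset, crc = HEADER.unpack_from(buf, pos)
        if magic == MAGIC and crc == zlib.crc32(struct.pack(">Q", root_offset)):
            return root_offset
        pos -= 1
    return None
```

Writes only ever go to the end of the buffer, which is where the seek savings come from; the cost moves to `find_last_header`, which is linear in the amount of uncommitted data here and could be made cheaper by only checking block-aligned positions.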
Clustering: The ability to cluster CouchDB servers, to increase both
reliability (failover-clustering) and client scalability (more
servers to handle more concurrent user load). Clustering does not
increase data scalability (that's partitioning/sharding).
Selective document purging/compaction: Deletion stubs are kept
around for replication purposes. Need a way to purge the records of
documents that are old or deleted.
Revision path pruning: Each document keeps a list of all
previous revisions. We need a way to prune the oldest records of
document revisions and remerge pruned lists during replication.
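A small Python sketch of what pruning and remerging could look like, assuming a revision path is an oldest-first list of unique revision ids. The `prune` and `merge` functions and the id format are hypothetical, not the actual CouchDB data structures:

```python
def prune(path, limit):
    """Keep only the `limit` most recent revision ids (oldest-first list)."""
    return path[-limit:]

def _extend(older, newer):
    """Join newer onto older if they overlap consistently, else None."""
    if not newer or newer[0] not in older:
        return None
    i = older.index(newer[0])
    overlap = older[i:]
    common = min(len(overlap), len(newer))
    if overlap[:common] != newer[:common]:
        return None  # same ancestor, different history: a conflict
    return older[:i] + (overlap if len(overlap) >= len(newer) else newer)

def merge(local, remote):
    """Merge two pruned paths of one document, in either order.

    Returns None when the paths share no revision (pruned too far apart)
    or when they diverge after a shared revision.
    """
    merged = _extend(local, remote)
    if merged is None:
        merged = _extend(remote, local)
    return merged
```

The `None` case for paths with no shared revision shows the risk of pruning too aggressively: two replicas can no longer tell whether one history continues the other.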
Don't Need:
Authentication. We can go to 1.0 without authentication, relying
instead on local proxies to provide authentication.
Partitioning. Partitioning is a big project with lots of
considerations. It's best to move this post 1.0.