Re: CouchDB 1.0 work

Jan Lehnardt Tue, 29 Apr 2008 02:25:31 -0700

Heya Damien,
On Apr 28, 2008, at 18:27, Damien Katz wrote:

Here are my thoughts on what we need before we can get to CouchDB 1.0. Feedback please.
Must have:
Incremental reduce: Maybe the single biggest outstanding work item. Probably 2 weeks of development to get to a testable state.


I obviously agree here, but I guess you are the only one who can
make sensible estimates and do all the coding. Which might
not be the best thing for the incubating project. Would it be
possible for you to 'take in an apprentice' that you tug along
while doing that work and bring him up to speed with that part
of the code? This will delay things and it might be impractical
(after all, who should be the apprentice :) and a stupid idea,
but it might make sense to add more people to the code.
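
To make the "incremental" part concrete for anyone new to that code, here
is a minimal Python sketch. It assumes (this is not stated in this thread)
that partial reductions are cached per chunk of mapped values, so that a
change only recomputes the affected chunk. The class and method names are
hypothetical and this is not CouchDB's actual design.

```python
from typing import Callable, Dict, List


class IncrementalReduce:
    """Caches one partial reduction per chunk of mapped values (hypothetical)."""

    def __init__(self, reduce_fn: Callable, rereduce_fn: Callable):
        self.reduce_fn = reduce_fn        # reduces raw mapped values
        self.rereduce_fn = rereduce_fn    # combines cached partial results
        self.partials: Dict[str, object] = {}
        self.reduce_calls = 0             # counts how often raw values were reduced

    def update_chunk(self, chunk_id: str, values: List) -> None:
        # Only the chunk that changed is reduced again.
        self.partials[chunk_id] = self.reduce_fn(values)
        self.reduce_calls += 1

    def result(self):
        # The final answer is built from the cached partials only.
        return self.rereduce_fn(list(self.partials.values()))


ir = IncrementalReduce(sum, sum)
ir.update_chunk("a", [1, 2, 3])
ir.update_chunk("b", [10, 20])
total_before = ir.result()

# A new value lands in chunk "b"; chunk "a" is not touched.
ir.update_chunk("b", [10, 20, 30])
total_after = ir.result()
```

The point of the sketch is only that an update costs one chunk reduction
plus a cheap combine step, instead of a pass over every value.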

Security/Document validation: We need a way to control who can update what documents and to validate the updates are correct. This is absolutely necessary for offline replication, where replicated updates to the database do not come through the application layer.


Do you have any more ideas on how the notion of 'who'
should be defined here? Is that an HTTP-Auth user,
something on the CouchDB level or something entirely
different?

Also, a feature request for the validation function is
to allow modifying the document before saving. It'd
be nice to have and we should keep that in mind while
designing this feature.
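
A rough Python sketch of what I mean by a validation function that may
also modify the document. Everything here is an assumption for the sake
of discussion: the signature, the "owner" and "title" fields, and the
dict standing in for the database are all hypothetical.

```python
import copy


class ValidationError(Exception):
    """Raised when an update is rejected."""


def validate_update(new_doc, old_doc, user):
    """Reject bad updates, or return the (possibly modified) doc to save."""
    # "Who can update what": here only the recorded owner may change a doc.
    if old_doc is not None and old_doc.get("owner") != user:
        raise ValidationError("only the owner may update this document")
    # "Updates are correct": here a title is required.
    if "title" not in new_doc:
        raise ValidationError("title is required")
    # The feature request: the function may modify the doc before saving.
    doc = copy.deepcopy(new_doc)
    doc["title"] = doc["title"].strip()
    doc.setdefault("owner", user)
    return doc


def save(db, doc_id, new_doc, user):
    # Every write, local or replicated, would pass through the validator.
    db[doc_id] = validate_update(new_doc, db.get(doc_id), user)
    return db[doc_id]


db = {}
saved = save(db, "a", {"title": "  Hello  "}, "jan")
```

Returning the document instead of a plain yes/no is what makes the
"modify before saving" case possible.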

View index compaction/management: View indexes currently just grow, need a compaction similar to storage compaction. Also, there is no way to purge old unused indexes, except via the OS.


My comments about the reduce feature apply here equally.

File sync problem: file:sync(), a call that flushes all uncommitted writes to disk before returning, doesn't work fully or at all on some platforms (usually we just lack the flags to tell the OS to write to disk). Should be fixable by either patching the existing Erlang driver source, or using a replacement file driver.


Fixing Erlang sounds like the most solid solution here. I did try
to push the patch we had for inets upstream, but my mails never
reached their mailing list and I couldn't be bothered to investigate
because we've switched away from inets. Anyway: We'd need
someone to actively evangelize the patch with the Erlang
maintainers. This person should be aware of all the implications
this patch introduces. From what I gathered, they generally accept
sensible patches, it just might take some time, and the less
disruptive (not breaking anything existing) the patch, the more
likely it is to be accepted.
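
For reference, this is the behaviour we expect from a working sync,
shown in Python rather than Erlang since the idea is the same. It is only
an analogy to file:sync(), not the patch itself, and the function name is
made up. Note that on some platforms even a plain fsync may not force the
drive's own cache out, which is the kind of missing flag Damien mentions.

```python
import os
import tempfile


def durable_write(path, data):
    """Write data and return only after asking the OS to flush it to disk."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # push the process-level buffer to the OS
        os.fsync(f.fileno())   # ask the OS to write its cache to the device


path = os.path.join(tempfile.mkdtemp(), "example.couch")
durable_write(path, b"committed")
```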

Optimizations. Right now HTTP overhead is huge, with HTTP latency/overhead at about 80% of our document read time when loaded from a local client (same machine). Once we can get this down to below 50%, we'll focus on optimizing the database and other components. Most core database operations (document reads, updates and view indexing) are completely unoptimized so far, with the update speed being the biggest complaint.


Jumping past HTTP optimisation:
You mentioned a caching layer based on Erlang's
Judy-tree implementation (is that (d)ets btw?) at
some point. I assume that would speed up everything
that includes disk reads (including updates, which need
to know at least the latest revision of the doc that is
to be updated).

What I gathered from my config patch is that writing
a key-value storage module is trivial in Erlang. Would a
caching system work in the same way?
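
To illustrate what I have in mind, a tiny read-through cache in Python.
This is a sketch under my own assumptions, not the layer you described:
a dict stands in for the on-disk store and the class name is hypothetical.

```python
class ReadThroughCache:
    """Serves repeated reads from memory; writes update store and cache."""

    def __init__(self, store):
        self.store = store      # stand-in for the database file on disk
        self.cache = {}
        self.disk_reads = 0     # counts reads that had to go to the store

    def get(self, key):
        if key not in self.cache:
            self.disk_reads += 1
            self.cache[key] = self.store[key]
        return self.cache[key]

    def put(self, key, value):
        # Write through, so the cache never holds a stale revision.
        self.store[key] = value
        self.cache[key] = value


c = ReadThroughCache({"doc1": {"_rev": "1"}})
first = c.get("doc1")           # goes to the store
second = c.get("doc1")          # served from memory
c.put("doc1", {"_rev": "2"})    # an update needs the latest revision
latest = c.get("doc1")
```

An update that first has to look up the latest revision would hit the
cache instead of the disk, which is where I'd expect the speed-up.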

Testing: We need lots more tests. By the time we ship 1.0, we should have far more test suite code than production code.


This is tedious of course. Maybe we can get together all
devs and everybody who wants to help out to go on a testing
spree to add test-cases and discuss only related issues
for a couple of days to get a bulk of the work done here?

Maybe with nice goals we can be proud of reaching afterwards
and all the usual motivation-crap :)

And we need to do load testing. Will the current browser-based test suite scale for this kind of heavy testing?


I doubt (please prove me wrong) that we could have a
browser create enough load for a single node on reasonably
current hardware.

For load testing we obviously need at least two machines,
one doing the testing and one being tested, better three,
with another one for logging. Testing replication needs
even more machines. And ideally we have even a couple
of client machines to generate the load.

That said, the Ajax of the test suite only works if it is
served from the same host as CouchDB. You can circumvent
that using a proxy but that makes benchmarking harder.

I had some luck testing CouchDB using tsung (http://tsung.erlang-projects.org/),
which is a bit daunting at first but maybe the best tool
to bench and profile Erlang server applications. It can use a
pool of client machines to bench a single server.

And Erlang comes with a built-in profiling and code
coverage solution that we might want to look at for
finding code hot-spots.

In conclusion, testing needs a lot of iron and time and maybe
some vendor can step in here and give us access to a testing
lab. Maybe we can get help from the ASF infrastructure?

Nice to have:
Plug-ins: Erlang module plug-in architecture, to make adding new server side code easy. Right now the code that maps special urls (_view, _compact, _search, etc) to the appropriate Erlang call is messy and convoluted, and getting worse as we go. We need a standard way to map the special urls to the appropriate Erlang call.
Tail committed database headers: To optimize the updating of the database by reducing the number and length of seeks required, the file header should be written to the end of the file, rather than the beginning. Depending on platform this can remove a full head seek, and in the best case scenario a document insert/update can require zero head seeks (if the head is already positioned at the end of the file). But this can slow file opening speed as it may need to do a search in the file for the most recent valid header. In the event of a crash, the header scan/search cost at database open can be linear or logarithmic, depending on the exact implementation.


Maybe this could be a per-database option?
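
For anyone following along, here is how I picture the tail header idea,
as a Python sketch. The on-disk layout (a magic marker, a length, the
payload and a CRC32) is purely my assumption for illustration, as are the
function names. The backward scan here is the simple linear variant.

```python
import io
import struct
import zlib

MAGIC = b"HDR1"  # hypothetical marker that starts every header


def append_header(f, payload):
    """Append a header at the end of the file: no seek back to offset 0."""
    f.seek(0, io.SEEK_END)
    body = struct.pack(">I", len(payload)) + payload
    f.write(MAGIC + body + struct.pack(">I", zlib.crc32(body)))


def find_last_header(f):
    """Scan backwards for the most recent header whose checksum is valid."""
    f.seek(0)
    data = f.read()
    pos = data.rfind(MAGIC)
    while pos != -1:
        start = pos + len(MAGIC)
        if start + 4 <= len(data):
            (length,) = struct.unpack(">I", data[start:start + 4])
            end = start + 4 + length
            if end + 4 <= len(data):
                (crc,) = struct.unpack(">I", data[end:end + 4])
                if zlib.crc32(data[start:end]) == crc:
                    return data[start + 4:end]
        # Torn or false match: keep looking further back in the file.
        pos = data.rfind(MAGIC, 0, pos)
    return None


f = io.BytesIO()                  # stands in for the database file
f.write(b"doc1")
append_header(f, b"root=1")
f.write(b"doc2")
append_header(f, b"root=2")
f.write(b"doc3" + MAGIC + b"\x00\x00")   # crash in the middle of a header
recovered = find_last_header(f)
```

After a clean shutdown the valid header is the last thing in the file and
the scan stops immediately. Only after a crash does it have to walk back,
which is the open-time cost Damien describes.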

Clustering: The ability to cluster CouchDB servers, to increase both reliability (failover-clustering) and client scalability (more servers to handle more concurrent user load). Clustering does not increase data scalability (that's partitioning/sharding).

Some zeroconf-based auto-discovery and auto-config of new nodes would be totally kick-ass :)

Selective document purging/compaction: Deletion stubs are kept around for replication purposes. Need a way to purge the records of documents that are old or deleted.

Revision path pruning: Each document keeps a list of all previous revisions. We need a way to prune the oldest records of document revisions and remerge pruned lists during replication.
Don't Need:
Authentication. We can go to 1.0 without authentication, relying instead on local proxies to provide authentication.

+1

Partitioning. Partitioning is a big project with lots of considerations. It's best to move this post 1.0.

+1

For the must-haves: Will my config patch be accepted once I get it into a shape that
addresses all concerns? Or should that be added to the list?


Cheers
Jan
--
