As I think about it, I'm not surprised you aren't getting better numbers with delayed updates, which amortize the cost of fsync across all the docs updated each second. But getting half the performance seems wrong. I'm hoping it's something easy to fix; we'll need to run a profiler to be sure.
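The amortization argument can be made concrete with a toy model in Python. This is not CouchDB code. `write_docs` is a hypothetical helper that appends serialized docs to a file and calls `os.fsync` once per commit. A commit size of 1 models a full commit on every update, and a larger size models delayed updates flushed together. It returns how many fsyncs were issued. The file layout and doc encoding are assumptions made for illustration.

```python
import os
import tempfile


def write_docs(path, docs, docs_per_commit):
    """Hypothetical helper: append docs to `path`, fsyncing once per commit.

    docs_per_commit=1 models a full commit on every update; a larger value
    models delayed updates sharing one fsync. Returns the number of fsyncs.
    """
    fsyncs = 0
    with open(path, "ab") as f:
        for i, doc in enumerate(docs, start=1):
            f.write(doc)
            if i % docs_per_commit == 0:
                f.flush()                # hand Python's buffer to the OS
                os.fsync(f.fileno())     # force the OS to put it on disk
                fsyncs += 1
        if len(docs) % docs_per_commit:  # commit a trailing partial batch
            f.flush()
            os.fsync(f.fileno())
            fsyncs += 1
    return fsyncs


if __name__ == "__main__":
    docs = [b'{"_id": "%d"}\n' % i for i in range(100)]
    with tempfile.TemporaryDirectory() as d:
        per_update = write_docs(os.path.join(d, "full.log"), docs, 1)
        delayed = write_docs(os.path.join(d, "delayed.log"), docs, 100)
    print(per_update, delayed)
```

With 100 docs, committing every update issues 100 fsyncs, while one delayed commit issues a single fsync for the same bytes written.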

I'd like to see benchmarks across a variety of loads, and of view build behavior too. For one thing, with full commits on individual doc updates, the new code should be much faster. View refreshes could also go either way: slower because the docs they map are more sparse on disk, but faster because they require no fsync (if you are using a filesystem that guarantees ordered sequential writes).

Also, if performance turns out to be slower all around, we'll have to discuss whether the pure tail-append change is actually worth it. Maybe we could tail-append headers with the old design too, but only use them when the front header is bad. The only problem is that, without implementing the current design, I don't know of a workable way to distinguish a valid header from something that merely looks like a CouchDB file header, such as a CouchDB file attached to a document in a live db, or an intentional attack.
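The hard part is telling a real header apart from bytes that only look like one. Here is a minimal Python sketch of one way to do it, under loudly stated assumptions. The file is split into fixed-size blocks (4096 bytes here, chosen arbitrarily). Every block begins with a one-byte marker. Data appends insert a data marker whenever they cross a block boundary, so only a header write can put the header marker at a block start. Each header carries its length and an MD5 digest so a torn write is rejected. `AppendOnlyFile` and its methods are hypothetical names, the store lives in memory, and this is not presented as the branch's actual on-disk format.

```python
import hashlib

BLOCK = 4096          # hypothetical block size
DATA, HEADER = 0, 1   # hypothetical block-prefix marker bytes
PREFIX = 1 + 4 + 16   # marker + 4-byte length + 16-byte MD5 digest


class AppendOnlyFile:
    """Toy in-memory append-only file; illustrative only."""

    def __init__(self):
        self.buf = bytearray()

    def append_data(self, data: bytes) -> None:
        # Insert a DATA marker whenever a block boundary is reached, so user
        # bytes (e.g. an attached .couch file) never occupy a block start.
        for b in data:
            if len(self.buf) % BLOCK == 0:
                self.buf.append(DATA)
            self.buf.append(b)

    def append_header(self, header: bytes) -> None:
        # Keep the toy simple: a header must fit inside a single block.
        assert PREFIX + len(header) <= BLOCK
        # Pad with zeros up to the next block boundary (no-op if already there).
        self.buf.extend(bytes(-len(self.buf) % BLOCK))
        self.buf.append(HEADER)
        self.buf.extend(len(header).to_bytes(4, "big"))
        self.buf.extend(hashlib.md5(header).digest())
        self.buf.extend(header)

    def last_valid_header(self):
        # Scan block starts from the end of the file towards the beginning.
        pos = (len(self.buf) - 1) // BLOCK * BLOCK
        while pos >= 0:
            if self.buf[pos] == HEADER:
                n = int.from_bytes(self.buf[pos + 1:pos + 5], "big")
                digest = bytes(self.buf[pos + 5:pos + PREFIX])
                body = bytes(self.buf[pos + PREFIX:pos + PREFIX + n])
                # Reject torn or corrupt headers: wrong length or bad checksum.
                if len(body) == n and hashlib.md5(body).digest() == digest:
                    return body
            pos -= BLOCK
        return None


if __name__ == "__main__":
    f = AppendOnlyFile()
    f.append_header(b"h1")
    f.append_data(b"\x01" * 5000)   # data full of header-marker bytes
    f.append_header(b"h2")
    del f.buf[-1]                   # simulate a torn final header write
    print(f.last_valid_header())    # b'h1'
```

Recovery walks block boundaries backwards from the end of the file and stops at the first header whose checksum verifies. An attachment that happens to contain a CouchDB file, or bytes crafted to look like a header, never sits at a block start, so it is never mistaken for one.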

-Damien

On May 18, 2009, at 7:43 PM, Chris Anderson wrote:

On Mon, May 18, 2009 at 10:59 AM, Damien Katz wrote:
Feedback on all this welcome. Please try out the branch to shake out any
bugs or performance problems that might be lurking.


The code looks simpler, which is a nice surprise considering the
storage is actually more robust.

Here are comparative benchmarks on my MacBook: two runs of
hovercraft:lightning(), which factors out all HTTP/JSON overhead and
inserts small documents in batches of 1000, plus one round of my
curl/bash benchmark script inserting 100k docs (with sequential ids).

append only:
2> hovercraft:lightning().
Inserted 100000 docs in 27.614173 seconds with batch size of 1000.
(3621.328800974775 docs/sec)
3> hovercraft:lightning().
Inserted 100000 docs in 27.508795 seconds with batch size of 1000.
(3635.201032978726 docs/sec)

curl/bash: 2285.7 docs/sec

trunk:
2> hovercraft:lightning().
Inserted 100000 docs in 13.237762 seconds with batch size of 1000.
(7554.146992520337 docs/sec)
3> hovercraft:lightning().
Inserted 100000 docs in 13.032335 seconds with batch size of 1000.
(7673.222028132334 docs/sec)

curl/bash: 3417.6 docs/sec

So the preliminary result is that the append-only branch (on my
particular hardware, with a contrived micro-benchmark) is about twice
as slow.
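As a quick arithmetic check on "about twice as slow", this Python snippet averages the two hovercraft runs per build. It uses only the elapsed times and the 100,000-doc count reported above. The hovercraft numbers work out to roughly a 2.1x slowdown. The curl/bash figures, which go through HTTP, show a smaller gap of about 1.5x.

```python
from statistics import mean

DOCS = 100_000  # docs inserted per hovercraft:lightning() run

# Elapsed seconds for the two runs of each build, as reported above.
runs = {
    "append only": [27.614173, 27.508795],
    "trunk": [13.237762, 13.032335],
}

# Throughput in docs/sec, averaged over the two runs of each build.
rate = {name: mean([DOCS / s for s in secs]) for name, secs in runs.items()}

# curl/bash throughput (docs/sec), as reported above.
curl = {"append only": 2285.7, "trunk": 3417.6}

print(f"hovercraft slowdown: {rate['trunk'] / rate['append only']:.2f}x")
print(f"curl/bash slowdown:  {curl['trunk'] / curl['append only']:.2f}x")
```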

It's a matter of priorities. Do we want absolute robustness, or do we
want more performance? Also, the append-only stuff is brand new and
could conceivably be optimized. I would not be surprised at all to see
it get faster than trunk with enough tuning.

Chris

--
Chris Anderson
http://jchrisa.net
http://couch.io
