On May 18, 2009, at 7:43 PM, Chris Anderson wrote:
On Mon, May 18, 2009 at 10:59 AM, Damien Katz <[email protected]> wrote:
Feedback on all this welcome. Please try out the branch to shake out any
bugs or performance problems that might be lurking.
The code looks simpler, which is a nice surprise considering the
storage is actually more robust.
Here are comparative benchmarks on my MacBook: two runs of
hovercraft:lightning(), which factors out all HTTP / JSON overhead and
inserts small documents in batches of 1000. I've also run my curl/bash
benchmark script to insert 100k docs (with sequential ids).
append only:
2> hovercraft:lightning().
Inserted 100000 docs in 27.614173 seconds with batch size of 1000.
(3621.328800974775 docs/sec)
3> hovercraft:lightning().
Inserted 100000 docs in 27.508795 seconds with batch size of 1000.
(3635.201032978726 docs/sec)
curl/bash: 2285.7 docs/sec
trunk:
2> hovercraft:lightning().
Inserted 100000 docs in 13.237762 seconds with batch size of 1000.
(7554.146992520337 docs/sec)
3> hovercraft:lightning().
Inserted 100000 docs in 13.032335 seconds with batch size of 1000.
(7673.222028132334 docs/sec)
curl/bash: 3417.6 docs/sec
So the preliminary result is that the append-only branch (on my particular
hardware, with a contrived micro-benchmark) runs at about half the speed
of trunk.
It's a matter of priorities. Do we want absolute robustness, or do we
want more performance? Also, the append-only stuff is brand-new and
could conceivably be optimized. I would not be surprised at all to see
it get faster than trunk, with enough tuning.
I figured out the problem: each document body is now being compressed
before it is written to disk, via term_to_binary(Term, [compressed]).
Previously, only btree nodes were saved compressed and docs were not.
I didn't realize the compression was so expensive, but now that I've
switched it off on both the branch and on trunk, I see big performance
boosts for both. And now the tail append stuff is slightly faster than
trunk on my machine.
-Damien
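
For anyone who wants to see the compression cost described above on their own machine, here is a minimal sketch that times term_to_binary/1 against term_to_binary/2 with the [compressed] option over a list of small terms. The module name compress_bench, the function names, and the shape of sample_doc/1 are all made up for illustration; this is not the code in the branch or in hovercraft, and the numbers it gives will depend on hardware and document contents.

```erlang
-module(compress_bench).
-export([run/1, sample_doc/1, encode/2]).

%% Hypothetical small document, used only for this illustration.
%% It is not CouchDB's actual on-disk document representation.
sample_doc(N) ->
    {[{<<"_id">>, list_to_binary(integer_to_list(N))},
      {<<"value">>, <<"a small document body used for benchmarking">>}]}.

%% Serialize a term with or without zlib compression.
encode(Term, plain) ->
    term_to_binary(Term);
encode(Term, compressed) ->
    term_to_binary(Term, [compressed]).

%% Encode Count sample docs in each mode.
%% Returns [{Mode, {Microseconds, TotalBytes}}] for plain and compressed.
run(Count) ->
    Docs = [sample_doc(N) || N <- lists:seq(1, Count)],
    [{Mode, time_encode(Docs, Mode)} || Mode <- [plain, compressed]].

%% Time the serialization of every doc and sum the resulting sizes.
time_encode(Docs, Mode) ->
    {Micros, Bins} = timer:tc(fun() -> [encode(D, Mode) || D <- Docs] end),
    Bytes = lists:sum([byte_size(B) || B <- Bins]),
    {Micros, Bytes}.
```

After compiling with c(compress_bench)., calling compress_bench:run(100000). in the Erlang shell returns the elapsed microseconds and total encoded bytes for each mode, which gives a rough idea of how much time serialization alone spends on compression. It measures only the encoding step, not disk writes or btree updates.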