Hello,
I am a student currently working on my thesis, for which I am
benchmarking several NoSQL databases. One of the benchmarks I ran on
couchdb shows some unexpected, counter-intuitive results. I included a
graph in the attachment.
The benchmark set-up is as follows:
* A three node couchdb cluster with replication links between each node
in both directions.
* All update/insert operations are sent to the same couchdb node
* All Read/scan (range queries) operations are load balanced over all
three nodes (round robin)
* Ektorp is used as the java client library
* The scan operation is implemented by querying the view "_all_docs"
with a certain startkey and a limit of 100 documents.
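
For reference, the round-robin reads and the scan request described above can be sketched as follows. This is only an illustration, not the benchmark code itself (which uses Ektorp in Java): the node addresses, the database name "benchmark" and the key "user42" are made-up placeholders. It assumes the scan is a plain GET on "_all_docs" with a JSON-encoded startkey and limit=100.

```python
import itertools
import json
from urllib.parse import urlencode

# Hypothetical node addresses; 5984 is the couchdb port mentioned in this post.
NODES = ["http://node1:5984", "http://node2:5984", "http://node3:5984"]


def scan_url(node, db, start_key, limit=100):
    """Build an _all_docs range query URL; couchdb expects startkey as JSON."""
    query = urlencode({"startkey": json.dumps(start_key), "limit": limit})
    return f"{node}/{db}/_all_docs?{query}"


# Round robin: each read/scan request goes to the next node in turn.
next_node = itertools.cycle(NODES)

# "benchmark" and "user42" are placeholder names for this sketch.
urls = [scan_url(next(next_node), "benchmark", "user42") for _ in range(4)]
for url in urls:
    print(url)
```

With three nodes, the fourth request lands on the first node again, so the firewalled node receives one third of all read and scan requests.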
The benchmark consists of four parts:
* The first five minutes are warm-up. (not shown in the graph)
* Between timepoint 5min. and 10min. in the graph all nodes are up and
running normally.
* Between 10min. and 15min. a firewall rule is inserted on one of the
(read) nodes which blocks incoming and outgoing couchdb traffic (port
5984).
* From 15min. on, this firewall rule is removed again. So we're back to
normal operation.
The purpose of the firewall rule in the middle of the benchmark is to
simulate a network failure. The benchmark examines how the couchdb
database reacts to a network partition by looking at the latency of the
different operations over time.
The weird thing is: the graph shows that read and update operations
get significantly slower while the firewall rule is present, whereas
the insert and scan operations don't seem to suffer any increase in
latency. The node where the firewall rule is present is only used for
read and scan operations. So I would expect only the read and scan
operations to show more latency; the latency of the other operations
should stay stable. I ran the same benchmark on several other NoSQL
databases, none of which show the behaviour we see here. I closely
monitored the behaviour of couchdb, but I didn't find an explanation
for this phenomenon. So I think it has something to do with the
architecture of couchdb. Can anyone help me with an architectural
explanation of why this behaviour shows up?
Thanks in advance,
Arnaud Schoonjans