I just completed a successful stress test[3] run in preparation for the
1.1.0 release. Commit f0403a1 was used for the test. I sent out a
tweet[1] about it and am providing more details here.
For the test I loaded an initial 1 billion random numbers via
MapReduce. Then I incrementally loaded 100 sets of 1 million random
numbers. All random numbers were between 0 and 1 trillion.
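As a rough sketch, generating uniform random longs in that range looks like the following. Stresso's actual generation runs inside a MapReduce job; this standalone class only illustrates the value range.

```java
import java.util.concurrent.ThreadLocalRandom;

// Illustrative only: uniform random longs in [0, 1 trillion),
// the same value range the stress test loads.
public class RandomLongs {
    public static void main(String[] args) {
        long max = 1_000_000_000_000L; // 1 trillion, exclusive upper bound
        for (int i = 0; i < 5; i++) {
            System.out.println(ThreadLocalRandom.current().nextLong(max));
        }
    }
}
```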
Twelve m3.xlarge EC2 nodes were used: ten worker nodes, one leader
node running ZooKeeper, the HDFS NameNode, the Accumulo master, and
the YARN ResourceManager, and one node running Grafana and InfluxDB.
Muchos[2] was used to set it all up.
At the end of the test I looked at the next Oracle timestamp and
divided it by two to estimate the number of transactions, since each
transaction uses two timestamps (a start and a commit timestamp).
Based on this I think ~409 million transactions were executed during
the test run. There were constant transaction collisions during the
test, between 50 and 150 per second; the test is designed to produce
these.
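The arithmetic behind that estimate can be sketched as follows. The timestamp value here is a hypothetical round number consistent with the ~409 million figure, not the actual value read from the Oracle.

```java
// Sketch of the transaction estimate: each transaction consumes two
// timestamps (start and commit), so transactions ~= nextTimestamp / 2.
public class TxEstimate {
    public static void main(String[] args) {
        long nextTimestamp = 818_000_000L; // hypothetical end-of-run value
        long estimatedTransactions = nextTimestamp / 2;
        System.out.println(estimatedTransactions); // 409000000
    }
}
```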
Below is the final output from the test.
*****Verifying Fluo & MapReduce results match*****
Success! Fluo & MapReduce both calculated 1099396113 unique integers
Below is a summary of settings that were changed for performance.
I set the following Accumulo table settings:
table.durability=flush
table.compaction.major.ratio=1.5
table.file.compress.blocksize=8K
table.file.compress.blocksize.index=32K
table.file.compress.type=snappy
table.file.replication=2
I set the following Accumulo system settings:
tserver.cache.data.size=4536M
tserver.cache.index.size=512M
tserver.memory.maps.max=512M
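For reference, settings like the ones above can be applied from the Accumulo shell with the `config` command; the table name below is a placeholder, not necessarily the one the test used:

```
# table-level settings (placeholder table name)
config -t stresso -s table.durability=flush
config -t stresso -s table.file.compress.type=snappy
# system-level settings (no -t option)
config -s tserver.cache.data.size=4536M
config -s tserver.memory.maps.max=512M
```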
I set the following Fluo props:
fluo.worker.num.threads=128
fluo.loader.num.threads=32
fluo.loader.queue.size=64
fluo.yarn.worker.max.memory.mb=3584
fluo.yarn.worker.instances=10
io.fluo.impl.tx.commit.memory=41943040
For HDFS, local reads were enabled. This makes a noticeable
difference in transactions per second. I also ran a compaction after
the initial bulk import of a billion numbers to ensure data was
local.
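Short-circuit local reads are configured in hdfs-site.xml; a minimal sketch follows, assuming the commonly used domain socket path (the exact path for this cluster is not stated in the post):

```xml
<!-- hdfs-site.xml: enable short-circuit (local) reads -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/lib/hadoop-hdfs/dn_socket</value>
</property>
```

The post-import compaction can be triggered from the Accumulo shell with `compact -t <table> -w`, where `-w` waits for the compaction to finish.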
Watching Accumulo's monitor page, index cache hits were always at
100%. The data cache oscillated between a 90% and 100% hit rate. I
suspect the hit rate went up as transactions worked on data higher up
the tree.
Accumulo 1.7.3, ZooKeeper 3.4.9, Hadoop 2.7.2, CentOS 7, and OpenJDK 8
(from CentOS) were used to run the test.
[1]: https://twitter.com/ApacheFluo/status/867425323742351360
[2]: https://github.com/astralway/muchos/
[3]: https://github.com/astralway/stresso