I just completed a successful stress test[3] run in preparation for the
1.1.0 release. Commit f0403a1 was used for the test. I sent out a
tweet[1] about it and am providing more details here.
For the test I loaded an initial 1 billion random numbers via
MapReduce. Then I incrementally loaded 100 sets of 1 million random
numbers. All random numbers were between 0 and 1 trillion.
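As a rough sketch, generating uniform random longs in that range looks like the following. Stresso's actual generation runs inside a MapReduce job; this standalone class only illustrates the value range.

```java
import java.util.concurrent.ThreadLocalRandom;

// Illustrative only: uniform random longs in [0, 1 trillion),
// the same value range the stress test loads.
public class RandomLongs {
    public static void main(String[] args) {
        long max = 1_000_000_000_000L; // 1 trillion, exclusive upper bound
        for (int i = 0; i < 5; i++) {
            System.out.println(ThreadLocalRandom.current().nextLong(max));
        }
    }
}
```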
Twelve m3.xlarge EC2 nodes were used: ten worker nodes, one leader
node running ZooKeeper, the HDFS NameNode, the Accumulo master, and
the YARN ResourceManager, and one node running Grafana and InfluxDB.
Muchos[2] was used to set it all up.
At the end of the test I looked at the next Oracle timestamp and
divided it by two to estimate the number of transactions, since each
transaction uses two timestamps (a start and a commit timestamp).
Based on this I think ~409 million transactions were executed during
the test run. There were constant transaction collisions during the
test, between 50 and 150 per second; the test is designed to produce
these.
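The arithmetic behind that estimate can be sketched as follows. The timestamp value here is a hypothetical round number consistent with the ~409 million figure, not the actual value read from the Oracle.

```java
// Sketch of the transaction estimate: each transaction consumes two
// timestamps (start and commit), so transactions ~= nextTimestamp / 2.
public class TxEstimate {
    public static void main(String[] args) {
        long nextTimestamp = 818_000_000L; // hypothetical end-of-run value
        long estimatedTransactions = nextTimestamp / 2;
        System.out.println(estimatedTransactions); // 409000000
    }
}
```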
Below is the final output from the test.
*****Verifying Fluo & MapReduce results match*****
Success! Fluo & MapReduce both calculated 1099396113 unique integers
Below is a summary of settings that were changed for performance.
I set the following Accumulo table settings:
table.durability=flush
table.compaction.major.ratio=1.5
table.file.compress.blocksize=8K
table.file.compress.blocksize.index=32K
table.file.compress.type=snappy
table.file.replication=2
I set the following Accumulo system settings:
tserver.cache.data.size=4536M
tserver.cache.index.size=512M
tserver.memory.maps.max=512M
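For reference, settings like the ones above can be applied from the Accumulo shell with the `config` command; the table name below is a placeholder, not necessarily the one the test used:

```
# table-level settings (placeholder table name)
config -t stresso -s table.durability=flush
config -t stresso -s table.file.compress.type=snappy
# system-level settings (no -t option)
config -s tserver.cache.data.size=4536M
config -s tserver.memory.maps.max=512M
```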
I set the following Fluo props:
fluo.worker.num.threads=128
fluo.loader.num.threads=32
fluo.loader.queue.size=64
fluo.yarn.worker.max.memory.mb=3584
fluo.yarn.worker.instances=10
io.fluo.impl.tx.commit.memory=41943040
For HDFS, local reads were enabled. This makes a noticeable
difference in transactions per second. I also ran a compaction after
the initial bulk import of a billion numbers to ensure data was
local.
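Short-circuit local reads are configured in hdfs-site.xml; a minimal sketch follows, assuming the commonly used domain socket path (the exact path for this cluster is not stated in the post):

```xml
<!-- hdfs-site.xml: enable short-circuit (local) reads -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/lib/hadoop-hdfs/dn_socket</value>
</property>
```

The post-import compaction can be triggered from the Accumulo shell with `compact -t <table> -w`, where `-w` waits for the compaction to finish.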
Watching Accumulo's monitor page, index cache hits were always at
100%. The data cache oscillated between a 90% and 100% hit rate. I
suspect the hit rate went up as transactions worked on data higher up
the tree.
Accumulo 1.7.3, ZooKeeper 3.4.9, Hadoop 2.7.2, CentOS 7, and OpenJDK 8
(from CentOS) were used to run the test.
[1]: https://twitter.com/ApacheFluo/status/867425323742351360
[2]: https://github.com/astralway/muchos/
[3]: https://github.com/astralway/stresso