Compaction is known to hurt node performance, to the point that the
node might miss some requests. That's why it's important to handle
these situations: the client needs to retry the operation on another
working host. We have been storing performance data for each
Cassandra request we make to our five-node Cassandra production
cluster.

We log the retry count and request type into our data warehouse
solution. I've now extracted the data from a 10-day period and
calculated how many retries were needed before the results could be
obtained. The following table shows how many times an operation had
to be retried until it completed successfully. The percentages give
the probability, for example "the request will succeed on the first
try 99.933 % of the time."

Total number of operations: 94 682 251 within 10 days.

    Retries | Operations | Percentage of total operations
          0 |  94618468  | 99.93263 %
          1 |     56688  |  0.05987 %
          2 |      5018  |  0.00529 %
          3 |      1359  |  0.00144 %
          4 |       111  |  0.00012 %
          5 |        25  |  0.00003 %

There were also a few operations which needed more than five retries,
so preparing to retry up to ten times is not a bad idea.
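
To make the arithmetic behind the table easier to check, here is a minimal
Python sketch that recomputes the cumulative share of operations completed
within a given number of retries. The counts are copied from the table above;
the variable and function names are only illustrative. The remainder it
computes is simply the operations not covered by rows 0 to 5; the data above
does not say how many retries each of those took or whether all of them
eventually succeeded.

```python
# Counts copied from the table above (retries -> number of operations).
TOTAL_OPERATIONS = 94_682_251
OPERATIONS_BY_RETRIES = {
    0: 94_618_468,
    1: 56_688,
    2: 5_018,
    3: 1_359,
    4: 111,
    5: 25,
}


def cumulative_share(counts, total):
    """Return {retries: share of all operations done within that many retries}."""
    done = 0
    shares = {}
    for retries in sorted(counts):
        done += counts[retries]
        shares[retries] = done / total
    return shares


shares = cumulative_share(OPERATIONS_BY_RETRIES, TOTAL_OPERATIONS)

# Operations that do not appear in rows 0..5 of the table.
not_in_table = TOTAL_OPERATIONS - sum(OPERATIONS_BY_RETRIES.values())

if __name__ == "__main__":
    for retries, share in shares.items():
        print(f"within {retries} retries: {share * 100:.5f} %")
    print(f"operations not covered by rows 0-5: {not_in_table}")
```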

The cluster runs Cassandra 0.6.5 with RF=3. Each operation is executed
until it succeeds or until it has been retried 10 times, using this
PHP wrapper:
http://github.com/dynamoid/cassandra-utilities
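
For readers who don't want to dig through the wrapper, the retry policy
described above can be sketched roughly as follows. This is not the code from
the PHP wrapper; it is a small Python illustration, and every name in it
(execute_with_retries, AllRetriesFailed, the operation callback, the host list)
is hypothetical. It assumes the operation is a callable that takes a host and
raises an exception on failure, and that a retry simply moves on to the next
host in a shuffled list.

```python
import random

MAX_RETRIES = 10  # matches the "until 10 retries" policy described above


class AllRetriesFailed(Exception):
    """Raised when the first try and every retry have failed (hypothetical name)."""


def execute_with_retries(operation, hosts, max_retries=MAX_RETRIES):
    """Call operation(host) until it succeeds or the retries run out.

    Returns (result, retries_used) so the retry count can be logged.
    """
    if not hosts:
        raise ValueError("at least one host is required")

    candidates = list(hosts)
    random.shuffle(candidates)  # spread the first attempt across nodes

    last_error = None
    # One initial attempt plus up to max_retries retries.
    for attempt in range(max_retries + 1):
        host = candidates[attempt % len(candidates)]  # move to the next host
        try:
            return operation(host), attempt
        except Exception as error:  # assumption: any exception means "retry"
            last_error = error

    raise AllRetriesFailed(
        f"gave up after {max_retries} retries"
    ) from last_error
```

Returning the number of retries alongside the result is what makes the kind of
logging described above possible: the caller can write the count and the
request type to the data warehouse after every operation.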

Have others found similar results? Please discuss :)

 - Juho Mäkinen
