We are trying to set up a Cassandra cluster and have low read-latency
requirements. Running some tests, we do not see the performance that we were
hoping for. We wanted to check whether anyone has thoughts on:

1. Whether these are the expected latency times for this data and machine configuration.

2. If not, what can we do to improve our read times?

We set up 4 boxes as a ring running Cassandra 1.1.5, and set up a keyspace with
replication factor 3 and strategy_class SimpleStrategy. The column family being
tested has 12 columns, 4 of which form a composite key.
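
SimpleStrategy places the first replica on the node that owns a key's token and the remaining replicas on the next nodes clockwise around the ring, ignoring racks and datacenters. The sketch below illustrates that placement for a 4-node ring with replication factor 3, plus the quorum size (a strict majority of replicas). The token values are toy numbers chosen for illustration, and `replicas` and `quorum` are hypothetical helper names, not Cassandra APIs.

```java
import java.util.ArrayList;
import java.util.List;

public class SimpleStrategySketch {
    // Quorum size for a replication factor: a strict majority of the replicas.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Returns the indices of the nodes holding replicas of a key with token keyToken.
    // nodeTokens must be sorted ascending. The first replica is the first node whose
    // token is >= keyToken (wrapping to node 0 past the last token); the remaining
    // replicas are the next nodes clockwise, as SimpleStrategy places them.
    static List<Integer> replicas(long[] nodeTokens, long keyToken, int replicationFactor) {
        int n = nodeTokens.length;
        int first = 0; // default: wrap around to the first node on the ring
        for (int i = 0; i < n; i++) {
            if (nodeTokens[i] >= keyToken) {
                first = i;
                break;
            }
        }
        List<Integer> result = new ArrayList<>();
        for (int k = 0; k < Math.min(replicationFactor, n); k++) {
            result.add((first + k) % n);
        }
        return result;
    }

    public static void main(String[] args) {
        long[] tokens = {0L, 25L, 50L, 75L}; // hypothetical toy tokens for 4 nodes
        System.out.println("replicas for token 30: " + replicas(tokens, 30L, 3));
        System.out.println("QUORUM with RF=3: " + quorum(3) + " replicas");
    }
}
```

With replication factor 3 on four nodes, each key lives on three of the four nodes, so a read at ONE waits for one replica and a read at QUORUM waits for two.
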
We then wrote 192,000 rows of randomly generated test data into the column
family. Most columns are either randomly generated UUIDs or short strings. One
of them, however, is a blob of around 1 KB of data (we later reduced the size of
this blob, but that didn't change our read times much).
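
A minimal sketch of generating test rows of the shape described above, assuming string and UUID key components and a random ~1 KB blob. Only `atag` comes from the original description; the other column names, their types, and the choice of 240 distinct `atag` values (192,000 / 240 = 800 rows per tag on average, in line with the ~800 results reported for the query) are hypothetical. It only counts rows per tag instead of writing to Cassandra.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.UUID;

public class TestDataSketch {
    // One generated row. Column names other than atag are hypothetical placeholders.
    static final class Row {
        final String atag;    // first component of the composite key
        final String key2;    // remaining composite-key components (hypothetical)
        final UUID key3;
        final UUID key4;
        final String note;    // example short-string value column (hypothetical)
        final byte[] payload; // the ~1 KB blob column

        Row(String atag, String key2, UUID key3, UUID key4, String note, byte[] payload) {
            this.atag = atag;
            this.key2 = key2;
            this.key3 = key3;
            this.key4 = key4;
            this.note = note;
            this.payload = payload;
        }
    }

    private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";

    // Random lowercase alphanumeric string of the given length.
    static String shortString(Random rnd, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(rnd.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    // Builds one random row; atag is drawn uniformly from distinctTags values.
    static Row randomRow(Random rnd, int distinctTags, int blobBytes) {
        byte[] blob = new byte[blobBytes];
        rnd.nextBytes(blob);
        return new Row("tag" + rnd.nextInt(distinctTags), shortString(rnd, 8),
                UUID.randomUUID(), UUID.randomUUID(), shortString(rnd, 16), blob);
    }

    public static void main(String[] args) {
        int rows = 192_000;
        int distinctTags = 240; // 192,000 / 240 = 800 rows per tag on average
        Random rnd = new Random(42);
        Map<String, Integer> perTag = new HashMap<>();
        for (int i = 0; i < rows; i++) {
            Row row = randomRow(rnd, distinctTags, 1024);
            // A real loader would insert the row here; this sketch only counts tags.
            perTag.merge(row.atag, 1, Integer::sum);
        }
        System.out.println("distinct tags: " + perTag.size()
                + ", rows for tag0: " + perTag.get("tag0"));
    }
}
```
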

Running a query like "select * from <table_name> where atag=<foo>", where
'atag' is the first column of the composite key, from either JDBC or Hector
(equivalent code) results in read times of 200-300 ms from a remote host on the
same network. The query returned around 800 results. Running the same query on
a Cassandra host results in a read time of ~110-130 ms.
Using read consistency ONE reduces the read latency by ~20 ms compared to
using QUORUM.
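
Single timings like these are sensitive to warm-up (JIT compilation, connection setup, cold caches). A minimal sketch of a repeatable measurement, assuming the query is wrapped in a `Callable`: it discards warm-up runs, times the rest with `System.nanoTime`, and reports nearest-rank percentiles. The `Thread.sleep` stand-in is hypothetical and would be replaced by the actual JDBC or Hector call that runs the query and iterates over all returned rows.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;

public class LatencySketch {
    // Runs `query` warmup + iterations times; returns the timed latencies in
    // milliseconds, sorted ascending. Warm-up runs are executed but not recorded.
    static List<Double> measure(Callable<?> query, int warmup, int iterations) throws Exception {
        for (int i = 0; i < warmup; i++) {
            query.call();
        }
        List<Double> millis = new ArrayList<>(iterations);
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            query.call();
            millis.add((System.nanoTime() - start) / 1_000_000.0);
        }
        Collections.sort(millis);
        return millis;
    }

    // Nearest-rank percentile (p in (0, 100]) of an ascending-sorted list.
    static double percentile(List<Double> sorted, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sorted.size());
        return sorted.get(Math.max(rank, 1) - 1);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for the real read of ~800 rows.
        Callable<Integer> query = () -> {
            Thread.sleep(5);
            return 800;
        };
        List<Double> samples = measure(query, 5, 50);
        System.out.printf("median %.1f ms, p95 %.1f ms%n",
                percentile(samples, 50), percentile(samples, 95));
    }
}
```

Running the same harness once at ONE and once at QUORUM, from a remote host and from a Cassandra node, gives distributions rather than single numbers to compare.
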

Enabling the row cache did not seem to change the performance much. Moreover, the
row cache 'size' according to nodetool was tiny. Here is a snapshot of the
nodetool info output after running a few read tests:
Key Cache        : size 2448 (bytes), capacity 104857584 (bytes), 231 hits, 266 requests, 1.000 recent hit rate, 14400 save period in seconds
Row Cache        : size 96 (bytes), capacity 4194304000 (bytes), 9 hits, 13 requests, NaN recent hit rate, 0 save period in seconds
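
The cache lines can be turned into two quick ratios: lifetime hits divided by requests, and bytes used divided by capacity. A minimal sketch, assuming the line format shown in the snapshot (the regular expression is written against that text only and may not match other versions' output); `parse` is a hypothetical helper name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CacheStatsSketch {
    // Matches the "size ..., capacity ..., N hits, M requests" part of a cache line
    // in the format of the nodetool info snapshot.
    private static final Pattern STATS = Pattern.compile(
            "size (\\d+) \\(bytes\\), capacity (\\d+) \\(bytes\\), (\\d+) hits, (\\d+) requests");

    // Returns {size, capacity, hits, requests}, or null if the line does not match.
    static long[] parse(String line) {
        Matcher m = STATS.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new long[] {
            Long.parseLong(m.group(1)), Long.parseLong(m.group(2)),
            Long.parseLong(m.group(3)), Long.parseLong(m.group(4))
        };
    }

    public static void main(String[] args) {
        String[] lines = {
            "Key Cache : size 2448 (bytes), capacity 104857584 (bytes), 231 hits, 266 requests",
            "Row Cache : size 96 (bytes), capacity 4194304000 (bytes), 9 hits, 13 requests"
        };
        for (String line : lines) {
            long[] s = parse(line);
            double hitRatio = (double) s[2] / s[3];
            double usedPercent = 100.0 * s[0] / s[1];
            System.out.printf("%s: lifetime hit ratio %.3f, %.6f%% of capacity used%n",
                    line.substring(0, 9), hitRatio, usedPercent);
        }
    }
}
```

For the key cache this is 231 of 266 requests (about 0.87), and for the row cache 9 of 13 (about 0.69); both caches use a tiny fraction of their configured capacity.
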

Hardware/OS specs:
Intel(R) Xeon(R) CPU L5640
OS: Solaris 5.10
RAM: 32 GB
Hard disk: 1 TB magnetic disk (not SSD)

Thanks,
Arindam
