Here are some calculated 'latency' results reported by cassandra-stress when asked to write 10M rows, i.e.

cassandra-stress -d <ip1>,<ip2> -n 10000000

(we actually had cassandra-stress running in daemon mode for the below tests)
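For scale, Aaron's rule of thumb quoted below (3,000 to 4,000 non-counter writes per second per core) gives a rough floor on how long a 10M-row run should take; a back-of-envelope sketch, assuming the midpoint of ~3,500 writes/s per core on the 8-core node:

```shell
# Rough expected duration for 10M writes on an 8-core node,
# assuming ~3,500 non-counter writes/s per core (rule-of-thumb midpoint).
rows=10000000
cores=8
per_core=3500
echo "$(( rows / (cores * per_core) )) seconds"   # prints "357 seconds"
```

Anything much slower than that suggests the bottleneck is not raw CPU on the server side.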
avg_latency (percentile)                   90           99           99.9         99.99
Write: 8 cores, 32 GB, 3-disk RAID 0       0.002982182  0.003963931  0.004692996  0.004792326
Write: 32 cores, 128 GB, 7-disk RAID 0     0.003157515  0.003763181  0.005184429  0.005441946
Read: 8 cores, 32 GB, 3-disk RAID 0        0.002289879  0.057178021  0.173753058  0.24386912
Read: 32 cores, 128 GB, 7-disk RAID 0      0.002317525  0.010937648  0.013205977  0.014270511

The client was another node on the same network with the 8 core, 32 GB RAM specs. I wouldn't expect it to bottleneck, but I can monitor it while generating the load. In general, what would you expect it to bottleneck at?

>> Another interesting thing is that the linux disk cache doesn't seem to be
>> growing in spite of a lot of free memory available.
> Things will only get paged in when they are accessed.

Hmm, interesting. I did a test where I just wrote large files to disk, e.g.

dd if=/dev/zero of=bigfile18 bs=1M count=10000

and checked the disk cache, and it increased by exactly the size of the file written (no reads were done in this case).

-----Original Message-----
From: Aaron Morton [mailto:aa...@thelastpickle.com]
Sent: Monday, November 25, 2013 11:55 AM
To: Cassandra User
Subject: Re: Config changes to leverage new hardware

> However, for both writes and reads there was virtually no difference in the
> latencies.
What sort of latency were you getting?

> I'm still not very sure where the current *write* bottleneck is though.
What numbers are you getting?

Could the bottleneck be the client? Can it send writes fast enough to saturate the nodes? As a rule of thumb you should get 3,000 to 4,000 (non counter) writes per second per core.

> Sample iostat data (captured every 10s) for the dedicated disk where commit
> logs are written is below. Does this seem like a bottleneck?
Does not look too bad.

> Another interesting thing is that the linux disk cache doesn't seem to be
> growing in spite of a lot of free memory available.
Things will only get paged in when they are accessed.
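The dd observation above can be reproduced with a before/after snapshot of the kernel's page cache; a minimal sketch, assuming a Linux host with /proc/meminfo (the file name and 1 GB size are arbitrary):

```shell
# Page cache size (kB) before and after writing a 1 GB file;
# a write-only workload should grow the cache by roughly the file size.
before=$(awk '/^Cached:/ {print $2}' /proc/meminfo)
dd if=/dev/zero of=bigfile-test bs=1M count=1024 2>/dev/null
after=$(awk '/^Cached:/ {print $2}' /proc/meminfo)
echo "cache grew by $(( (after - before) / 1024 )) MB"
rm -f bigfile-test
```

Note the growth can be smaller than the file if memory pressure causes eviction in between the two snapshots.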
Cheers

-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 21/11/2013, at 12:42 pm, Arindam Barua <aba...@247-inc.com> wrote:

>
> Thanks for the suggestions Aaron.
>
> As a follow up, we ran a bunch of tests with different combinations of these
> changes on a 2-node ring. The load was generated using cassandra-stress, run
> with default values to write 30 million rows, and read them back.
> However, for both writes and reads there was virtually no difference in the
> latencies.
>
> The different combinations attempted:
> 1. Baseline test with none of the below changes.
> 2. Grabbing the TLAB setting from 1.2
> 3. Moving the commit logs too to the 7 disk RAID 0.
> 4. Increasing the concurrent_read to 32, and concurrent_write to 64
> 5. (3) + (4), i.e. moving commit logs to the RAID + increasing
>    concurrent_read and concurrent_write config to 32 and 64.
>
> The write latencies were very similar, except for being ~3x worse at the
> 99.9th percentile and above for scenario (5) above.
> The read latencies were also similar, with (3) and (5) being a little worse
> at the 99.99th percentile.
>
> Overall, not making any changes, i.e. (1), performed as well as or slightly
> better than any of the other changes.
>
> Running cassandra-stress on both the old and new hardware without making any
> config changes, the write performance was very similar, but the new hardware
> did show ~10x improvement in the read for the 99.9th percentile and higher.
> After thinking about this, the reason why we were not seeing any difference
> with our test framework was perhaps the nature of the test, where we write
> the rows and then immediately do a bunch of reads for the rows just written.
> The data is read back from the memtables, and never from the disk/sstables.
> Hence the new hardware's increased RAM and size of the disk cache, or its
> higher number of disks, never helps.
>
> I'm still not very sure where the current *write* bottleneck is though. The
> new hardware has 32 cores vs 8 cores of the old hardware. Moving the commit
> log from a dedicated disk to a 7-disk RAID 0 system (where it would be shared
> by other data though) didn't make a difference either (unless the extra
> contention on the RAID nullified the positive effects of the RAID).
>
> Sample iostat data (captured every 10s) for the dedicated disk where commit
> logs are written is below. Does this seem like a bottleneck? When the commit
> logs are written the await/svctm ratio is high.
>
> Device:  rrqm/s  wrqm/s  r/s   w/s    rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
>          0.00    8.09    0.04  8.85   0.00   0.07   15.74     0.00      0.12   0.03   0.02
>          0.00    768.03  0.00  9.49   0.00   3.04   655.41    0.04      4.52   0.33   0.31
>          0.00    8.10    0.04  8.85   0.00   0.07   15.75     0.00      0.12   0.03   0.02
>          0.00    752.65  0.00  10.09  0.00   2.98   604.75    0.03      3.00   0.26   0.26
>
> Another interesting thing is that the linux disk cache doesn't seem to be
> growing in spite of a lot of free memory available. The total disk cache used
> reported by 'free' is less than the size of the sstables written, with over
> 100 GB of RAM unused.
> Even in production, where we have the older hardware running with 32 GB RAM
> for a long time now, looking at 5 hosts in 1 DC, only 2.5 GB to 8 GB was used
> for the disk cache. The Cassandra java process uses the 8 GB allocated to it,
> and at least 10-15 GB on all the hosts is not used at all.
>
> Thanks,
> Arindam
>
> From: Aaron Morton [mailto:aa...@thelastpickle.com]
> Sent: Wednesday, November 06, 2013 8:34 PM
> To: Cassandra User
> Subject: Re: Config changes to leverage new hardware
>
> Running Cassandra 1.1.5 currently, but evaluating to upgrade to 1.2.11 soon.
> You will make more use of the extra memory moving to 1.2 as it moves bloom
> filters and compression data off heap.
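To put a number on "the await/svctm ratio is high", the ratio can be computed directly from a captured iostat -x log; a sketch, assuming the 11-column data layout shown in the sample above (await in field 9, svctm in field 10) and a hypothetical capture file named iostat.log:

```shell
# Print the await/svctm ratio for each iostat sample line.
# A large ratio means requests spend far longer queued than being serviced.
# Field positions (9 = await, 10 = svctm) match the -x layout quoted above.
awk 'NF == 11 && $10 > 0 { printf "await/svctm = %.1f\n", $9 / $10 }' iostat.log
```

On the second sample above (await 4.52, svctm 0.33) this prints a ratio of about 13.7, which is noticeable but, as Aaron says, the low %util suggests the disk is far from saturated.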
>
> Also grab the TLAB setting from cassandra-env.sh in v1.2
>
> As of now, our performance tests (our application specific as well as
> cassandra-stress) are not showing any significant difference in the
> hardwares, which is a little disheartening, since the new hardware has a lot
> more RAM and CPU.
> For reads or writes or both?
>
> Writes tend to scale with cores as long as the commit log can keep up.
> Reads improve with disk IO and page cache size when the hot set is in memory.
>
> Old Hardware: 8 cores (2 quad core), 32 GB RAM, four 1-TB disks
> (1 disk used for commitlog and 3 disks RAID 0 for data)
> New Hardware: 32 cores (2 8-core with hyperthreading), 128 GB RAM, eight 1-TB disks
> (1 disk used for commitlog and 7 disks RAID 0 for data)
> Is the disk IO on the commit log volume keeping up?
> You cranked up the concurrent writers and the commit log may not keep up. You
> could put the commit log on the same RAID volume to see if that improves
> writes.
>
> The config we tried modifying so far was concurrent_reads to (16 *
> number of drives) and concurrent_writes to (8 * number of cores) as per
> 256 write threads is a lot. Make sure the commit log can keep up, I would put
> it back to 32, maybe try 64. Not sure the concurrent list for the commit log
> will work well with that many threads.
>
> May want to put the reads down as well.
>
> It's easier to tune the system if you can provide some info on the workload.
>
> Cheers
>
> -----------------
> Aaron Morton
> New Zealand
> @aaronmorton
>
> Co-Founder & Principal Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On 7/11/2013, at 12:35 pm, Arindam Barua <aba...@247-inc.com> wrote:
>
> We want to upgrade our Cassandra cluster to have newer hardware, and were
> wondering if anyone has suggestions on Cassandra or linux config changes that
> will prove to be beneficial.
> As of now, our performance tests (our application specific as well as
> cassandra-stress) are not showing any significant difference in the
> hardwares, which is a little disheartening, since the new hardware has a lot
> more RAM and CPU.
>
> Old Hardware: 8 cores (2 quad core), 32 GB RAM, four 1-TB disks
> (1 disk used for commitlog and 3 disks RAID 0 for data)
> New Hardware: 32 cores (2 8-core with hyperthreading), 128 GB RAM, eight 1-TB disks
> (1 disk used for commitlog and 7 disks RAID 0 for data)
>
> Most of the cassandra config currently is the default, and we are using
> LeveledCompaction strategy. Default key cache, row cache turned off.
> The config we tried modifying so far was concurrent_reads to (16 * number of
> drives) and concurrent_writes to (8 * number of cores) as per the
> recommendation in cassandra.yaml, but that didn't make much difference.
> We were hoping that at least the extra RAM in the new hardware would be used
> for Linux file caching and hence an improvement in performance would be
> observed.
>
> Running Cassandra 1.1.5 currently, but evaluating to upgrade to 1.2.11 soon.
>
> Thanks,
> Arindam
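[Editor's note] For reference, the formulae discussed in the thread map onto a cassandra.yaml fragment like the following; the values are illustrative only, computed for the old hardware (8 cores, 3 data disks), and note Aaron's advice above to dial concurrent_writes back toward 32-64 rather than trusting the formula blindly:

```yaml
# cassandra.yaml -- illustrative values derived from the rules of thumb above
# concurrent_reads  = 16 * data disks = 16 * 3 = 48   (old hardware, 3-disk RAID 0)
# concurrent_writes = 8  * cores      = 8 * 8  = 64
concurrent_reads: 48
concurrent_writes: 64
```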