I will start, knowing that others will have additional help and questions. What heap size are you using? It sounds like you are using the CMS garbage collector, which takes arcane knowledge and lots of testing to tune. I would start with G1 and a heap of half the available RAM. I would also want 32 GB RAM as a minimum on the hosts.
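As a rough sketch of that starting point, the corresponding conf/jvm.options entries might look like the following. This assumes Cassandra 3.11's jvm.options layout and a 32 GB host; the 16G figure should be adjusted to half of the actual RAM, and the shipped CMS flags must be commented out:

```
# conf/jvm.options -- illustrative starting point, not a tuned config
# Heap: half of a 32 GB host; pin min = max to avoid resize pauses
-Xms16G
-Xmx16G

# Switch from CMS to G1 (comment out the CMS section shipped in this file)
-XX:+UseG1GC
# Target pause time; 200 ms is the G1 default and a reasonable start
-XX:MaxGCPauseMillis=200
```

From there, tune based on observed GC logs rather than guessing.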
Spinning disks are a problem, too. Can you tell if the I/O is getting overwhelmed? SSDs are much preferred. Read-before-write is usually an anti-pattern for Cassandra. From your queries, it seems you have a partition key and a clustering key. Can you give us the table schema? I am also concerned about the IF EXISTS in your delete. I think that invokes a lightweight transaction, which is costly for performance. Is it really required for your use case?

Sean Durity

From: Marco Gasparini <marco.gaspar...@competitoor.com>
Sent: Friday, January 11, 2019 8:20 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] fine tuning for wide rows and mixed workload system

Hello everyone,

I need some advice in order to solve my use case problem. I have already tried some solutions, but they did not work out. Can you help me with the following configuration, please? Any help is much appreciated.

I'm using:
- Cassandra 3.11.3
- Java version "1.8.0_191"

My use case is composed of the following constraints:
- about 1M reads per day (and rising)
- about 2M writes per day (and rising)
- a high peak of requests over less than 2 hours, in which the system receives half of the whole day's traffic (500K reads, 1M writes)
- each request is composed of 1 read and 2 writes (1 delete + 1 insert):
  * the read query selects at most 3 records by primary key (select * from my_keyspace.my_table where pkey = ? limit 3)
  * then one record is deleted (delete from my_keyspace.my_table where pkey = ? and event_datetime = ? IF EXISTS)
  * finally the new data is stored (insert into my_keyspace.my_table (event_datetime, pkey, agent, some_id, ft, ftt..) values (?,?,?,?,?,?...))
- each row is pretty wide. I don't know the exact size, because there are 2 dynamic text columns that each store between 1 MB and 50 MB of data.

So reads are going to be huge, because I read 3 records of that size every time. Writes are heavy as well, because each row is that wide.
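For reference, the query pattern above implies a table along these lines. The column types here are guesses, since the schema was not posted, and the second statement shows the delete with IF EXISTS removed, which turns the lightweight-transaction delete Sean flagged into an ordinary tombstone write:

```
-- Hypothetical schema inferred from the queries (types are assumptions)
CREATE TABLE my_keyspace.my_table (
    pkey           text,
    event_datetime timestamp,
    agent          text,
    some_id        bigint,
    ft             text,   -- one of the 1-50 MB dynamic text columns
    ftt            text,   -- the other large text column
    PRIMARY KEY (pkey, event_datetime)
) WITH CLUSTERING ORDER BY (event_datetime DESC);

-- Plain delete: no Paxos round, just a tombstone, if IF EXISTS is not needed
DELETE FROM my_keyspace.my_table
 WHERE pkey = ? AND event_datetime = ?;
```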
Currently, I own 3 nodes with the following properties:

- node1:
  * Intel Core i7-3770
  * 2x HDD SATA 3.0 TB
  * 4x RAM 8192 MB DDR3
  * nominal bit rate 175 MB/s

  # blockdev --report /dev/sd[ab]
  RO RA SSZ BSZ StartSec Size Device
  rw 256 512 4096 0 3000592982016 /dev/sda
  rw 256 512 4096 0 3000592982016 /dev/sdb

- node2, node3:
  * Intel Core i7-2600
  * 2x HDD SATA 3.0 TB
  * 4x RAM 4096 MB DDR3
  * nominal bit rate 155 MB/s

  # blockdev --report /dev/sd[ab]
  RO RA SSZ BSZ StartSec Size Device
  rw 256 512 4096 0 3000592982016 /dev/sda
  rw 256 512 4096 0 3000592982016 /dev/sdb

Each node has 2 disks, but I have disabled the RAID option and created a single virtual disk in order to get more free space. Can this configuration create issues?

I have already tried some configurations in order to make it work:

1) straightforward attempt
- default Cassandra configuration (cassandra.yaml)
- RF=1
- SizeTieredCompactionStrategy (write-oriented strategy)
- no row cache (because of the wide row size, it is better to have no row cache)
- gc_grace_seconds = 1 day (unfortunately, I scheduled no repairs at all)

results: too many timeouts, losing data

2)
- added repair schedules
- RF=3 (in order to increase read speed)

results:
- too many timeouts, losing data
- high I/O consumption on every node (iostat shows 100% in %util on each node, dstat shows hundreds of MB read per iteration)
- node2 frozen until I stopped writing data
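On the two-disks-as-one-volume question: Cassandra can address both disks directly as JBOD through cassandra.yaml, which keeps a single striped or spanned volume from being throttled by one busy spindle. A minimal sketch, where the mount points are assumptions:

```
# cassandra.yaml -- JBOD sketch; mount points are hypothetical
data_file_directories:
    - /mnt/disk1/cassandra/data
    - /mnt/disk2/cassandra/data

# Ideally the commitlog gets its own spindle so its sequential writes
# are not interleaved with compaction and read I/O; with only two disks,
# placing it on the less-loaded one is a compromise
commitlog_directory: /mnt/disk1/cassandra/commitlog
```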
- node3 almost frozen
- many pending MutationStage events in tpstats on node2
- many full GCs
- many HintsDispatchExecutor events in system.log

actual)
- added repair schedules
- RF=3
- set durable_writes = false in order to speed up writes
- increased the young-generation heap size
- decreased SurvivorRatio in order to make more young-generation space available, because of the wide row data
- increased MaxTenuringThreshold from 1 to 3 in order to decrease read latency
- increased Cassandra's memtable on-heap and off-heap sizes because of the wide row data
- changed memtable_allocation_type to offheap_objects because of the wide row data

results:
- better GC performance on node1 and node3
- still high I/O consumption on every node (iostat shows 100% in %util on each node, dstat shows hundreds of MB read per iteration)
- node2 still completely frozen
- many pending MutationStage events in tpstats on node2
- many HintsDispatchExecutor events in system.log on every node

I cannot go to AWS; I can only get dedicated servers. Do you have any suggestions to fine-tune the system for this use case?

Thank you
Marco
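For readers following along, the memtable changes in the "actual" attempt map to these cassandra.yaml settings in 3.11. The sizes shown are illustrative placeholders, not recommendations (Marco did not post his values):

```
# cassandra.yaml -- memtable settings from the "actual" attempt
# (sizes are illustrative)
memtable_allocation_type: offheap_objects
# With offheap_objects, cell data lives off-heap; these caps bound each pool
memtable_heap_space_in_mb: 2048
memtable_offheap_space_in_mb: 4096
```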