I will start, knowing that others will have additional help and questions. What heap size are you using? It sounds like you are using the CMS garbage collector, which takes arcane knowledge and lots of testing to tune. I would start with G1 and a heap of half the available RAM. I would also want 32 GB RAM as a minimum on the hosts.
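As a rough sketch of that starting point, the corresponding conf/jvm.options entries might look like the following. This assumes Cassandra 3.11's jvm.options layout and a 32 GB host; the 16G figure should be adjusted to half of the actual RAM, and the shipped CMS flags must be commented out:

```
# conf/jvm.options -- illustrative starting point, not a tuned config
# Heap: half of a 32 GB host; pin min = max to avoid resize pauses
-Xms16G
-Xmx16G

# Switch from CMS to G1 (comment out the CMS section shipped in this file)
-XX:+UseG1GC
# Target pause time; 200 ms is the G1 default and a reasonable start
-XX:MaxGCPauseMillis=200
```

From there, tune based on observed GC logs rather than guessing.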
Spinning disks are a problem, too. Can you tell if the I/O is getting overwhelmed? SSDs are much preferred. Read-before-write is usually an anti-pattern for Cassandra. From your queries, it seems you have a partition key and a clustering key. Can you give us the table schema? I am also concerned about the IF EXISTS in your delete. I think that invokes a lightweight transaction, which is costly for performance. Is it really required for your use case?

Sean Durity

From: Marco Gasparini <marco.gaspar...@competitoor.com>
Sent: Friday, January 11, 2019 8:20 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] fine tuning for wide rows and mixed workload system

Hello everyone,

I need some advice in order to solve my use case problem. I have already tried some solutions, but they did not work out. Can you help me with the following configuration, please? Any help is much appreciated.

I'm using:
- Cassandra 3.11.3
- Java version "1.8.0_191"

My use case is composed of the following constraints:
- about 1M reads per day (and rising)
- about 2M writes per day (and rising)
- a high peak of requests over less than 2 hours, in which the system receives half of the whole day's traffic (500K reads, 1M writes)
- each request is composed of 1 read and 2 writes (1 delete + 1 insert):
  * the read query selects at most 3 records by primary key (select * from my_keyspace.my_table where pkey = ? limit 3)
  * then one record is deleted (delete from my_keyspace.my_table where pkey = ? and event_datetime = ? IF EXISTS)
  * finally the new data is stored (insert into my_keyspace.my_table (event_datetime, pkey, agent, some_id, ft, ftt..) values (?,?,?,?,?,?...))
- each row is pretty wide. I don't know the exact size, because there are 2 dynamic text columns that each store between 1 MB and 50 MB of data.

So reads are going to be huge, because I read 3 records of that size every time. Writes are heavy as well, because each row is that wide.
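For reference, the query pattern above implies a table along these lines. The column types here are guesses, since the schema was not posted, and the second statement shows the delete with IF EXISTS removed, which turns the lightweight-transaction delete Sean flagged into an ordinary tombstone write:

```
-- Hypothetical schema inferred from the queries (types are assumptions)
CREATE TABLE my_keyspace.my_table (
    pkey           text,
    event_datetime timestamp,
    agent          text,
    some_id        bigint,
    ft             text,   -- one of the 1-50 MB dynamic text columns
    ftt            text,   -- the other large text column
    PRIMARY KEY (pkey, event_datetime)
) WITH CLUSTERING ORDER BY (event_datetime DESC);

-- Plain delete: no Paxos round, just a tombstone, if IF EXISTS is not needed
DELETE FROM my_keyspace.my_table
 WHERE pkey = ? AND event_datetime = ?;
```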
Currently, I own 3 nodes with the following properties:

- node1:
  * Intel Core i7-3770
  * 2x HDD SATA 3.0 TB
  * 4x RAM 8192 MB DDR3
  * nominal bit rate 175 MB/s

  # blockdev --report /dev/sd[ab]
  RO RA SSZ BSZ StartSec Size Device
  rw 256 512 4096 0 3000592982016 /dev/sda
  rw 256 512 4096 0 3000592982016 /dev/sdb

- node2, node3:
  * Intel Core i7-2600
  * 2x HDD SATA 3.0 TB
  * 4x RAM 4096 MB DDR3
  * nominal bit rate 155 MB/s

  # blockdev --report /dev/sd[ab]
  RO RA SSZ BSZ StartSec Size Device
  rw 256 512 4096 0 3000592982016 /dev/sda
  rw 256 512 4096 0 3000592982016 /dev/sdb

Each node has 2 disks, but I have disabled the RAID option and created a single virtual disk in order to get more free space. Can this configuration create issues?

I have already tried some configurations in order to make it work:

1) straightforward attempt
- default Cassandra configuration (cassandra.yaml)
- RF=1
- SizeTieredCompactionStrategy (write-oriented strategy)
- no row cache (because of the wide row size, it is better to have no row cache)
- gc_grace_seconds = 1 day (unfortunately, I scheduled no repairs at all)

results: too many timeouts, losing data

2)
- added repair schedules
- RF=3 (in order to increase read speed)

results:
- too many timeouts, losing data
- high I/O consumption on every node (iostat shows 100% in %util on each node, dstat shows hundreds of MB read per iteration)
- node2 frozen until I stopped writing data
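On the two-disks-as-one-volume question: Cassandra can address both disks directly as JBOD through cassandra.yaml, which keeps a single striped or spanned volume from being throttled by one busy spindle. A minimal sketch, where the mount points are assumptions:

```
# cassandra.yaml -- JBOD sketch; mount points are hypothetical
data_file_directories:
    - /mnt/disk1/cassandra/data
    - /mnt/disk2/cassandra/data

# Ideally the commitlog gets its own spindle so its sequential writes
# are not interleaved with compaction and read I/O; with only two disks,
# placing it on the less-loaded one is a compromise
commitlog_directory: /mnt/disk1/cassandra/commitlog
```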
- node3 almost frozen
- many pending MutationStage events in tpstats on node2
- many full GCs
- many HintsDispatchExecutor events in system.log

actual)
- added repair schedules
- RF=3
- set durable_writes = false in order to speed up writes
- increased the young-generation heap size
- decreased SurvivorRatio in order to make more young-generation space available, because of the wide row data
- increased MaxTenuringThreshold from 1 to 3 in order to decrease read latency
- increased Cassandra's memtable on-heap and off-heap sizes because of the wide row data
- changed memtable_allocation_type to offheap_objects because of the wide row data

results:
- better GC performance on node1 and node3
- still high I/O consumption on every node (iostat shows 100% in %util on each node, dstat shows hundreds of MB read per iteration)
- node2 still completely frozen
- many pending MutationStage events in tpstats on node2
- many HintsDispatchExecutor events in system.log on every node

I cannot go to AWS; I can only get dedicated servers. Do you have any suggestions to fine-tune the system for this use case?

Thank you
Marco
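For readers following along, the memtable changes in the "actual" attempt map to these cassandra.yaml settings in 3.11. The sizes shown are illustrative placeholders, not recommendations (Marco did not post his values):

```
# cassandra.yaml -- memtable settings from the "actual" attempt
# (sizes are illustrative)
memtable_allocation_type: offheap_objects
# With offheap_objects, cell data lives off-heap; these caps bound each pool
memtable_heap_space_in_mb: 2048
memtable_offheap_space_in_mb: 4096
```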