Kevin Rivait created CASSANDRA-13780:
----------------------------------------

             Summary: Add-node streaming throughput performance
                 Key: CASSANDRA-13780
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13780
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
         Environment: Linux 2.6.32-696.3.2.el6.x86_64 #1 SMP Mon Jun 19 
11:55:55 PDT 2017 x86_64 x86_64 x86_64 GNU/Linux

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                40
On-line CPU(s) list:   0-39
Thread(s) per core:    2
Core(s) per socket:    10
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
Stepping:              1
CPU MHz:               2199.869
BogoMIPS:              4399.36
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              25600K
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39

             total       used       free     shared    buffers     cached
Mem:          252G       217G        34G       708K       308M       149G
-/+ buffers/cache:        67G       185G
Swap:          16G         0B        16G



            Reporter: Kevin Rivait
            Priority: Blocker
             Fix For: 3.0.9


Problem: Adding a new node to a large cluster streams data at least 1000x slower 
than the network and node hardware can support, taking several days per new 
node.  Adjusting stream throughput and other YAML parameters appears to have no 
effect on performance.  Essentially, Cassandra seems to have an architectural 
scalability problem when adding new nodes to a cluster with moderate to high 
data ingestion, because new node capacity cannot be added fast enough to keep up 
with growing ingestion volumes.
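
For reference, the throttles in question are along these lines (the values shown 
are illustrative, not what we actually ran with):

    # cassandra.yaml throttles (3.0 defaults: 200 Mb/s streaming, 16 MB/s compaction):
    #   stream_throughput_outbound_megabits_per_sec
    #   inter_dc_stream_throughput_outbound_megabits_per_sec
    #   compaction_throughput_mb_per_sec

    # The streaming caps can also be changed at runtime on each node:
    nodetool getstreamthroughput             # show current cap in Mb/s
    nodetool setstreamthroughput 800         # illustrative value
    nodetool setinterdcstreamthroughput 800  # illustrative value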

Initial Configuration:
Running 3.0.9 and have implemented TWCS on one of our largest tables.
The largest table is partitioned on (ID, YYYYMM), using 1-day buckets with a TTL 
of 60 days.
The next release will change the partitioning to (ID, YYYYMMDD) so that 
partitions are aligned with the daily TWCS buckets.
Each node currently creates roughly a 30GB SSTable per day.
TWCS is working as expected; daily SSTables drop off after 70 days 
(60-day TTL + 10-day grace period).  An illustrative sketch of these settings 
appears at the end of this section.
The current deployment is a 28-node, 2-datacenter cluster, 14 nodes in each DC, 
replication factor 3.
Data directories on each node are backed by four 2TB SSDs, with one 800GB SSD 
for commit logs.
The requirement is to double cluster size, capacity, and ingestion volume within 
a few weeks.
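
For reference, an illustrative sketch of the TWCS/TTL settings described above 
(keyspace and table names are placeholders, not the real schema):

    # Placeholder keyspace/table; window size and TTL match the description above.
    cqlsh -e "
      ALTER TABLE ks.events
      WITH compaction = {'class': 'TimeWindowCompactionStrategy',
                         'compaction_window_unit': 'DAYS',
                         'compaction_window_size': 1}
      AND default_time_to_live = 5184000;  -- 60 days, in seconds
    "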

Observed Behavior:
1. Streaming throughput during add node - we observed a maximum of 6 Mb/s 
streaming from each of the 14 nodes on a 20Gb/s switched network, taking at 
least 106 hours for each node to join the cluster, even though each node holds 
only about 2.2 TB (how such numbers can be gathered is sketched after this 
list).

2. Compaction on the newly added node - compaction has fallen behind, with 
anywhere from 4,000 to 10,000 SSTables at any given time.  It took 3 weeks for 
compaction to finish on each newly added node.  Increasing the number of 
compaction threads to match the number of CPUs (40) and raising compaction 
throughput to 32 MB/s seemed to be the sweet spot (the relevant knobs are 
sketched after this list).

3. TWCS buckets on the new node - data streamed to this node over 4 1/2 days.  
Compaction correctly placed the data in daily files, but the file dates reflect 
when compaction created the file rather than the timestamp of the last record 
written in the TWCS bucket, which will cause the files to remain around much 
longer than necessary.
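
Regarding item 1, per-node streaming progress can be watched while a node is 
joining, e.g. (a sketch, not necessarily how the numbers above were gathered):

    # Run on the joining node or on a sending node; shows active stream
    # sessions, the peers involved, and bytes transferred so far.
    nodetool netstats

    # Current outbound streaming cap, in Mb/s, for comparison:
    nodetool getstreamthroughput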
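
Regarding item 2, for completeness, the compaction knobs referenced above 
(values are the ones mentioned; concurrent_compactors is a cassandra.yaml 
setting and requires a restart on 3.0):

    # Runtime change on the joining node (32 MB/s, the sweet spot noted above):
    nodetool setcompactionthroughput 32

    # Watch the compaction backlog and per-table SSTable counts:
    nodetool compactionstats
    nodetool tablestats <keyspace>.<table>

    # cassandra.yaml (restart required):
    #   concurrent_compactors: 40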

Two Questions:
1. What can be done to substantially improve the performance of adding a new 
node?
2. For newly added nodes, can compaction on TWCS tables set the file create 
date to match the newest record in the file, or add another piece of metadata 
to the TWCS files that reflects the intended drop date, so that TWCS partitions 
can be dropped consistently?
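
Related to question 2: the newest write timestamp in each SSTable can be read 
from its metadata, e.g. (paths below are illustrative; sstablemetadata is in 
tools/bin of a tarball install, or on the PATH for package installs):

    # Prints per-SSTable metadata, including minimum/maximum write timestamps:
    sstablemetadata /var/lib/cassandra/data/ks/events-*/*-Data.db | grep -i timestamp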



