[ 
https://issues.apache.org/jira/browse/CASSANDRA-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-7217:
--------------------------------------
    Attachment: 2000-threads.svg
                500-threads.svg

Attempt at flame graphs. {{perf record}} was kind of unhappy with 2k thread and 
demolished a core and throughput went down a nice chunk.

Also have more counters. Interestingly there are 3x as many system calls with 
2k threads, but I wasn't counting individual system calls so I don't know what 
that time was spent doing.

{code}
Results:
op rate                   : 51730 [WRITE:51730]
partition rate            : 51730 [WRITE:51730]
row rate                  : 51730 [WRITE:51730]
latency mean              : 9.6 [WRITE:9.6]
latency median            : 7.3 [WRITE:7.3]
latency 95th percentile   : 19.7 [WRITE:19.7]
latency 99th percentile   : 39.2 [WRITE:39.2]
latency 99.9th percentile : 79.0 [WRITE:79.0]
latency max               : 1119.0 [WRITE:1119.0]
Total partitions          : 19000000 [WRITE:19000000]
Total errors              : 0 [WRITE:0]
total gc count            : 0
total gc mb               : 0
total gc time (s)         : 0
avg gc time(ms)           : NaN
stdev gc time(ms)         : 0
Total operation time      : 00:06:07
END

 Performance counter stats for './cassandra-stress write n=19000000 -rate 
threads=500 -mode native cql3 -node 192.168.1.9':

 2,515,431,304,667      cycles                    #    2.441 GHz                
     [16.05%]
 2,187,585,695,677      instructions              #    0.87  insns per cycle    
    
                                                  #    0.79  stalled cycles per 
insn [21.35%]
    41,594,524,794      cache-references          #   40.359 M/sec              
     [26.62%]
    10,229,632,567      cache-misses              #   24.594 % of all cache 
refs     [31.87%]
 1,731,248,021,308      idle-cycles-frontend      #   68.83% frontend cycles 
idle    [37.10%]
 1,171,017,181,584      idle-cycles-backend       #   46.55% backend  cycles 
idle    [42.33%]
    1030610.942568      cpu-clock (msec)                                        
    
    1030611.990849      task-clock (msec)         #    2.721 CPUs utilized      
    
            91,892      page-faults               #    0.089 K/sec              
    
        32,039,387      cs                        #    0.031 M/sec              
    
         1,584,549      migrations                #    0.002 M/sec              
    
            91,537      minor-faults              #    0.089 K/sec              
    
   271,307,683,357      branch-instructions       #  263.249 M/sec              
     [47.56%]
     2,777,454,782      branch-misses             #    1.02% of all branches    
     [47.48%]
    52,738,483,134      L1-dcache-load-misses     #   51.172 M/sec              
     [47.47%]
    18,283,685,960      L1-dcache-store-misses    #   17.741 M/sec              
     [47.47%]
    15,205,838,312      L1-dcache-prefetch-misses #   14.754 M/sec              
     [42.21%]
    39,417,616,920      L1-icache-load-misses     #   38.247 M/sec              
     [42.28%]
     2,630,018,507      dTLB-load-misses          #    2.552 M/sec              
     [42.33%]
       480,675,134      dTLB-store-misses         #    0.466 M/sec              
     [42.38%]
       567,242,554      iTLB-load-misses          #    0.550 M/sec              
     [42.43%]
     2,779,803,804      branch-load-misses        #    2.697 M/sec              
     [42.48%]
    23,639,238,796      LLC-loads                 #   22.937 M/sec              
     [42.51%]
     4,117,222,926      LLC-stores                #    3.995 M/sec              
     [42.51%]
    11,311,004,882      LLC-prefetches            #   10.975 M/sec              
     [10.60%]
        46,798,741      sched:sched_wakeup        #    0.045 M/sec              
    
             1,284      sched:sched_wakeup_new    #    0.001 K/sec              
    
        32,039,387      sched:sched_switch        #    0.031 M/sec              
    
         1,585,242      sched:sched_migrate_task  #    0.002 M/sec              
    
               643      sched:sched_process_exit  #    0.001 K/sec              
    
                 0      sched:sched_wait_task     #    0.000 K/sec              
    
                 4      sched:sched_process_wait  #    0.000 K/sec              
    
               642      sched:sched_process_fork  #    0.001 K/sec              
    
                 5      sched:sched_process_exec  #    0.000 K/sec              
    
 9,609,960,131,198      sched:sched_stat_wait     # 9324.518 M/sec              
    
388,048,782,246,881      sched:sched_stat_sleep    # 376522.674 M/sec           
       
        40,039,320      sched:sched_stat_iowait   #    0.039 M/sec              
    
     8,678,041,742      sched:sched_stat_blocked  #    8.420 M/sec              
    
 1,032,643,169,845      sched:sched_stat_runtime  # 1001.971 M/sec              
    
           272,903      sched:sched_wake_idle_without_ipi #    0.265 K/sec      
            
             2,269      workqueue:workqueue_queue_work #    0.002 K/sec         
         
             2,269      workqueue:workqueue_activate_work #    0.002 K/sec      
            
         2,185,692      irq:irq_handler_entry     #    0.002 M/sec              
    
         2,185,692      irq:irq_handler_exit      #    0.002 M/sec              
    
         2,809,612      irq:softirq_entry         #    0.003 M/sec              
    
         2,809,612      irq:softirq_exit          #    0.003 M/sec              
    
         2,834,592      irq:softirq_raise         #    0.003 M/sec              
    
               642      task:task_newtask         #    0.001 K/sec              
    
                 5      task:task_rename          #    0.000 K/sec              
    
            84,774      exceptions:page_fault_user #    0.082 K/sec             
     
             7,118      exceptions:page_fault_kernel #    0.007 K/sec           
       
        84,530,358      raw_syscalls:sys_enter    #    0.082 M/sec              
    
        84,530,372      raw_syscalls:sys_exit     #    0.082 M/sec              
    

     378.751202925 seconds time elapsed


Results:
op rate                   : 12919 [WRITE:12919]
partition rate            : 12919 [WRITE:12919]
row rate                  : 12919 [WRITE:12919]
latency mean              : 154.9 [WRITE:154.9]
latency median            : 83.2 [WRITE:83.2]
latency 95th percentile   : 366.2 [WRITE:366.2]
latency 99th percentile   : 579.3 [WRITE:579.3]
latency 99.9th percentile : 872.4 [WRITE:872.4]
latency max               : 2396.9 [WRITE:2396.9]
Total partitions          : 19000000 [WRITE:19000000]
Total errors              : 0 [WRITE:0]
total gc count            : 0
total gc mb               : 0
total gc time (s)         : 0
avg gc time(ms)           : NaN
stdev gc time(ms)         : 0
Total operation time      : 00:24:30
END

 Performance counter stats for './cassandra-stress write n=19000000 -rate 
threads=2000 -mode native cql3 -node 192.168.1.9':

 5,181,129,425,325      cycles                    #    2.169 GHz                
     [16.10%]
 3,612,561,477,198      instructions              #    0.70  insns per cycle    
    
                                                  #    1.06  stalled cycles per 
insn [21.43%]
   125,303,440,058      cache-references          #   52.445 M/sec              
     [26.76%]
    31,162,640,970      cache-misses              #   24.870 % of all cache 
refs     [32.03%]
 3,846,338,739,881      idle-cycles-frontend      #   74.24% frontend cycles 
idle    [37.28%]
 2,862,782,268,966      idle-cycles-backend       #   55.25% backend  cycles 
idle    [42.49%]
    2387975.425875      cpu-clock (msec)                                        
    
    2389248.103454      task-clock (msec)         #    1.610 CPUs utilized      
    
           159,955      page-faults               #    0.067 K/sec              
    
        79,702,888      cs                        #    0.033 M/sec              
    
        44,298,673      migrations                #    0.019 M/sec              
    
           159,869      minor-faults              #    0.067 K/sec              
    
   641,996,673,767      branch-instructions       #  268.702 M/sec              
     [47.76%]
     6,698,884,506      branch-misses             #    1.04% of all branches    
     [47.72%]
   140,604,450,350      L1-dcache-load-misses     #   58.849 M/sec              
     [47.69%]
    37,276,664,029      L1-dcache-store-misses    #   15.602 M/sec              
     [47.72%]
    30,237,145,992      L1-dcache-prefetch-misses #   12.656 M/sec              
     [42.39%]
    69,945,237,261      L1-icache-load-misses     #   29.275 M/sec              
     [42.47%]
     5,669,464,361      dTLB-load-misses          #    2.373 M/sec              
     [42.51%]
     1,086,499,138      dTLB-store-misses         #    0.455 M/sec              
     [42.55%]
     1,042,804,417      iTLB-load-misses          #    0.436 M/sec              
     [42.52%]
     6,685,073,730      branch-load-misses        #    2.798 M/sec              
     [42.53%]
    87,031,289,561      LLC-loads                 #   36.426 M/sec              
     [42.55%]
    11,988,446,940      LLC-stores                #    5.018 M/sec              
     [42.46%]
    29,826,259,945      LLC-prefetches            #   12.484 M/sec              
     [10.58%]
       154,542,169      sched:sched_wakeup        #    0.065 M/sec              
    
             4,284      sched:sched_wakeup_new    #    0.002 K/sec              
    
        79,702,888      sched:sched_switch        #    0.033 M/sec              
    
        44,301,003      sched:sched_migrate_task  #    0.019 M/sec              
    
             2,143      sched:sched_process_exit  #    0.001 K/sec              
    
                 0      sched:sched_wait_task     #    0.000 K/sec              
    
                 4      sched:sched_process_wait  #    0.000 K/sec              
    
             2,142      sched:sched_process_fork  #    0.001 K/sec              
    
                 5      sched:sched_process_exec  #    0.000 K/sec              
    
   448,770,334,535      sched:sched_stat_wait     #  187.829 M/sec              
    
5,998,250,681,675,102      sched:sched_stat_sleep    # 2510518.130 M/sec        
          
       154,752,649      sched:sched_stat_iowait   #    0.065 M/sec              
    
    49,347,196,951      sched:sched_stat_blocked  #   20.654 M/sec              
    
 3,155,194,145,918      sched:sched_stat_runtime  # 1320.580 M/sec              
    
        71,084,628      sched:sched_wake_idle_without_ipi #    0.030 M/sec      
            
             6,089      workqueue:workqueue_queue_work #    0.003 K/sec         
         
             6,089      workqueue:workqueue_activate_work #    0.003 K/sec      
            
           981,847      irq:irq_handler_entry     #    0.411 K/sec              
    
           981,847      irq:irq_handler_exit      #    0.411 K/sec              
    
         2,690,966      irq:softirq_entry         #    0.001 M/sec              
    
         2,690,966      irq:softirq_exit          #    0.001 M/sec              
    
         2,701,243      irq:softirq_raise         #    0.001 M/sec              
    
             2,142      task:task_newtask         #    0.001 K/sec              
    
                 5      task:task_rename          #    0.000 K/sec              
    
           152,187      exceptions:page_fault_user #    0.064 K/sec             
     
             7,768      exceptions:page_fault_kernel #    0.003 K/sec           
       
       232,031,226      raw_syscalls:sys_enter    #    0.097 M/sec              
    
       232,031,239      raw_syscalls:sys_exit     #    0.097 M/sec              
    

    1483.891153703 seconds time elapsed
{code}

> Native transport performance (with cassandra-stress) drops precipitously past 
> around 1000 threads
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7217
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7217
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Benedict
>            Assignee: Ariel Weisberg
>              Labels: performance, stress, triaged
>             Fix For: 3.1
>
>         Attachments: 2000-threads.svg, 500-threads.svg
>
>
> This is obviously bad. Let's figure out why it's happening and put a stop to 
> it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to