Hi Jonathan,

The I/O stats are below. I am not sure why one node always has a much higher kB_read/s than the other nodes. That does not seem good.
==============
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          54.78  24.48     9.35     0.96    0.08  10.35

Device:     tps  kB_read/s  kB_wrtn/s     kB_read     kB_wrtn
xvda       2.31      14.64      17.95     1415348     1734856
xvdf     252.68   11789.51    6394.15  1139459318   617996710
==============
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          22.71   6.57     3.96     0.50    0.19  66.07

Device:     tps  kB_read/s  kB_wrtn/s     kB_read     kB_wrtn
xvda       1.12       3.63      10.59     3993540    11648848
xvdf      68.20     923.51    2526.86  1016095212  2780187819
==============
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          22.31   8.08     3.70     0.26    0.23  65.42

Device:     tps  kB_read/s  kB_wrtn/s     kB_read     kB_wrtn
xvda       1.07       2.87      10.89     3153996    11976704
xvdf      34.48     498.21    2293.70   547844196  2522227746
==============
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          22.75   8.13     3.82     0.36    0.21  64.73

Device:     tps  kB_read/s  kB_wrtn/s     kB_read     kB_wrtn
xvda       1.10       3.20      11.33     3515752    12442344
xvdf      44.45     474.30    2511.71   520758840  2757732583

On Thu, Jul 7, 2016 at 6:54 PM, Jonathan Haddad <j...@jonhaddad.com> wrote:

> What's your CPU looking like? If it's low, check your IO with iostat or
> dstat. I know some people have used EBS and say it's fine, but I've been
> burned too many times.
>
> On Thu, Jul 7, 2016 at 6:12 PM Yuan Fang <y...@kryptoncloud.com> wrote:
>
>> Hi Riccardo,
>>
>> Very low I/O wait, about 0.3%. No stolen CPU. It is a Cassandra-only
>> instance. I did not see any dropped messages.
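To put a number on the imbalance, here is a minimal Python sketch using the xvdf kB_read/s figures from the iostat output above (the "node1".."node4" labels are mine, in the order the nodes are pasted):

```python
# Read rates (kB_read/s) for the data volume (xvdf), copied from the
# iostat output above. Node labels are mine, in paste order.
read_rates = {"node1": 11789.51, "node2": 923.51, "node3": 498.21, "node4": 474.30}

total = sum(read_rates.values())
for node, rate in read_rates.items():
    # Each node's share of the cluster's total read throughput.
    print(f"{node}: {rate:9.1f} kB/s  ({100 * rate / total:4.1f}% of cluster reads)")
```

So the first node is serving roughly 86% of all reads. That kind of skew may point at a hot partition, uneven token ownership, or a client pinned to one coordinator rather than a healthy, evenly balanced cluster.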
>>
>> ubuntu@cassandra1:/mnt/data$ nodetool tpstats
>> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
>> MutationStage                     1         1      929509244         0                 0
>> ViewMutationStage                 0         0              0         0                 0
>> ReadStage                         4         0        4021570         0                 0
>> RequestResponseStage              0         0      731477999         0                 0
>> ReadRepairStage                   0         0         165603         0                 0
>> CounterMutationStage              0         0              0         0                 0
>> MiscStage                         0         0              0         0                 0
>> CompactionExecutor                2        55          92022         0                 0
>> MemtableReclaimMemory             0         0           1736         0                 0
>> PendingRangeCalculator            0         0              6         0                 0
>> GossipStage                       0         0         345474         0                 0
>> SecondaryIndexManagement          0         0              0         0                 0
>> HintsDispatcher                   0         0              4         0                 0
>> MigrationStage                    0         0             35         0                 0
>> MemtablePostFlush                 0         0           1973         0                 0
>> ValidationExecutor                0         0              0         0                 0
>> Sampler                           0         0              0         0                 0
>> MemtableFlushWriter               0         0           1736         0                 0
>> InternalResponseStage             0         0           5311         0                 0
>> AntiEntropyStage                  0         0              0         0                 0
>> CacheCleanupExecutor              0         0              0         0                 0
>> Native-Transport-Requests       128       128      347508531         2          15891862
>>
>> Message type       Dropped
>> READ                     0
>> RANGE_SLICE              0
>> _TRACE                   0
>> HINT                     0
>> MUTATION                 0
>> COUNTER_MUTATION         0
>> BATCH_STORE              0
>> BATCH_REMOVE             0
>> REQUEST_RESPONSE         0
>> PAGED_RANGE              0
>> READ_REPAIR              0
>>
>> On Thu, Jul 7, 2016 at 5:24 PM, Riccardo Ferrari <ferra...@gmail.com> wrote:
>>
>>> Hi Yuan,
>>>
>>> Your machine instance has 4 vCPUs, that is 4 threads (not cores!), so
>>> aside from any Cassandra-specific discussion, a system load of 10 on a
>>> 4-thread machine is way too much in my opinion. If that is the running
>>> average system load, I would look deeper into the system details. Is it
>>> I/O wait? Is it stolen CPU? Is it a Cassandra-only instance, or are
>>> there other processes pushing up the load?
>>> What does your "nodetool tpstats" say? How many dropped messages do you
>>> have?
>>>
>>> Best,
>>>
>>> On Fri, Jul 8, 2016 at 12:34 AM, Yuan Fang <y...@kryptoncloud.com> wrote:
>>>
>>>> Thanks Ben! From the post, it seems they got a similar, slightly
>>>> better result than I did. Good to know.
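One line in the tpstats output above stands out: Native-Transport-Requests shows 128 active with 128 pending, and a large all-time blocked count. A small sketch of the arithmetic, with the numbers copied from that line:

```python
# Native-Transport-Requests figures copied from the nodetool tpstats
# output above. 128 active plus 128 pending means the native transport
# thread pool is saturated at its default cap of 128 threads, so
# incoming client requests are queuing behind it.
active, pending = 128, 128
completed = 347_508_531
all_time_blocked = 15_891_862

blocked_pct = 100 * all_time_blocked / completed
print(f"blocked requests relative to completed: ~{blocked_pct:.1f}%")
```

Roughly one request in twenty has been blocked waiting for a native transport thread, which fits the picture of clients pushing more than the node can absorb.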
>>>> I am not sure whether a little fine-tuning of the heap memory will
>>>> help or not.
>>>>
>>>> On Thu, Jul 7, 2016 at 2:58 PM, Ben Slater <ben.sla...@instaclustr.com> wrote:
>>>>
>>>>> Hi Yuan,
>>>>>
>>>>> You might find this blog post a useful comparison:
>>>>> https://www.instaclustr.com/blog/2016/01/07/multi-data-center-apache-spark-and-apache-cassandra-benchmark/
>>>>>
>>>>> Although the focus is on Spark, Cassandra, and multi-DC, there are
>>>>> also some single-DC benchmarks of m4.xl clusters, plus some
>>>>> discussion of how we went about benchmarking.
>>>>>
>>>>> Cheers
>>>>> Ben
>>>>>
>>>>> On Fri, 8 Jul 2016 at 07:52 Yuan Fang <y...@kryptoncloud.com> wrote:
>>>>>
>>>>>> Yes, here is my stress test result:
>>>>>> Results:
>>>>>> op rate                   : 12200 [WRITE:12200]
>>>>>> partition rate            : 12200 [WRITE:12200]
>>>>>> row rate                  : 12200 [WRITE:12200]
>>>>>> latency mean              : 16.4 [WRITE:16.4]
>>>>>> latency median            : 7.1 [WRITE:7.1]
>>>>>> latency 95th percentile   : 38.1 [WRITE:38.1]
>>>>>> latency 99th percentile   : 204.3 [WRITE:204.3]
>>>>>> latency 99.9th percentile : 465.9 [WRITE:465.9]
>>>>>> latency max               : 1408.4 [WRITE:1408.4]
>>>>>> Total partitions          : 1000000 [WRITE:1000000]
>>>>>> Total errors              : 0 [WRITE:0]
>>>>>> total gc count            : 0
>>>>>> total gc mb               : 0
>>>>>> total gc time (s)         : 0
>>>>>> avg gc time(ms)           : NaN
>>>>>> stdev gc time(ms)         : 0
>>>>>> Total operation time      : 00:01:21
>>>>>> END
>>>>>>
>>>>>> On Thu, Jul 7, 2016 at 2:49 PM, Ryan Svihla <r...@foundev.pro> wrote:
>>>>>>
>>>>>>> Lots of variables you're leaving out.
>>>>>>>
>>>>>>> It depends on write size, whether you're using logged batches or
>>>>>>> not, what consistency level, what RF, whether the writes come in
>>>>>>> bursts, etc. However, that's all somewhat moot for determining
>>>>>>> "normal"; really you need a baseline, as all those variables end
>>>>>>> up mattering a huge amount.
>>>>>>>
>>>>>>> I would suggest using cassandra-stress as a baseline and going
>>>>>>> from there depending on what those numbers say (just pick the
>>>>>>> defaults).
>>>>>>>
>>>>>>> Sent from my iPhone
>>>>>>>
>>>>>>> On Jul 7, 2016, at 4:39 PM, Yuan Fang <y...@kryptoncloud.com> wrote:
>>>>>>>
>>>>>>> Yes, it is about 8k writes per node.
>>>>>>>
>>>>>>> On Thu, Jul 7, 2016 at 2:18 PM, daemeon reiydelle <daeme...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Are you saying 7k writes per node, or 30k writes per node?
>>>>>>>>
>>>>>>>> .......
>>>>>>>> Daemeon C.M. Reiydelle
>>>>>>>> USA (+1) 415.501.0198
>>>>>>>> London (+44) (0) 20 8144 9872
>>>>>>>>
>>>>>>>> On Thu, Jul 7, 2016 at 2:05 PM, Yuan Fang <y...@kryptoncloud.com> wrote:
>>>>>>>>
>>>>>>>>> Writes of 30k/second is the main thing.
>>>>>>>>>
>>>>>>>>> On Thu, Jul 7, 2016 at 1:51 PM, daemeon reiydelle <daeme...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Assuming you meant 100k, that is likely for something with 16 MB
>>>>>>>>>> of storage (probably way small) where the data is more than
>>>>>>>>>> 64 KB and hence will not fit into the row cache.
>>>>>>>>>>
>>>>>>>>>> On Thu, Jul 7, 2016 at 1:25 PM, Yuan Fang <y...@kryptoncloud.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I have a cluster of 4 m4.xlarge nodes (4 CPUs, 16 GB memory,
>>>>>>>>>>> and 600 GB SSD EBS each).
>>>>>>>>>>> I can reach cluster-wide write requests of 30k/second and read
>>>>>>>>>>> requests of about 100/second. The cluster OS load is
>>>>>>>>>>> constantly above 10. Is that normal?
>>>>>>>>>>>
>>>>>>>>>>> Thanks!
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>>
>>>>>>>>>>> Yuan
>>>>>
>>>>> --
>>>>> ————————
>>>>> Ben Slater
>>>>> Chief Product Officer
>>>>> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
>>>>> +61 437 929 798
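Looking back at the cassandra-stress latencies quoted earlier in the thread, the tail is what stands out: the mean (16.4 ms) is more than double the median (7.1 ms), and p99 is roughly 29x the median. A small sketch, with values in ms copied from the stress output:

```python
# Latency figures (ms) copied from the cassandra-stress WRITE output
# quoted earlier in the thread.
latency_ms = {"median": 7.1, "p95": 38.1, "p99": 204.3, "p99.9": 465.9, "max": 1408.4}

for name, value in latency_ms.items():
    # Express each percentile as a multiple of the median to expose the tail.
    ratio = value / latency_ms["median"]
    print(f"{name:>6}: {value:7.1f} ms  ({ratio:5.1f}x median)")
```

A tail that long alongside a healthy median usually points at periodic stalls (GC pauses, compaction, or EBS throughput throttling) rather than writes being uniformly slow.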