Re: High disk io read load

2017-02-24 Thread Benjamin Roth
It was only the schema change.

2017-02-24 19:18 GMT+01:00 kurt greaves :

> How many CFs are we talking about here? Also, did the script also kick off
> the scrubs or was this purely from changing the schemas?
> ​
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: High disk io read load

2017-02-24 Thread kurt greaves
How many CFs are we talking about here? Also, did the script also kick off
the scrubs or was this purely from changing the schemas?
​


Re: High disk io read load

2017-02-20 Thread Benjamin Roth
Hah! Found the problem!

After setting read_ahead to 0 and the compression chunk size to 4kb on all
CFs, the situation was PERFECT (nearly - please see below)! I scrubbed some
CFs but not the whole dataset yet. I knew it was not too little RAM.
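
For anyone who wants to reproduce this, the change itself is roughly the
following (keyspace/table and device names are just examples, not the actual
ones from this cluster):

# disable read-ahead on the data disk
blockdev --setra 0 /dev/sda

# shrink the compression chunk size, once per CF
cqlsh -e "ALTER TABLE my_ks.my_cf WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 4};"

Existing SSTables only pick up the new chunk size after a rewrite (e.g. scrub
or upgradesstables -a), hence the scrubbing mentioned above.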

Some stats:
- Latency of a quite large CF: https://cl.ly/1r3e0W0S393L
- Disk throughput: https://cl.ly/2a0Z250S1M3c
- Dstat: https://gist.github.com/brstgt/c92bbd46ab76283e534b853b88ad3b26
- This shows that the request distribution remained the same, so no
dyn-snitch magic: https://cl.ly/3E0t1T1z2c0J

Btw. I stumbled across this one:
https://groups.google.com/forum/#!topic/scylladb-dev/j_qXSP-6-gY
Maybe we should also think about lowering default chunk length.

Unfortunately, the schema changes had a disturbing effect:
- I changed the chunk size with a script, so there were a lot of schema
changes within a short period.
- After all tables were changed, one of the seed hosts (cas1) went TOTALLY
crazy.
- Latency on this host was 10x that of all other hosts.
- There were more ParNew GCs.
- Load was very high (up to 80, 100% CPU).
- The whole system was unstable due to unpredictable latencies and
backpressure (https://cl.ly/1m022g2W1Q3d).
- Even SELECT * FROM system_schema.tables etc. appeared as slow queries in
the logs.
- It was the 1st server in the connect-host list for the PHP client.
- A CS restart didn't help. A reboot did not help either (the cold page
cache probably made it worse).
- All other nodes were totally ok.
- Stopping CS on cas1 helped to keep the system stable and brought latency
back down, but was no solution.

=> Only replacing the node (with a newer, faster node) in the connect-host
list helped that situation.

Any ideas why changing schemas and/or chunk size could have such an effect?
For some time the situation was really critical.


2017-02-20 10:48 GMT+01:00 Bhuvan Rawal :

> Hi Benjamin,
>
> Yes, Read ahead of 8 would imply more IO count from disk but it should not
> cause more data read off the disk as is happening in your case.
>
> One probable reason for high disk io would be because the 512 vnode has
> less page to RAM ratio of 22% (100G buff /437G data) as compared to 46%
> (100G/237G). And as your avg record size is in bytes for every disk io you
> are fetching complete 64K block to get a row.
>
> Perhaps you can balance the node by adding equivalent RAM ?
>
> Regards,
> Bhuvan
>


Re: High disk io read load

2017-02-20 Thread Bhuvan Rawal
Hi Benjamin,

Yes, a read ahead of 8 would imply a higher IO count from disk, but it should
not cause more data to be read off the disk, as is happening in your case.

One probable reason for the high disk IO would be that the 512-vnode node has
a lower page-cache-to-data ratio of 22% (100G buff / 437G data) compared to
46% (100G / 237G). And since your avg record size is in the range of bytes,
for every disk IO you are fetching a complete 64K block just to get a row.
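
(To put a rough number on that last point: with, say, ~200-byte records -
just an assumed figure - every uncached read has to pull in and decompress a
full 64K chunk, i.e. roughly 300x read amplification, which is the right
order of magnitude for ~800MB/s of disk reads against a few MB/s of network
output.)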

Perhaps you can balance the node by adding equivalent RAM ?

Regards,
Bhuvan

On Mon, Feb 20, 2017 at 12:11 AM, Benjamin Roth 
wrote:

> This is the output of sar: https://gist.github.com/anonymous/
> 9545fb69fbb28a20dc99b2ea5e14f4cd
> 
>
> It seems to me that there es not enough page cache to handle all data in a
> reasonable way.
> As pointed out yesterday, the read rate with empty page cache is ~800MB/s.
> Thats really (!!!) much for 4-5MB/s network output.
>
> I stumbled across the compression chunk size, which I always left
> untouched from the default of 64kb (https://cl.ly/2w0V3U1q1I1Y). I guess
> setting a read ahead of 8kb is totally pointless if CS reads 64kb if it
> only has to fetch a single row, right? Are there recommendations for that
> setting?
>
> 2017-02-19 19:15 GMT+01:00 Bhuvan Rawal :
>
>> Hi Edward,
>>
>> This could have been a valid case here but if hotspots indeed existed
>> then along with really high disk io , the node should have been doing
>> proportionate high network io as well. -  higher queries per second as well.
>>
>> But from the output shared by Benjamin that doesnt appear to be the case
>> and things look balanced.
>>
>> Regards,
>>
>> On Sun, Feb 19, 2017 at 7:47 PM, Edward Capriolo 
>> wrote:
>>
>>>
>>>
>>> On Sat, Feb 18, 2017 at 3:35 PM, Benjamin Roth 
>>> wrote:
>>>
 We are talking about a read IO increase of over 2000% with 512 tokens
 compared to 256 tokens. 100% increase would be linear which would be
 perfect. 200% would even okay, taking the RAM/Load ratio for caching into
 account. But > 20x the read IO is really incredible.
 The nodes are configured with puppet, they share the same roles and no
 manual "optimizations" are applied. So I can't imagine, a different
 configuration is responsible for it.

 2017-02-18 21:28 GMT+01:00 Benjamin Roth :

> This is status of the largest KS of these both nodes:
> UN  10.23.71.10  437.91 GiB  512  49.1%
> 2679c3fa-347e-4845-bfc1-c4d0bc906576  RAC1
> UN  10.23.71.9   246.99 GiB  256  28.3%
> 2804ef8a-26c8-4d21-9e12-01e8b6644c2f  RAC1
>
> So roughly as expected.
>
> 2017-02-17 23:07 GMT+01:00 kurt greaves :
>
>> what's the Owns % for the relevant keyspace from nodetool status?
>>
>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>



 --
 Benjamin Roth
 Prokurist

 Jaumo GmbH · www.jaumo.com
 Wehrstraße 46 · 73035 Göppingen · Germany
 Phone +49 7161 304880-6 · Fax +49 7161 304880-1
 AG Ulm · HRB 731058 · Managing Director: Jens Kammerer

>>>
>>> When I read articles like this:
>>>
>>> http://www.doanduyhai.com/blog/?p=1930
>>>
>>> And see the word hot-spot.
>>>
>>> "Another performance consideration worth mentioning is hot-spot.
>>> Similar to manual denormalization, if your view partition key is chosen
>>> poorly, you’ll end up with hot spots in your cluster. A simple example with
>>> our *user* table is to create a materialized
>>>
>>> *view user_by_gender"It leads me to ask a question back: What can you
>>> say about hotspots in your data? Even if your nodes had the identical
>>> number of tokens this autho seems to suggesting that you still could have
>>> hotspots. Maybe the issue is you have a hotspot 2x hotspots, or your
>>> application has a hotspot that would be present even with perfect token
>>> balancing.*
>>>
>>>
>>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>


Re: High disk io read load

2017-02-19 Thread Benjamin Roth
This is the output of sar:
https://gist.github.com/anonymous/9545fb69fbb28a20dc99b2ea5e14f4cd


It seems to me that there is not enough page cache to handle all data in a
reasonable way.
As pointed out yesterday, the read rate with an empty page cache is ~800MB/s.
That's really (!!!) a lot for 4-5MB/s of network output.

I stumbled across the compression chunk size, which I always left untouched
at the default of 64kb (https://cl.ly/2w0V3U1q1I1Y). I guess setting a read
ahead of 8kb is totally pointless if CS reads 64kb even when it only has to
fetch a single row, right? Are there recommendations for that setting?
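
(For reference, the current value per table can be read straight out of the
schema, e.g.:

cqlsh -e "SELECT table_name, compression FROM system_schema.tables WHERE keyspace_name = 'my_ks';"

where 'my_ks' is just a placeholder keyspace name.)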

2017-02-19 19:15 GMT+01:00 Bhuvan Rawal :

> Hi Edward,
>
> This could have been a valid case here but if hotspots indeed existed then
> along with really high disk io , the node should have been doing
> proportionate high network io as well. -  higher queries per second as well.
>
> But from the output shared by Benjamin that doesnt appear to be the case
> and things look balanced.
>
> Regards,
>
> On Sun, Feb 19, 2017 at 7:47 PM, Edward Capriolo 
> wrote:
>
>>
>>
>> On Sat, Feb 18, 2017 at 3:35 PM, Benjamin Roth 
>> wrote:
>>
>>> We are talking about a read IO increase of over 2000% with 512 tokens
>>> compared to 256 tokens. 100% increase would be linear which would be
>>> perfect. 200% would even okay, taking the RAM/Load ratio for caching into
>>> account. But > 20x the read IO is really incredible.
>>> The nodes are configured with puppet, they share the same roles and no
>>> manual "optimizations" are applied. So I can't imagine, a different
>>> configuration is responsible for it.
>>>
>>> 2017-02-18 21:28 GMT+01:00 Benjamin Roth :
>>>
 This is status of the largest KS of these both nodes:
 UN  10.23.71.10  437.91 GiB  512  49.1%
 2679c3fa-347e-4845-bfc1-c4d0bc906576  RAC1
 UN  10.23.71.9   246.99 GiB  256  28.3%
 2804ef8a-26c8-4d21-9e12-01e8b6644c2f  RAC1

 So roughly as expected.

 2017-02-17 23:07 GMT+01:00 kurt greaves :

> what's the Owns % for the relevant keyspace from nodetool status?
>



 --
 Benjamin Roth
 Prokurist

 Jaumo GmbH · www.jaumo.com
 Wehrstraße 46 · 73035 Göppingen · Germany
 Phone +49 7161 304880-6 · Fax +49 7161 304880-1
 AG Ulm · HRB 731058 · Managing Director: Jens Kammerer

>>>
>>>
>>>
>>> --
>>> Benjamin Roth
>>> Prokurist
>>>
>>> Jaumo GmbH · www.jaumo.com
>>> Wehrstraße 46 · 73035 Göppingen · Germany
>>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>>
>>
>> When I read articles like this:
>>
>> http://www.doanduyhai.com/blog/?p=1930
>>
>> And see the word hot-spot.
>>
>> "Another performance consideration worth mentioning is hot-spot. Similar
>> to manual denormalization, if your view partition key is chosen poorly,
>> you’ll end up with hot spots in your cluster. A simple example with our
>> *user* table is to create a materialized
>>
>> *view user_by_gender"It leads me to ask a question back: What can you say
>> about hotspots in your data? Even if your nodes had the identical number of
>> tokens this autho seems to suggesting that you still could have hotspots.
>> Maybe the issue is you have a hotspot 2x hotspots, or your application has
>> a hotspot that would be present even with perfect token balancing.*
>>
>>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: High disk io read load

2017-02-19 Thread Bhuvan Rawal
Hi Edward,

This could have been a valid case here, but if hotspots indeed existed then
along with the really high disk IO the node should have been doing
proportionately high network IO as well - i.e. serving more queries per
second.

But from the output shared by Benjamin that doesn't appear to be the case
and things look balanced.

Regards,

On Sun, Feb 19, 2017 at 7:47 PM, Edward Capriolo 
wrote:

>
>
> On Sat, Feb 18, 2017 at 3:35 PM, Benjamin Roth 
> wrote:
>
>> We are talking about a read IO increase of over 2000% with 512 tokens
>> compared to 256 tokens. 100% increase would be linear which would be
>> perfect. 200% would even okay, taking the RAM/Load ratio for caching into
>> account. But > 20x the read IO is really incredible.
>> The nodes are configured with puppet, they share the same roles and no
>> manual "optimizations" are applied. So I can't imagine, a different
>> configuration is responsible for it.
>>
>> 2017-02-18 21:28 GMT+01:00 Benjamin Roth :
>>
>>> This is status of the largest KS of these both nodes:
>>> UN  10.23.71.10  437.91 GiB  512  49.1%
>>> 2679c3fa-347e-4845-bfc1-c4d0bc906576  RAC1
>>> UN  10.23.71.9   246.99 GiB  256  28.3%
>>> 2804ef8a-26c8-4d21-9e12-01e8b6644c2f  RAC1
>>>
>>> So roughly as expected.
>>>
>>> 2017-02-17 23:07 GMT+01:00 kurt greaves :
>>>
 what's the Owns % for the relevant keyspace from nodetool status?

>>>
>>>
>>>
>>> --
>>> Benjamin Roth
>>> Prokurist
>>>
>>> Jaumo GmbH · www.jaumo.com
>>> Wehrstraße 46 · 73035 Göppingen · Germany
>>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>>
>>
>>
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>
> When I read articles like this:
>
> http://www.doanduyhai.com/blog/?p=1930
>
> And see the word hot-spot.
>
> "Another performance consideration worth mentioning is hot-spot. Similar
> to manual denormalization, if your view partition key is chosen poorly,
> you’ll end up with hot spots in your cluster. A simple example with our
> *user* table is to create a materialized
>
> *view user_by_gender"It leads me to ask a question back: What can you say
> about hotspots in your data? Even if your nodes had the identical number of
> tokens this autho seems to suggesting that you still could have hotspots.
> Maybe the issue is you have a hotspot 2x hotspots, or your application has
> a hotspot that would be present even with perfect token balancing.*
>
>


Re: High disk io read load

2017-02-19 Thread Edward Capriolo
On Sat, Feb 18, 2017 at 3:35 PM, Benjamin Roth 
wrote:

> We are talking about a read IO increase of over 2000% with 512 tokens
> compared to 256 tokens. 100% increase would be linear which would be
> perfect. 200% would even okay, taking the RAM/Load ratio for caching into
> account. But > 20x the read IO is really incredible.
> The nodes are configured with puppet, they share the same roles and no
> manual "optimizations" are applied. So I can't imagine, a different
> configuration is responsible for it.
>
> 2017-02-18 21:28 GMT+01:00 Benjamin Roth :
>
>> This is status of the largest KS of these both nodes:
>> UN  10.23.71.10  437.91 GiB  512  49.1%
>> 2679c3fa-347e-4845-bfc1-c4d0bc906576  RAC1
>> UN  10.23.71.9   246.99 GiB  256  28.3%
>> 2804ef8a-26c8-4d21-9e12-01e8b6644c2f  RAC1
>>
>> So roughly as expected.
>>
>> 2017-02-17 23:07 GMT+01:00 kurt greaves :
>>
>>> what's the Owns % for the relevant keyspace from nodetool status?
>>>
>>
>>
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>

When I read articles like this:

http://www.doanduyhai.com/blog/?p=1930

And see the word hot-spot.

"Another performance consideration worth mentioning is hot-spot. Similar to
manual denormalization, if your view partition key is chosen poorly, you’ll
end up with hot spots in your cluster. A simple example with our *user* table
is to create a materialized

*view user_by_gender"It leads me to ask a question back: What can you say
about hotspots in your data? Even if your nodes had the identical number of
tokens this autho seems to suggesting that you still could have hotspots.
Maybe the issue is you have a hotspot 2x hotspots, or your application has
a hotspot that would be present even with perfect token balancing.*


Re: High disk io read load

2017-02-18 Thread Bhuvan Rawal
This looks fine, 8k read ahead as you mentioned.
It doesn't look like a data model issue either, since the reads in
https://cl.ly/2c3Z1u2k0u2I appear balanced.

In all likelihood this looks like an issue with the new node's configuration
to me. The fact that you have very little data going out of the node rules
out the possibility of more "hot" data than can be cached. Are your nodes
running Spark jobs with locality which filter data locally and send only
limited data out?

I find 800M of disk IO for 4M of network transfer really fishy!

I believe, as a starting point, you can try debugging page faults with:
sar -B 1 10
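
(When reading the output, the interesting columns are roughly:

sar -B 1 10   # majflt/s = major faults that actually hit the disk,
              # fault/s  = all faults incl. minor ones served from memory,
              # pgpgin/s = KB/s paged in from disk

a consistently high majflt/s and pgpgin/s would support the "too little page
cache for the hot set" theory.)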

Regards,

On Sun, Feb 19, 2017 at 2:57 AM, Benjamin Roth 
wrote:

> Just for the record, that's what dstat looks like while CS is starting:
>
> root@cas10:~# dstat -lrnv 10
> ---load-avg--- --io/total- -net/total- ---procs--- --memory-usage-
> ---paging-- -dsk/total- ---system-- total-cpu-usage
>  1m   5m  15m | read  writ| recv  send|run blk new| used  buff  cach
>  free|  in   out | read  writ| int   csw |usr sys idl wai hiq siq
> 0.69 0.18 0.06| 228  24.3 |   0 0 |0.0   0  24|17.8G 3204k  458M
>  108G|   0 0 |5257k  417k|  17k 3319 |  2   1  97   0   0   0
> 0.96 0.26 0.09| 591  27.9 | 522k  476k|4.1   0  69|18.3G 3204k  906M
>  107G|   0 0 |  45M  287k|  22k 6943 |  7   1  92   0   0   0
> 13.2 2.83 0.92|2187  28.7 |1311k  839k|5.3  90  18|18.9G 3204k 9008M
> 98.1G|   0 0 | 791M 8346k|  49k   25k| 17   1  36  46   0   0
> 30.6 6.91 2.27|2188  67.0 |4200k 3610k|8.8 106  27|19.5G 3204k 17.9G
> 88.4G|   0 0 | 927M 8396k| 116k  119k| 24   2  17  57   0   0
> 43.6 10.5 3.49|2136  24.3 |4371k 3708k|6.3 108 1.0|19.5G 3204k 26.7G
> 79.6G|   0 0 | 893M   13M| 117k  159k| 15   1  17  66   0   0
> 56.9 14.4 4.84|2152  32.5 |3937k 3767k| 11  83 5.0|19.5G 3204k 35.5G
> 70.7G|   0 0 | 894M   14M| 126k  160k| 16   1  16  65   0   0
> 63.2 17.1 5.83|2135  44.1 |4601k 4185k|6.9  99  35|19.6G 3204k 44.3G
> 61.9G|   0 0 | 879M   15M| 133k  168k| 19   2  19  60   0   0
> 64.6 18.9 6.54|2174  42.2 |4393k 3522k|8.4  93 2.2|20.0G 3204k 52.7G
> 53.0G|   0 0 | 897M   14M| 138k  160k| 14   2  15  69   0   0
>
> The IO shoots up (791M) as soon as CS has started up and accepts requests.
> I also diffed sysctl of the both machines. No significant differences.
> Only CPU-related, random values and some hashes differ.
>
> 2017-02-18 21:49 GMT+01:00 Benjamin Roth :
>
>> 256 tokens:
>>
>> root@cas9:/sys/block/dm-0# blockdev --report
>> RORA   SSZ   BSZ   StartSecSize   Device
>> rw   256   512  4096  067108864   /dev/ram0
>> rw   256   512  4096  067108864   /dev/ram1
>> rw   256   512  4096  067108864   /dev/ram2
>> rw   256   512  4096  067108864   /dev/ram3
>> rw   256   512  4096  067108864   /dev/ram4
>> rw   256   512  4096  067108864   /dev/ram5
>> rw   256   512  4096  067108864   /dev/ram6
>> rw   256   512  4096  067108864   /dev/ram7
>> rw   256   512  4096  067108864   /dev/ram8
>> rw   256   512  4096  067108864   /dev/ram9
>> rw   256   512  4096  067108864   /dev/ram10
>> rw   256   512  4096  067108864   /dev/ram11
>> rw   256   512  4096  067108864   /dev/ram12
>> rw   256   512  4096  067108864   /dev/ram13
>> rw   256   512  4096  067108864   /dev/ram14
>> rw   256   512  4096  067108864   /dev/ram15
>> rw16   512  4096  0800166076416   /dev/sda
>> rw16   512  4096   2048800164151296   /dev/sda1
>> rw16   512  4096  0644245094400   /dev/dm-0
>> rw16   512  4096  0  2046820352   /dev/dm-1
>> rw16   512  4096  0  1023410176   /dev/dm-2
>> rw16   512  4096  0800166076416   /dev/sdb
>>
>> 512 tokens:
>> root@cas10:/sys/block# blockdev --report
>> RORA   SSZ   BSZ   StartSecSize   Device
>> rw   256   512  4096  067108864   /dev/ram0
>> rw   256   512  4096  067108864   /dev/ram1
>> rw   256   512  4096  067108864   /dev/ram2
>> rw   256   512  4096  067108864   /dev/ram3
>> rw   256   512  4096  067108864   /dev/ram4
>> rw   256   512  4096  067108864   /dev/ram5
>> rw   256   512  4096  067108864   /dev/ram6
>> rw   256   512  4096  067108864   /dev/ram7
>> rw   256   512  4096  067108864   /dev/ram8
>> rw   256   512  4096  067108864   /dev/ram9
>> rw   256   512  4096  067108864   /dev/ram10
>> rw   256   512  4096  067108864   /dev/ram11
>> rw   256   512  4096  0

Re: High disk io read load

2017-02-18 Thread Benjamin Roth
Just for the record, that's what dstat looks like while CS is starting:

root@cas10:~# dstat -lrnv 10
---load-avg--- --io/total- -net/total- ---procs--- --memory-usage-
---paging-- -dsk/total- ---system-- total-cpu-usage
 1m   5m  15m | read  writ| recv  send|run blk new| used  buff  cach  free|
 in   out | read  writ| int   csw |usr sys idl wai hiq siq
0.69 0.18 0.06| 228  24.3 |   0 0 |0.0   0  24|17.8G 3204k  458M  108G|
  0 0 |5257k  417k|  17k 3319 |  2   1  97   0   0   0
0.96 0.26 0.09| 591  27.9 | 522k  476k|4.1   0  69|18.3G 3204k  906M  107G|
  0 0 |  45M  287k|  22k 6943 |  7   1  92   0   0   0
13.2 2.83 0.92|2187  28.7 |1311k  839k|5.3  90  18|18.9G 3204k 9008M 98.1G|
  0 0 | 791M 8346k|  49k   25k| 17   1  36  46   0   0
30.6 6.91 2.27|2188  67.0 |4200k 3610k|8.8 106  27|19.5G 3204k 17.9G 88.4G|
  0 0 | 927M 8396k| 116k  119k| 24   2  17  57   0   0
43.6 10.5 3.49|2136  24.3 |4371k 3708k|6.3 108 1.0|19.5G 3204k 26.7G 79.6G|
  0 0 | 893M   13M| 117k  159k| 15   1  17  66   0   0
56.9 14.4 4.84|2152  32.5 |3937k 3767k| 11  83 5.0|19.5G 3204k 35.5G 70.7G|
  0 0 | 894M   14M| 126k  160k| 16   1  16  65   0   0
63.2 17.1 5.83|2135  44.1 |4601k 4185k|6.9  99  35|19.6G 3204k 44.3G 61.9G|
  0 0 | 879M   15M| 133k  168k| 19   2  19  60   0   0
64.6 18.9 6.54|2174  42.2 |4393k 3522k|8.4  93 2.2|20.0G 3204k 52.7G 53.0G|
  0 0 | 897M   14M| 138k  160k| 14   2  15  69   0   0

The IO shoots up (791M) as soon as CS has started up and accepts requests.
I also diffed sysctl on both machines. No significant differences - only
CPU-related values, random values and some hashes differ.

2017-02-18 21:49 GMT+01:00 Benjamin Roth :

> 256 tokens:
>
> root@cas9:/sys/block/dm-0# blockdev --report
> RORA   SSZ   BSZ   StartSecSize   Device
> rw   256   512  4096  067108864   /dev/ram0
> rw   256   512  4096  067108864   /dev/ram1
> rw   256   512  4096  067108864   /dev/ram2
> rw   256   512  4096  067108864   /dev/ram3
> rw   256   512  4096  067108864   /dev/ram4
> rw   256   512  4096  067108864   /dev/ram5
> rw   256   512  4096  067108864   /dev/ram6
> rw   256   512  4096  067108864   /dev/ram7
> rw   256   512  4096  067108864   /dev/ram8
> rw   256   512  4096  067108864   /dev/ram9
> rw   256   512  4096  067108864   /dev/ram10
> rw   256   512  4096  067108864   /dev/ram11
> rw   256   512  4096  067108864   /dev/ram12
> rw   256   512  4096  067108864   /dev/ram13
> rw   256   512  4096  067108864   /dev/ram14
> rw   256   512  4096  067108864   /dev/ram15
> rw16   512  4096  0800166076416   /dev/sda
> rw16   512  4096   2048800164151296   /dev/sda1
> rw16   512  4096  0644245094400   /dev/dm-0
> rw16   512  4096  0  2046820352   /dev/dm-1
> rw16   512  4096  0  1023410176   /dev/dm-2
> rw16   512  4096  0800166076416   /dev/sdb
>
> 512 tokens:
> root@cas10:/sys/block# blockdev --report
> RORA   SSZ   BSZ   StartSecSize   Device
> rw   256   512  4096  067108864   /dev/ram0
> rw   256   512  4096  067108864   /dev/ram1
> rw   256   512  4096  067108864   /dev/ram2
> rw   256   512  4096  067108864   /dev/ram3
> rw   256   512  4096  067108864   /dev/ram4
> rw   256   512  4096  067108864   /dev/ram5
> rw   256   512  4096  067108864   /dev/ram6
> rw   256   512  4096  067108864   /dev/ram7
> rw   256   512  4096  067108864   /dev/ram8
> rw   256   512  4096  067108864   /dev/ram9
> rw   256   512  4096  067108864   /dev/ram10
> rw   256   512  4096  067108864   /dev/ram11
> rw   256   512  4096  067108864   /dev/ram12
> rw   256   512  4096  067108864   /dev/ram13
> rw   256   512  4096  067108864   /dev/ram14
> rw   256   512  4096  067108864   /dev/ram15
> rw16   512  4096  0800166076416   /dev/sda
> rw16   512  4096   2048800164151296   /dev/sda1
> rw16   512  4096  0800166076416   /dev/sdb
> rw16   512  4096   2048800165027840   /dev/sdb1
> rw16   512  4096  0   1073741824000   /dev/dm-0
> rw16   512  4096  0  2046820352   /dev/dm-1
> rw16   512  4096  0  1023410176   /dev/dm-2
>
> 2017-02-18 21:41 GMT+01:00 Bhuvan Rawal :
>
>> Hi Ben,
>>
>> If its same on both machines then something else could 

Re: High disk io read load

2017-02-18 Thread Benjamin Roth
256 tokens:

root@cas9:/sys/block/dm-0# blockdev --report
RORA   SSZ   BSZ   StartSecSize   Device
rw   256   512  4096  067108864   /dev/ram0
rw   256   512  4096  067108864   /dev/ram1
rw   256   512  4096  067108864   /dev/ram2
rw   256   512  4096  067108864   /dev/ram3
rw   256   512  4096  067108864   /dev/ram4
rw   256   512  4096  067108864   /dev/ram5
rw   256   512  4096  067108864   /dev/ram6
rw   256   512  4096  067108864   /dev/ram7
rw   256   512  4096  067108864   /dev/ram8
rw   256   512  4096  067108864   /dev/ram9
rw   256   512  4096  067108864   /dev/ram10
rw   256   512  4096  067108864   /dev/ram11
rw   256   512  4096  067108864   /dev/ram12
rw   256   512  4096  067108864   /dev/ram13
rw   256   512  4096  067108864   /dev/ram14
rw   256   512  4096  067108864   /dev/ram15
rw16   512  4096  0800166076416   /dev/sda
rw16   512  4096   2048800164151296   /dev/sda1
rw16   512  4096  0644245094400   /dev/dm-0
rw16   512  4096  0  2046820352   /dev/dm-1
rw16   512  4096  0  1023410176   /dev/dm-2
rw16   512  4096  0800166076416   /dev/sdb

512 tokens:
root@cas10:/sys/block# blockdev --report
RORA   SSZ   BSZ   StartSecSize   Device
rw   256   512  4096  067108864   /dev/ram0
rw   256   512  4096  067108864   /dev/ram1
rw   256   512  4096  067108864   /dev/ram2
rw   256   512  4096  067108864   /dev/ram3
rw   256   512  4096  067108864   /dev/ram4
rw   256   512  4096  067108864   /dev/ram5
rw   256   512  4096  067108864   /dev/ram6
rw   256   512  4096  067108864   /dev/ram7
rw   256   512  4096  067108864   /dev/ram8
rw   256   512  4096  067108864   /dev/ram9
rw   256   512  4096  067108864   /dev/ram10
rw   256   512  4096  067108864   /dev/ram11
rw   256   512  4096  067108864   /dev/ram12
rw   256   512  4096  067108864   /dev/ram13
rw   256   512  4096  067108864   /dev/ram14
rw   256   512  4096  067108864   /dev/ram15
rw16   512  4096  0800166076416   /dev/sda
rw16   512  4096   2048800164151296   /dev/sda1
rw16   512  4096  0800166076416   /dev/sdb
rw16   512  4096   2048800165027840   /dev/sdb1
rw16   512  4096  0   1073741824000   /dev/dm-0
rw16   512  4096  0  2046820352   /dev/dm-1
rw16   512  4096  0  1023410176   /dev/dm-2

2017-02-18 21:41 GMT+01:00 Bhuvan Rawal :

> Hi Ben,
>
> If its same on both machines then something else could be the issue. We
> faced high disk io due to misconfigured read ahead which resulted in high
> amount of disk io for comparatively insignificant network transfer.
>
> Can you post output of blockdev --report for a normal node and 512 token
> node.
>
> Regards,
>
> On Sun, Feb 19, 2017 at 2:07 AM, Benjamin Roth 
> wrote:
>
>> cat /sys/block/sda/queue/read_ahead_kb
>> => 8
>>
>> On all CS nodes. Is that what you mean?
>>
>> 2017-02-18 21:32 GMT+01:00 Bhuvan Rawal :
>>
>>> Hi Benjamin,
>>>
>>> What is the disk read ahead on both nodes?
>>>
>>> Regards,
>>> Bhuvan
>>>
>>> On Sun, Feb 19, 2017 at 1:58 AM, Benjamin Roth 
>>> wrote:
>>>
 This is status of the largest KS of these both nodes:
 UN  10.23.71.10  437.91 GiB  512  49.1%
 2679c3fa-347e-4845-bfc1-c4d0bc906576  RAC1
 UN  10.23.71.9   246.99 GiB  256  28.3%
 2804ef8a-26c8-4d21-9e12-01e8b6644c2f  RAC1

 So roughly as expected.

 2017-02-17 23:07 GMT+01:00 kurt greaves :

> what's the Owns % for the relevant keyspace from nodetool status?
>



 --
 Benjamin Roth
 Prokurist

 Jaumo GmbH · www.jaumo.com
 Wehrstraße 46 · 73035 Göppingen · Germany
 Phone +49 7161 304880-6 · Fax +49 7161 304880-1
 AG Ulm · HRB 731058 · Managing Director: Jens Kammerer

>>>
>>>
>>
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing 

Re: High disk io read load

2017-02-18 Thread Bhuvan Rawal
Hi Ben,

If it's the same on both machines then something else could be the issue. We
faced high disk IO due to a misconfigured read ahead, which resulted in a
high amount of disk IO for a comparatively insignificant network transfer.

Can you post the output of blockdev --report for a normal node and the
512-token node?

Regards,

On Sun, Feb 19, 2017 at 2:07 AM, Benjamin Roth 
wrote:

> cat /sys/block/sda/queue/read_ahead_kb
> => 8
>
> On all CS nodes. Is that what you mean?
>
> 2017-02-18 21:32 GMT+01:00 Bhuvan Rawal :
>
>> Hi Benjamin,
>>
>> What is the disk read ahead on both nodes?
>>
>> Regards,
>> Bhuvan
>>
>> On Sun, Feb 19, 2017 at 1:58 AM, Benjamin Roth 
>> wrote:
>>
>>> This is status of the largest KS of these both nodes:
>>> UN  10.23.71.10  437.91 GiB  512  49.1%
>>> 2679c3fa-347e-4845-bfc1-c4d0bc906576  RAC1
>>> UN  10.23.71.9   246.99 GiB  256  28.3%
>>> 2804ef8a-26c8-4d21-9e12-01e8b6644c2f  RAC1
>>>
>>> So roughly as expected.
>>>
>>> 2017-02-17 23:07 GMT+01:00 kurt greaves :
>>>
 what's the Owns % for the relevant keyspace from nodetool status?

>>>
>>>
>>>
>>> --
>>> Benjamin Roth
>>> Prokurist
>>>
>>> Jaumo GmbH · www.jaumo.com
>>> Wehrstraße 46 · 73035 Göppingen · Germany
>>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>>
>>
>>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>


Re: High disk io read load

2017-02-18 Thread Benjamin Roth
cat /sys/block/sda/queue/read_ahead_kb
=> 8

On all CS nodes. Is that what you mean?
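
(In case it helps for comparing/adjusting: the same setting is visible via
blockdev, just in 512-byte sectors instead of kb, and can be changed at
runtime - device name as above, value not persistent across reboots:

blockdev --getra /dev/sda                      # 16 sectors = 8 kb
echo 8 > /sys/block/sda/queue/read_ahead_kb    # same setting, in kb
)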

2017-02-18 21:32 GMT+01:00 Bhuvan Rawal :

> Hi Benjamin,
>
> What is the disk read ahead on both nodes?
>
> Regards,
> Bhuvan
>
> On Sun, Feb 19, 2017 at 1:58 AM, Benjamin Roth 
> wrote:
>
>> This is status of the largest KS of these both nodes:
>> UN  10.23.71.10  437.91 GiB  512  49.1%
>> 2679c3fa-347e-4845-bfc1-c4d0bc906576  RAC1
>> UN  10.23.71.9   246.99 GiB  256  28.3%
>> 2804ef8a-26c8-4d21-9e12-01e8b6644c2f  RAC1
>>
>> So roughly as expected.
>>
>> 2017-02-17 23:07 GMT+01:00 kurt greaves :
>>
>>> what's the Owns % for the relevant keyspace from nodetool status?
>>>
>>
>>
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>
>


-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: High disk io read load

2017-02-18 Thread Benjamin Roth
We are talking about a read IO increase of over 2000% with 512 tokens
compared to 256 tokens. A 100% increase would be linear, which would be
perfect. 200% would even be okay, taking the RAM/load ratio for caching into
account. But > 20x the read IO is really incredible.
The nodes are configured with Puppet, they share the same roles and no
manual "optimizations" are applied. So I can't imagine that a different
configuration is responsible for it.

2017-02-18 21:28 GMT+01:00 Benjamin Roth :

> This is status of the largest KS of these both nodes:
> UN  10.23.71.10  437.91 GiB  512  49.1%
> 2679c3fa-347e-4845-bfc1-c4d0bc906576  RAC1
> UN  10.23.71.9   246.99 GiB  256  28.3%
> 2804ef8a-26c8-4d21-9e12-01e8b6644c2f  RAC1
>
> So roughly as expected.
>
> 2017-02-17 23:07 GMT+01:00 kurt greaves :
>
>> what's the Owns % for the relevant keyspace from nodetool status?
>>
>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: High disk io read load

2017-02-18 Thread Bhuvan Rawal
Hi Benjamin,

What is the disk read ahead on both nodes?

Regards,
Bhuvan

On Sun, Feb 19, 2017 at 1:58 AM, Benjamin Roth 
wrote:

> This is status of the largest KS of these both nodes:
> UN  10.23.71.10  437.91 GiB  512  49.1%
> 2679c3fa-347e-4845-bfc1-c4d0bc906576  RAC1
> UN  10.23.71.9   246.99 GiB  256  28.3%
> 2804ef8a-26c8-4d21-9e12-01e8b6644c2f  RAC1
>
> So roughly as expected.
>
> 2017-02-17 23:07 GMT+01:00 kurt greaves :
>
>> what's the Owns % for the relevant keyspace from nodetool status?
>>
>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>


Re: High disk io read load

2017-02-18 Thread Benjamin Roth
This is the status of the largest KS on both of these nodes:
UN  10.23.71.10  437.91 GiB  512  49.1%
2679c3fa-347e-4845-bfc1-c4d0bc906576  RAC1
UN  10.23.71.9   246.99 GiB  256  28.3%
2804ef8a-26c8-4d21-9e12-01e8b6644c2f  RAC1

So roughly as expected.

2017-02-17 23:07 GMT+01:00 kurt greaves :

> what's the Owns % for the relevant keyspace from nodetool status?
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: High disk io read load

2017-02-17 Thread kurt greaves
what's the Owns % for the relevant keyspace from nodetool status?
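
(i.e. something like

nodetool status my_keyspace

with the keyspace given explicitly - my_keyspace being a placeholder -
otherwise the Owns column isn't meaningful when keyspaces have different
replication settings.)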


Re: High disk io read load

2017-02-17 Thread Benjamin Roth
Hi Nate,

See the dstat results here:
https://gist.github.com/brstgt/216c662b525a9c5b653bbcd8da5b3fcb
Network volume does not correspond to disk IO, not even close.

@heterogeneous vnode count:
I did this to test how load behaves on a new server class we ordered for
CS. The new nodes have much faster CPUs than our older nodes. If not by
assigning more tokens to new nodes, what else would you recommend to give
more weight + load to newer and usually faster servers?
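
(The weighting itself is nothing fancy - just a different num_tokens in
cassandra.yaml before bootstrapping the new nodes, along the lines of:

grep num_tokens /etc/cassandra/cassandra.yaml
num_tokens: 512      # new, faster node class
# vs. num_tokens: 256 on the older nodes

the config path is the usual package default and may differ per setup.)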

2017-02-16 23:21 GMT+01:00 Nate McCall :

>
> - Node A has 512 tokens and Node B 256. So it has double the load (data).
>> - Node A also has 2 SSDs, Node B only 1 SSD (according to load)
>>
>
> I very rarely see heterogeneous vnode counts in the same cluster. I would
> almost guarantee you are the only one doing this with MVs as well.
>
> That said, since you have different IO hardware, are you sure the system
> configurations (eg. block size, read ahead, etc) are the same on both
> machines? Is dstat showing a similar order of magnitude of network traffic
> in vs. IO for what you would expect?
>
>
> --
> -
> Nate McCall
> Wellington, NZ
> @zznate
>
> CTO
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: High disk io read load

2017-02-16 Thread Nate McCall
> - Node A has 512 tokens and Node B 256. So it has double the load (data).
> - Node A also has 2 SSDs, Node B only 1 SSD (according to load)
>

I very rarely see heterogeneous vnode counts in the same cluster. I would
almost guarantee you are the only one doing this with MVs as well.

That said, since you have different IO hardware, are you sure the system
configurations (e.g. block size, read ahead, etc.) are the same on both
machines? Is dstat showing a similar order of magnitude of network traffic
in vs. IO for what you would expect?


-- 
-
Nate McCall
Wellington, NZ
@zznate

CTO
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: High disk io read load

2017-02-16 Thread Edward Capriolo
On Thu, Feb 16, 2017 at 12:38 AM, Benjamin Roth 
wrote:

> It doesn't really look like that:
> https://cl.ly/2c3Z1u2k0u2I
>
> Thats the ReadLatency.count metric aggregated by host which represents the
> actual read operations, correct?
>
> 2017-02-15 23:01 GMT+01:00 Edward Capriolo :
>
>> I think it has more than double the load. It is double the data. More
>> read repair chances. More load can swing it's way during node failures etc.
>>
>> On Wednesday, February 15, 2017, Benjamin Roth 
>> wrote:
>>
>>> Hi there,
>>>
>>> Following situation in cluster with 10 nodes:
>>> Node A's disk read IO is ~20 times higher than the read load of node B.
>>> The nodes are exactly the same except:
>>> - Node A has 512 tokens and Node B 256. So it has double the load (data).
>>> - Node A also has 2 SSDs, Node B only 1 SSD (according to load)
>>>
>>> Node A has roughly 460GB, Node B 260GB total disk usage.
>>> Both nodes have 128GB RAM and 40 cores.
>>>
>>> Of course I assumed that Node A does more reads because cache / load
>>> ratio is worse but a factor of 20 makes me very sceptic.
>>>
>>> Of course Node A has a much higher and less predictable latency due to
>>> the wait states.
>>>
>>> Has anybody experienced similar situations?
>>> Any hints how to analyze or optimize this - I mean 128GB cache for 460GB
>>> payload is not that few. I am pretty sure that not the whole dataset of
>>> 460GB is "hot".
>>>
>>> --
>>> Benjamin Roth
>>> Prokurist
>>>
>>> Jaumo GmbH · www.jaumo.com
>>> Wehrstraße 46 · 73035 Göppingen · Germany
>>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>>
>>
>>
>> --
>> Sorry this was sent from mobile. Will do less grammar and spell check
>> than usual.
>>
>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>

You could be correct. I also think a few things smooth out the curves.

- Intelligent clients
- Dynamic snitch

For example, when testing out an awesome JVM tune, you might see CPU usage
go down. From there you assume the tune worked, but what can happen is that
the two dynamic mechanisms shift some small % of traffic away. Those effects
cascade as well. dynamic_snitch claims to shift load once performance is
$threshold worse.


Re: High disk io read load

2017-02-15 Thread Benjamin Roth
Erm sorry, forgot to mention. In this case "cas10" is Node A with 512
tokens and "cas9" Node B with 256 tokens.

2017-02-16 6:38 GMT+01:00 Benjamin Roth :

> It doesn't really look like that:
> https://cl.ly/2c3Z1u2k0u2I
>
> Thats the ReadLatency.count metric aggregated by host which represents the
> actual read operations, correct?
>
> 2017-02-15 23:01 GMT+01:00 Edward Capriolo :
>
>> I think it has more than double the load. It is double the data. More
>> read repair chances. More load can swing it's way during node failures etc.
>>
>> On Wednesday, February 15, 2017, Benjamin Roth 
>> wrote:
>>
>>> Hi there,
>>>
>>> Following situation in cluster with 10 nodes:
>>> Node A's disk read IO is ~20 times higher than the read load of node B.
>>> The nodes are exactly the same except:
>>> - Node A has 512 tokens and Node B 256. So it has double the load (data).
>>> - Node A also has 2 SSDs, Node B only 1 SSD (according to load)
>>>
>>> Node A has roughly 460GB, Node B 260GB total disk usage.
>>> Both nodes have 128GB RAM and 40 cores.
>>>
>>> Of course I assumed that Node A does more reads because cache / load
>>> ratio is worse but a factor of 20 makes me very sceptic.
>>>
>>> Of course Node A has a much higher and less predictable latency due to
>>> the wait states.
>>>
>>> Has anybody experienced similar situations?
>>> Any hints how to analyze or optimize this - I mean 128GB cache for 460GB
>>> payload is not that few. I am pretty sure that not the whole dataset of
>>> 460GB is "hot".
>>>
>>> --
>>> Benjamin Roth
>>> Prokurist
>>>
>>> Jaumo GmbH · www.jaumo.com
>>> Wehrstraße 46 · 73035 Göppingen · Germany
>>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>>
>>
>>
>> --
>> Sorry this was sent from mobile. Will do less grammar and spell check
>> than usual.
>>
>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: High disk io read load

2017-02-15 Thread Benjamin Roth
It doesn't really look like that:
https://cl.ly/2c3Z1u2k0u2I

That's the ReadLatency.count metric aggregated by host, which represents the
actual read operations, correct?
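
(For reference, that is the per-table read counter that nodetool exposes as
well, e.g.:

nodetool tablestats my_ks.my_cf | grep -i "read count"

table name is a placeholder; the chart above aggregates the same metric over
all tables per host.)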

2017-02-15 23:01 GMT+01:00 Edward Capriolo :

> I think it has more than double the load. It is double the data. More read
> repair chances. More load can swing it's way during node failures etc.
>
> On Wednesday, February 15, 2017, Benjamin Roth 
> wrote:
>
>> Hi there,
>>
>> Following situation in cluster with 10 nodes:
>> Node A's disk read IO is ~20 times higher than the read load of node B.
>> The nodes are exactly the same except:
>> - Node A has 512 tokens and Node B 256. So it has double the load (data).
>> - Node A also has 2 SSDs, Node B only 1 SSD (according to load)
>>
>> Node A has roughly 460GB, Node B 260GB total disk usage.
>> Both nodes have 128GB RAM and 40 cores.
>>
>> Of course I assumed that Node A does more reads because cache / load
>> ratio is worse but a factor of 20 makes me very sceptic.
>>
>> Of course Node A has a much higher and less predictable latency due to
>> the wait states.
>>
>> Has anybody experienced similar situations?
>> Any hints how to analyze or optimize this - I mean 128GB cache for 460GB
>> payload is not that few. I am pretty sure that not the whole dataset of
>> 460GB is "hot".
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>
>
> --
> Sorry this was sent from mobile. Will do less grammar and spell check than
> usual.
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer


Re: High disk io read load

2017-02-15 Thread Edward Capriolo
I think it has more than double the load. It is double the data. More read
repair chances. More load can swing its way during node failures, etc.

On Wednesday, February 15, 2017, Benjamin Roth 
wrote:

> Hi there,
>
> Following situation in cluster with 10 nodes:
> Node A's disk read IO is ~20 times higher than the read load of node B.
> The nodes are exactly the same except:
> - Node A has 512 tokens and Node B 256. So it has double the load (data).
> - Node A also has 2 SSDs, Node B only 1 SSD (according to load)
>
> Node A has roughly 460GB, Node B 260GB total disk usage.
> Both nodes have 128GB RAM and 40 cores.
>
> Of course I assumed that Node A does more reads because cache / load ratio
> is worse but a factor of 20 makes me very sceptic.
>
> Of course Node A has a much higher and less predictable latency due to the
> wait states.
>
> Has anybody experienced similar situations?
> Any hints how to analyze or optimize this - I mean 128GB cache for 460GB
> payload is not that few. I am pretty sure that not the whole dataset of
> 460GB is "hot".
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>


-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.