This is the output of sar: https://gist.github.com/anonymous/9545fb69fbb28a20dc99b2ea5e14f4cd
It seems to me that there is not enough page cache to handle all the
data in a reasonable way. As pointed out yesterday, the read rate with
an empty page cache is ~800 MB/s. That's really (!!!) a lot for 4-5
MB/s of network output. I stumbled across the compression chunk size,
which I always left untouched at the default of 64 KB
(https://cl.ly/2w0V3U1q1I1Y). I guess setting a read-ahead of 8 KB is
totally pointless if Cassandra reads 64 KB even when it only has to
fetch a single row, right? Are there recommendations for that setting?
(A CQL sketch of the chunk-size change is at the end of this mail.)

2017-02-19 19:15 GMT+01:00 Bhuvan Rawal <bhu1ra...@gmail.com>:

> Hi Edward,
>
> This could have been a valid case here, but if hotspots indeed
> existed, then along with the really high disk IO the node should have
> been doing proportionately high network IO as well - serving more
> queries per second, too.
>
> But from the output shared by Benjamin that doesn't appear to be the
> case, and things look balanced.
>
> Regards,
>
> On Sun, Feb 19, 2017 at 7:47 PM, Edward Capriolo
> <edlinuxg...@gmail.com> wrote:
>
>> On Sat, Feb 18, 2017 at 3:35 PM, Benjamin Roth
>> <benjamin.r...@jaumo.com> wrote:
>>
>>> We are talking about a read IO increase of over 2000% with 512
>>> tokens compared to 256 tokens. A 100% increase would be linear,
>>> which would be perfect. 200% would even be okay, taking the
>>> RAM/load ratio for caching into account. But >20x the read IO is
>>> really incredible.
>>> The nodes are configured with Puppet, they share the same roles,
>>> and no manual "optimizations" are applied. So I can't imagine that
>>> a different configuration is responsible for it.
>>>
>>> 2017-02-18 21:28 GMT+01:00 Benjamin Roth <benjamin.r...@jaumo.com>:
>>>
>>>> This is the status of the largest KS on both of these nodes:
>>>> UN  10.23.71.10  437.91 GiB  512  49.1%  2679c3fa-347e-4845-bfc1-c4d0bc906576  RAC1
>>>> UN  10.23.71.9   246.99 GiB  256  28.3%  2804ef8a-26c8-4d21-9e12-01e8b6644c2f  RAC1
>>>>
>>>> So roughly as expected.
>>>>
>>>> 2017-02-17 23:07 GMT+01:00 kurt greaves <k...@instaclustr.com>:
>>>>
>>>>> What's the Owns % for the relevant keyspace from nodetool status?
>>>>
>>>> --
>>>> Benjamin Roth
>>>> Prokurist
>>>>
>>>> Jaumo GmbH · www.jaumo.com
>>>> Wehrstraße 46 · 73035 Göppingen · Germany
>>>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>>
>>> --
>>> Benjamin Roth
>>> Prokurist
>>>
>>> Jaumo GmbH · www.jaumo.com
>>> Wehrstraße 46 · 73035 Göppingen · Germany
>>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>> When I read articles like this:
>>
>> http://www.doanduyhai.com/blog/?p=1930
>>
>> and see the word hot-spot:
>>
>> "Another performance consideration worth mentioning is hot-spot.
>> Similar to manual denormalization, if your view partition key is
>> chosen poorly, you'll end up with hot spots in your cluster. A
>> simple example with our *user* table is to create a materialized
>> view *user_by_gender*."
>>
>> It leads me to ask a question back: what can you say about hotspots
>> in your data? Even if your nodes had an identical number of tokens,
>> this author seems to be suggesting that you could still have
>> hotspots.
>> Maybe the issue is that you have a hotspot, or 2x hotspots, or your
>> application has a hotspot that would be present even with perfect
>> token balancing.

--
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
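
As a concrete illustration of the chunk-size question above: a minimal
CQL sketch, assuming a hypothetical keyspace/table my_ks.my_table with
LZ4 compression; the names and the 4 KB value are placeholders, not
settings from this thread. The read-ahead itself is an OS-level knob
(e.g. blockdev --setra 16 /dev/sdX for 8 KB, since the unit is 512-byte
sectors; the device name is a placeholder too).

    -- Cassandra 3.x syntax; older versions use
    -- 'sstable_compression'/'chunk_length_kb' instead.
    -- With 4 KB chunks, a single-row read decompresses ~4 KB instead
    -- of 64 KB, at the cost of a slightly worse compression ratio and
    -- more chunk-offset metadata.
    ALTER TABLE my_ks.my_table
      WITH compression = {'class': 'LZ4Compressor',
                          'chunk_length_in_kb': 4};

    -- Existing SSTables keep the old chunk size until rewritten, e.g.:
    -- nodetool upgradesstables -a my_ks my_table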
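
For the hot-spot point in the quoted blog post, here is the
user_by_gender example it names, sketched as CQL (the column layout is
assumed; only the table and view names come from the post). A view
partitioned by a low-cardinality column like gender squeezes the whole
user base into a couple of partitions, so the replicas owning them stay
hot no matter how well the tokens are balanced:

    -- Assumed base table; the blog post only names it "user".
    CREATE TABLE user (
        id     uuid PRIMARY KEY,
        name   text,
        gender text
    );

    -- Every row lands in one of very few partitions - a textbook
    -- hot spot, independent of token distribution.
    CREATE MATERIALIZED VIEW user_by_gender AS
        SELECT * FROM user
        WHERE gender IS NOT NULL AND id IS NOT NULL
        PRIMARY KEY (gender, id);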