Re: high latency on one node after replacement

2018-03-27 Thread Mike Torra
Thanks for pointing that out, I just found it too :) I had overlooked this.

On Tue, Mar 27, 2018 at 3:44 PM, Voytek Jarnot wrote:

> Have you ruled out EBS snapshot initialization issues (
> https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-initialize.html)?
>
> On Tue, Mar 27, 2018 at 2:24 PM, Mike Torra  wrote:
>
>> Hi There -
>>
>> I have noticed an issue where I consistently see high p999 read latency
>> on a node for a few hours after replacing the node. Before replacing the
>> node, the p999 read latency is ~30ms, but after it increases to 1-5s. I am
>> running C* 3.11.2 in EC2.
>>
>> I am testing out using EBS snapshots of the /data disk as a backup, so
>> that I can replace nodes without having to fully bootstrap the replacement.
>> This seems to work ok, except for the latency issue. Some things I have
>> noticed:
>>
>> - `nodetool netstats` doesn't show any 'Completed' Large Messages, only
>> 'Dropped', while this is going on. There are only a few of these.
>> - the logs show warnings like this:
>>
>> WARN  [PERIODIC-COMMIT-LOG-SYNCER] 2018-03-27 18:57:15,655
>> NoSpamLogger.java:94 - Out of 84 commit log syncs over the past 297.28s
>> with average duration of 235.88ms, 86 have exceeded the configured commit
>> interval by an average of 113.66ms
>>   and I can see some slow queries in debug.log, but I can't figure out
>> what is causing it
>> - gc seems normal
>>
>> Could this have something to do with starting the node with the EBS
>> snapshot of the /data directory? My first thought was that this is related
>> to the EBS volumes, but it seems too consistent to be actually caused by
>> that. The problem is consistent across multiple replacements, and multiple
>> EC2 regions.
>>
>> I appreciate any suggestions!
>>
>> - Mike
>>
>
>


Re: high latency on one node after replacement

2018-03-27 Thread Voytek Jarnot
Have you ruled out EBS snapshot initialization issues (
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-initialize.html)?



Re: high latency on one node after replacement

2018-03-27 Thread Christophe Schmitz
Hi Mike,

Unlike new EBS volumes, which don't need pre-warming, I think you need to
pre-warm an EBS volume restored from a snapshot.
Have a look at this AWS doc:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-initialize.html
It says that:
However, storage blocks on volumes that were restored from snapshots must
be initialized (pulled down from Amazon S3 and written to the volume)
before you can access the block. This preliminary action takes time and can
cause a significant increase in the latency of an I/O operation the first
time each block is accessed. For most applications, amortizing this cost
over the lifetime of the volume is acceptable. Performance is restored
after the data is accessed once.
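
As a rough illustration of what "initializing" amounts to, here is a minimal sketch (not taken from the AWS doc) that reads every block of a device once, so the first-access penalty is paid before the node serves traffic. The function name and the device path in the usage comment are hypothetical; on a real host this would need read access to the block device, and the AWS page linked above describes doing the same thing with standard tools such as dd or fio.

```python
def initialize_volume(path, block_size=1024 * 1024):
    """Read `path` sequentially from start to end and return the bytes read.

    Assumption: touching each block once is enough to pull it down from the
    snapshot, as the quoted AWS text describes. `path` would be the restored
    volume's device node; any readable file works for a dry run.
    """
    total = 0
    # buffering=0 opens the file unbuffered, so each read() goes to the OS.
    with open(path, "rb", buffering=0) as dev:
        while True:
            chunk = dev.read(block_size)
            if not chunk:  # empty read means end of device/file
                break
            total += len(chunk)
    return total


# Hypothetical usage (device name is an assumption, check with lsblk first):
#   bytes_read = initialize_volume("/dev/xvdf")
```

A single sequential pass like this can take a long time on a large volume, so it is probably worth running before the replacement node joins the ring rather than while it is serving reads.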

I hope it helps :)

Cheers,

Christophe




-- 

*Christophe Schmitz - **VP Consulting*
