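Have you ruled out EBS snapshot initialization issues
(https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-initialize.html)?
Blocks on a volume restored from a snapshot are fetched from S3 the first
time they are read, so reads stay slow until each block has been touched.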
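
The linked page describes initializing a restored volume by reading every
block once before putting it into service. As a rough illustration only,
here is a minimal Python sketch of such a sequential full read. The function
name read_all_blocks and the device path /dev/xvdf are hypothetical
placeholders, not anything taken from your setup, and reading a block device
normally requires root.

```python
import sys


def read_all_blocks(path, chunk_size=1024 * 1024):
    """Read `path` sequentially from start to end, discard the data,
    and return the total number of bytes read."""
    total = 0
    # buffering=0 opens the file in raw mode, so each read() goes
    # straight to the OS instead of through Python's own buffer.
    with open(path, "rb", buffering=0) as dev:
        while True:
            chunk = dev.read(chunk_size)
            if not chunk:  # empty bytes object means end of device/file
                break
            total += len(chunk)
    return total


if __name__ == "__main__":
    # Hypothetical usage: sudo python3 warm_volume.py /dev/xvdf
    if len(sys.argv) == 2:
        print(read_all_blocks(sys.argv[1]), "bytes read")
```

Running this against the restored /data device before starting Cassandra
would be one way to check whether the latency spike is first-touch block
loading; it may take a long time on a large volume.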

On Tue, Mar 27, 2018 at 2:24 PM, Mike Torra <mto...@salesforce.com> wrote:

> Hi There -
>
> I have noticed an issue where I consistently see high p999 read latency on
> a node for a few hours after replacing it. Before the replacement, the p999
> read latency is ~30ms, but afterwards it rises to 1-5s. I am
> running C* 3.11.2 in EC2.
>
> I am testing out using EBS snapshots of the /data disk as a backup, so
> that I can replace nodes without having to fully bootstrap the replacement.
> This seems to work ok, except for the latency issue. Some things I have
> noticed:
>
> - `nodetool netstats` doesn't show any 'Completed' Large Messages, only
> 'Dropped', while this is going on. There are only a few of these.
> - the logs show warnings like this:
>
> WARN  [PERIODIC-COMMIT-LOG-SYNCER] 2018-03-27 18:57:15,655
> NoSpamLogger.java:94 - Out of 84 commit log syncs over the past 297.28s
> with average duration of 235.88ms, 86 have exceeded the configured commit
> interval by an average of 113.66ms
>   and I can see some slow queries in debug.log, but I can't figure out
> what is causing them
> - gc seems normal
>
> Could this have something to do with starting the node with the EBS
> snapshot of the /data directory? My first thought was that this is related
> to the EBS volumes, but it seems too consistent to be actually caused by
> that. The problem is consistent across multiple replacements, and multiple
> EC2 regions.
>
> I appreciate any suggestions!
>
> - Mike
>
