What is your replication factor?
Single datacenter, three availability zones, is that right?
You removed one node at a time or three at once?

On Wed, Feb 21, 2018 at 10:20 AM, Fd Habash <fmhab...@gmail.com> wrote:

> We have had a 15 node cluster across three zones and cluster repairs using
> ‘nodetool repair -pr’ took about 3 hours to finish. Lately, we shrunk the
> cluster to 12. Since then, same repair job has taken up to 12 hours to
> finish and most times, it never does.
>
>
>
> More importantly, at some point during the repair cycle, we see read
> latencies jumping to 1-2 seconds and applications immediately notice the
> impact.
>
>
>
> stream_throughput_outbound_megabits_per_sec is set at 200 and
> compaction_throughput_mb_per_sec at 64. The /data dir on the nodes is
> around ~500GB at 44% usage.
>
>
>
> When shrinking the cluster, the ‘nodetool decommision’ was eventless. It
> completed successfully with no issues.
>
>
>
> What could possibly cause repairs to cause this impact following cluster
> downsizing? Taking three nodes out does not seem compatible with such a
> drastic effect on repair and read latency.
>
>
>
> Any expert insights will be appreciated.
>
> ----------------
> Thank you
>
>
>

Reply via email to