Hi everyone!

TL;DR:
There is currently a service degradation affecting VMs and anything running on
them (e.g. toolforge, quarry, paws, ...). You might be able to use the services,
or they might be too slow. We are working on it and will update when it is
fixed.


Long story:

We were adding a new ceph node to the ceph cluster. This time the node was in a
different subnet, but ceph is supposed to work transparently across multiple
subnets. For some reason the new node was added to the cluster, but it is not
replying to the heartbeats sent by the other nodes. That causes the cluster to
keep rebalancing data, which creates continuous IO slowness for any clients
(like VMs).

We are trying to minimize the impact by limiting the amount of data that gets
re-shuffled. That slows down the intervention a bit, but should improve the
client experience.

We are actively working on this, and will update with any changes.

Cheers!

-- 
David Caro
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
PGP Signature: 7180 83A2 AC8B 314F B4CE  1171 4071 C7E1 D262 69C3

"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."


_______________________________________________
Cloud-announce mailing list -- [email protected]
List information: 
https://lists.wikimedia.org/postorius/lists/cloud-announce.lists.wikimedia.org/
_______________________________________________
Cloud mailing list -- [email protected]
List information: 
https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
