Hey all,
We’ve been running benchmarks against a Ceph cluster that we deployed with the
Rook operator on Kubernetes. Everything scaled linearly up to a point, after
which a single OSD started receiving much higher CPU load than the other OSDs
(close to 100% saturation). After some investigation we noticed a lot of pubsub
traffic in the strace output of the RGW pods, like so:
[pid 22561] sendmsg(77, {msg_name(0)=NULL,
msg_iov(3)=[{"\21\2)\0\0\0\10\0:\1\0\0\10\0\0\0\0\0\10\0\0\0\0\0\0\20\0\0-\321\211K"...,
73}, {"\200\0\0\0pubsub.user.ceph-user-wwITOk"..., 314},
{"\0\303\34[\360\314\233\2138\377\377\377\377\377\377\377\377", 17}],
msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL|MSG_MORE <unfinished …>
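For reference, this is roughly how I traced those writes down to a single OSD.
The pool name is an assumption on my part (the stock default.rgw.log pool from
the Rook deployment), and OBJECT_NAME stands in for one of the truncated object
names above:

  # find the RGW pools and look for the pubsub objects
  ceph osd lspools
  rados -p default.rgw.log ls | grep -i pubsub | head

  # map one of those objects to its PG and acting OSD set
  ceph osd map default.rgw.log OBJECT_NAME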
I’ve checked the other OSDs and only this one receives these messages, so I
suspect it’s creating a bottleneck. Does anyone have an idea why these are
being generated, or how to stop them? The pubsub sync module doesn’t appear to
be enabled, and our benchmark only does simple GETs/PUTs/DELETEs.
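In case I’m looking in the wrong place, this is what I’ve been checking to rule
the sync module out; the exact set of commands is my best guess, so please
point out anything I’ve missed:

  # zone/zonegroup tier type and sync-module configuration
  radosgw-admin zone get
  radosgw-admin zonegroup get
  radosgw-admin period get

  # any bucket notification topics defined?
  # (not sure this subcommand exists on 14.2.5)
  radosgw-admin topic list

  # is the pubsub API enabled on the RGW frontends?
  # (with Rook the setting may live in the pod's ceph.conf instead)
  ceph config dump | grep -i rgw_enable_apis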
We’re running Ceph 14.2.5 (Nautilus).
Thank you!