Hello,
For about a year and a half I have been supporting a Ceph cluster for my
company (v15.2.3 on CentOS 8, which is already out of support). It is used
only for S3, and until recently there were no serious problems that I could
not deal with myself.
However, about 2 months ago a problem appeared that I cannot solve on my
own.
A firewall was added for a short time (about 15-20 minutes), and during that
time each of the hosts was isolated from the monitoring servers. This led to
the following error message:
ceph> health detail
HEALTH_ERR 8 hosts fail cephadm check; failed to probe daemons or devices;
Module 'cephadm' has failed: cannot send (already closed?)
[WRN] CEPHADM_HOST_CHECK_FAILED: 8 hosts fail cephadm check
    host mon4 failed check: cannot send (already closed?)
    host mon5 failed check: cannot send (already closed?)
    host rgw1 failed check: cannot send (already closed?)
    host srv1 failed check: cannot send (already closed?)
    host srv2 failed check: cannot send (already closed?)
    host srv3 failed check: cannot send (already closed?)
    host srv4 failed check: cannot send (already closed?)
    host srv5 failed check: cannot send (already closed?)
[WRN] CEPHADM_REFRESH_FAILED: failed to probe daemons or devices
    host mon4 scrape failed: cannot send (already closed?)
    host mon4 ceph-volume inventory failed: cannot send (already closed?)
    host mon5 scrape failed: cannot send (already closed?)
    host mon5 ceph-volume inventory failed: cannot send (already closed?)
    host rgw1 scrape failed: cannot send (already closed?)
    host rgw1 ceph-volume inventory failed: cannot send (already closed?)
    host srv1 scrape failed: cannot send (already closed?)
    host srv1 ceph-volume inventory failed: cannot send (already closed?)
    host srv2 scrape failed: cannot send (already closed?)
    host srv2 ceph-volume inventory failed: cannot send (already closed?)
    host srv3 scrape failed: cannot send (already closed?)
    host srv3 ceph-volume inventory failed: cannot send (already closed?)
    host srv4 scrape failed: cannot send (already closed?)
    host srv4 ceph-volume inventory failed: cannot send (already closed?)
    host srv5 scrape failed: cannot send (already closed?)
    host srv5 ceph-volume inventory failed: cannot send (already closed?)
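
Since the output is long, the failures can be grouped per host with a small
script. This is only a rough sketch: the function name failures_by_host and
the regular expression are my own assumptions about the text format shown
above, and are not part of Ceph or cephadm. The sample lines are copied from
the health detail output.

```python
import re
from collections import defaultdict

# A few lines copied from the `health detail` output above.
SAMPLE = """\
host mon4 failed check: cannot send (already closed?)
host mon4 scrape failed: cannot send (already closed?)
host mon4 ceph-volume inventory failed: cannot send (already closed?)
host srv5 failed check: cannot send (already closed?)
"""

# Assumed line shape: "host <name> <what failed>: <reason>"
LINE_RE = re.compile(r"^\s*host (\S+) (.+?): (.+)$")


def failures_by_host(text):
    """Group the 'what failed' part of each matching line by host name."""
    grouped = defaultdict(list)
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            host, what, _reason = match.groups()
            grouped[host].append(what)
    return dict(grouped)


if __name__ == "__main__":
    for host, items in sorted(failures_by_host(SAMPLE).items()):
        print(host, "->", ", ".join(items))
```

Lines that do not start with "host" (the summary and the [WRN] headers) are
simply skipped.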
Despite these errors, the cluster is working and the data is currently
accessible as normal.
I have not noticed any services going down. While the errors were still
present, we had to add a new server, srv6.
It was added to the cluster without problems and worked as expected, but
immediately afterwards another error occurred:
[ERR] MGR_MODULE_ERROR: Module 'cephadm' has failed: cannot send (already
closed?)
Module 'cephadm' has failed: cannot send (already closed?)
This put the cluster in an ERROR state. The hosts are alive and connected:
# ceph orch host ls
HOST       ADDR             LABELS  STATUS
adm        adm              mgr
mon1       mon1             mgr
mon2       mon2
mon3       mon3             mgr
mon4       mon4
mon5       mon5
rgw1       rgw1
rgw2-real  rgw2-real
srv1       srv1
srv2       srv2
srv3       srv3
srv4       srv4
srv5       192.168.236.215
srv6       192.168.236.216
Any advice is welcome. I have read everything related to these errors that
I could find in the various groups, but none of the proposed solutions has
worked.
Regards,
Kalin
_______________________________________________
ceph-users mailing list -- [email protected]