Hi Everyone,

What is the best way to replace a failing OSD hard disk (SMART Health Status: HARDWARE IMPENDING FAILURE)?

Normally I do the following:

1. set the OSD as out
2. wait for rebalancing
3. stop the OSD on the osd-server (unmount if needed)
4. purge the OSD from Ceph
5. physically replace the disk with the new one
6. with ceph-deploy:
   6a. zap the new disk (just in case)
   6b. create the new OSD
7. add the new OSD to the CRUSH map
8. wait for rebalancing.
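
As a rough sketch, the steps above map to the commands below. This is a dry-run helper that only prints the commands and does not run them. The function name, the host name "osd-server-01", the OSD id 17 and the device /dev/sdk are made-up placeholders. The ceph-deploy lines assume the ceph-deploy 2.x argument order (older releases use HOST:DISK instead), and the OSD is assumed to run as the systemd unit ceph-osd@ID. Step 7 is left out because, with default settings, a newly created OSD normally registers itself in the CRUSH map when it starts.

```python
import shlex


def replacement_commands(osd_id, host, device):
    """Return the commands for the replacement procedure as a list of strings.

    Nothing is executed; the caller decides whether to print or run them.
    host and device are hypothetical placeholders supplied by the caller.
    """
    osd = str(int(osd_id))      # reject anything that is not a numeric OSD id
    host = shlex.quote(host)
    device = shlex.quote(device)
    return [
        # 1. mark the OSD out so its data is migrated elsewhere
        f"ceph osd out {osd}",
        # 2. check cluster status; repeat until all PGs are active+clean
        "ceph -s",
        # 3. stop the OSD daemon on the OSD server (assumes systemd units)
        f"ssh {host} sudo systemctl stop ceph-osd@{osd}",
        # 4. remove the OSD from the CRUSH map, auth keys and OSD map
        f"ceph osd purge {osd} --yes-i-really-mean-it",
        # 5. (physically replace the disk here)
        # 6a. zap the new disk (assumes ceph-deploy 2.x syntax)
        f"ceph-deploy disk zap {host} {device}",
        # 6b. create the new OSD (assumes ceph-deploy 2.x syntax)
        f"ceph-deploy osd create --data {device} {host}",
        # 8. check cluster status again while the new OSD is backfilled
        "ceph -s",
    ]


if __name__ == "__main__":
    for cmd in replacement_commands(17, "osd-server-01", "/dev/sdk"):
        print(cmd)
```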

My questions are:

- Is my procedure reasonable?
- What if I skip step 2 and, instead of waiting for rebalancing, purge the OSD directly?
- Is it better to reweight the OSD before taking it out?
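
To make the last question concrete: by reweighting I mean lowering the OSD's CRUSH weight to 0 in a few steps, waiting for the cluster to settle after each one, before marking it out. A minimal sketch of what that could look like, again only generating the commands; the function names, the step count and the OSD id are illustrative assumptions, not a recommendation.

```python
def drain_weights(current_weight, steps):
    """Evenly spaced CRUSH weights going from below current_weight down to 0."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return [round(current_weight * (steps - i) / steps, 4)
            for i in range(1, steps + 1)]


def drain_commands(osd_id, current_weight, steps=4):
    """Return one 'ceph osd crush reweight' command per step (not executed)."""
    osd = int(osd_id)
    return [f"ceph osd crush reweight osd.{osd} {w}"
            for w in drain_weights(current_weight, steps)]


if __name__ == "__main__":
    # Hypothetical OSD 17 with a CRUSH weight of 4.0, drained in 4 steps.
    for cmd in drain_commands(17, 4.0):
        print(cmd)
```

The idea would be to run one command at a time and wait for the cluster to return to HEALTH_OK before issuing the next one.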

I'm running a Luminous (12.2.2) cluster with 332 OSDs; the failure domain is host.

Thanks,
Iztok

--
Iztok Gregori
ICT Systems and Services
Elettra - Sincrotrone Trieste S.C.p.A.
Telephone: +39 040 3758948
http://www.elettra.eu

