> [SNIP - bad drives]

Generally, by the time a disk shows bad blocks to the OS, the drive has already been remapping sectors in the background for ages and is really on its last legs. It is a bit unlikely that so many disks die at exactly the same time, though; more likely the problem had been silently getting worse and was only noticed when the OSDs had to restart after the power loss.


If this is _very_ important data, I would recommend you start by taking the bad drives out of operation and cloning each bad drive block by block onto a good one using dd_rescue. It is also a good idea to store an image of the disk so you can try the different rescue methods several times. In the very worst case, send the disk to a professional data recovery company.
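As a rough sketch only (device names and the image path are placeholders, not taken from your setup):

  # clone the failing drive block by block onto a known-good drive of at
  # least the same size -- double check the device names with lsblk first
  dd_rescue /dev/sdX /dev/sdY

  # also keep an image file somewhere safe, so the different rescue
  # methods can be retried against a fresh copy
  dd_rescue /dev/sdX /mnt/spare/bad-drive-sdX.img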

Once that is done, you have two options:
Try to make the OSD run again: xfs_repair plus manually finding corrupt objects (find + md5sum, looking for read errors) and deleting them has helped me in the past. If you manage to get the OSD to run, drain it by setting its crush weight to 0, and eventually remove the disk from the cluster.
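Roughly, and only as a sketch (OSD id, device and paths are placeholders, assuming a FileStore OSD on XFS):

  # repair the filesystem on the (unmounted) cloned partition
  xfs_repair /dev/sdY1

  # with the OSD mounted but the daemon stopped, read every object once;
  # files that throw read errors are the corrupt objects to delete
  find /var/lib/ceph/osd/ceph-<id>/current -type f -exec md5sum {} \; > /dev/null

  # if the OSD then starts, drain it by setting its crush weight to 0
  ceph osd crush reweight osd.<id> 0

  # and once it is empty, remove it from the cluster
  ceph osd out <id>
  ceph osd crush remove osd.<id>
  ceph auth del osd.<id>
  ceph osd rm <id>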
Alternatively, if you can not get the OSD running again:
Use ceph-objectstore-tool to extract the objects and inject them via a clean node and OSD, as described in http://ceph.com/geen-categorie/incomplete-pgs-oh-my/ . Read the man page and the tool's help output; I think the arguments have changed slightly since that blog post.
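Only as a sketch of the export/import (pg id, OSD ids and paths are placeholders; check ceph-objectstore-tool --help for the exact arguments on your release):

  # on the bad drive's node, with the OSD daemon stopped: export a pg
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<bad-id> \
      --journal-path /var/lib/ceph/osd/ceph-<bad-id>/journal \
      --pgid <pgid> --op export --file /tmp/<pgid>.export

  # on a clean node, with that OSD daemon also stopped: import it again
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<good-id> \
      --journal-path /var/lib/ceph/osd/ceph-<good-id>/journal \
      --op import --file /tmp/<pgid>.export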

You may also run into read errors on corrupt objects that stop your export. In that case rm the offending object (see the sketch below) and rerun the export.
Repeat for all bad drives.
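For example (paths and names are placeholders; with FileStore the objects are plain files under the pg's _head directory, possibly nested in hashed DIR_* subdirectories):

  # find the object file that aborts the export and remove it
  find /var/lib/ceph/osd/ceph-<bad-id>/current/<pgid>_head -name '*<object-name>*'
  rm '/var/lib/ceph/osd/ceph-<bad-id>/current/<pgid>_head/<offending-object-file>'

  # then rerun the ceph-objectstore-tool export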

When doing the inject it is important that your cluster is operational and able to accept objects from the draining drive. So either set the minimal replication (crush failure domain) type to OSD, or, even better, add more OSD nodes to get an operational cluster (with missing objects).
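One way to do that, as a sketch (rule and pool names are placeholders; on newer releases the pool setting is crush_rule instead of crush_ruleset):

  # create a replicated rule that only requires distinct OSDs, not hosts
  ceph osd crush rule create-simple replicate-by-osd default osd

  # look up its id and point the affected pool at it while recovering
  ceph osd crush rule dump
  ceph osd pool set <pool> crush_ruleset <rule-id>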


Also, I see in your log that you have os-prober probing all partitions. I tend to remove os-prober on machines that do not dual-boot with another OS.
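On Debian/Ubuntu based nodes (assuming that is what these machines run) that is just:

  apt-get remove os-prober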

Rules of thumb for future ceph clusters:

min_size=2 is there for a reason; it should never be 1 unless data loss is acceptable.

size=3 if you need the cluster to keep operating while a drive or node is in an error state. size=2 gives you more space, but the cluster will block on errors until recovery is done; better to be blocking than to be losing data. (The pool commands for these settings are sketched below.)

Have more nodes than your configured size. If you have size=3 and only 3 nodes and you lose a node, your cluster can not self-heal.

Keep free space on the drives; this is where data is re-replicated to when a node goes down. If you have 4 nodes and want to be able to lose one and still operate, you need enough leftover room on the 3 remaining nodes to cover for the lost one (3 of the 4 nodes must be able to hold everything, so roughly 75% is the absolute ceiling; staying under about 66% leaves working room to self-heal and keep operating). The more nodes you have, the smaller the impact of a node failure and the less spare room is needed.
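For reference, size and min_size are per-pool settings (pool name is a placeholder):

  ceph osd pool set <pool> size 3
  ceph osd pool set <pool> min_size 2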



good luck
Ronny Aasen


