On Thu, 9 May 2019 at 17:46, Feng Zhang <[email protected]> wrote:
> Thanks, guys.
>
> I forgot the IOPS. Since I have 100 disks, the total
> IOPS = 100 x 100 = 10K. With 4+2 erasure coding, when one disk fails,
> each rebuild action needs to read 5 chunks and write 1. So the whole
> 100 disks can do 10K/6 ~ 1.7K rebuilding actions per second.
>
> For the 100 x 6TB disks, suppose the object size is set to 4MB;
> then 6TB/4MB = 1.5 million objects per disk. Not considering disk
> throughput or CPUs, fully rebuilding one disk takes:
>
> 1.5M/1.7K ~ 900 seconds?
I think you will _never_ see a full cluster all helping out at 100% to fix such an issue, so while your math probably describes the absolute best case, reality will be somewhere below that. Still, it is quite possible to cause this situation deliberately and take a measurement under exactly your own circumstances, since everyone's setup is slightly different.

Replacing broken drives is normal for any large storage system, and ceph will prioritize client traffic over repairs most of the time, so that will add to the total calendar time recovery takes, but it keeps your users happy while doing it.

--
May the most significant bit of your life be positive.
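For anyone who wants to redo the quoted back-of-the-envelope estimate with their own numbers, here is a minimal sketch. The per-disk IOPS, disk size, object size, and the 4+2 erasure profile are the assumed figures from the thread, not measured values, and it deliberately ignores throughput, CPU, and recovery throttling, so it gives a theoretical best case only:

```python
# Best-case EC rebuild-time estimate; all inputs are assumptions from the thread.
disks = 100
iops_per_disk = 100                 # assumed per-spindle IOPS
total_iops = disks * iops_per_disk  # 10,000 IOPS cluster-wide

# 4+2 erasure coding: reconstructing one lost chunk reads 5 surviving
# chunks and writes 1 rebuilt chunk, i.e. 6 I/Os per rebuild action.
ios_per_rebuild = 5 + 1
rebuild_ops_per_sec = total_iops / ios_per_rebuild  # ~1,667/s

disk_size = 6e12    # 6 TB (decimal bytes)
object_size = 4e6   # 4 MB objects
objects_to_rebuild = disk_size / object_size        # 1.5 million

best_case_seconds = objects_to_rebuild / rebuild_ops_per_sec
print(round(best_case_seconds))  # -> 900
```

In practice ceph throttles recovery in favor of client I/O, so the real number is a multiple of this, which is why measuring on your own cluster is the only reliable answer.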
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
