After doing some testing, I'm even more confused.
What I'm trying to achieve is minimal data movement when I have to service
a node to replace a failed drive. Since these nodes don't have hot-swap
bays, I'll need to power down the box to replace the failed drive. I don't
want Ceph to shuffle data until the new drive comes up and is ready.
My thought was to set norecover and nobackfill, take down the host, replace
the drive, start the host, remove the old OSD from the cluster, "ceph-disk
prepare" the new disk, then unset norecover and nobackfill.
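To make the intended sequence concrete, here is a sketch of the procedure I
have in mind, using the standard ceph/ceph-disk commands; the OSD id (osd.7)
and device path (/dev/sdX) are placeholders, not values from my cluster:

```shell
# 1. Stop data movement before taking the host down.
ceph osd set norecover
ceph osd set nobackfill

# 2. Power down the host, replace the failed drive, power it back up.

# 3. Remove the dead OSD from CRUSH, auth, and the OSD map
#    (osd.7 is a placeholder id).
ceph osd crush remove osd.7
ceph auth del osd.7
ceph osd rm osd.7

# 4. Prepare and activate the replacement disk (placeholder device).
ceph-disk prepare /dev/sdX
ceph-disk activate /dev/sdX1

# 5. Allow recovery/backfill to proceed again.
ceph osd unset norecover
ceph osd unset nobackfill
```

The point of the flags here is to keep the cluster from rebalancing data off
the downed host during the short maintenance window.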
However, in my testing with a 4-node cluster (v0.94.0, 10 OSDs each,
replication 3, min_size 2, chooseleaf firstn host), if I take down a host,
I/O becomes blocked even though only one copy is taken down, which still
satisfies min_size. When I unset norecover, I/O proceeds and
some backfill activity happens. At some point the backfill stops and
everything seems to be "happy" in the degraded state.
I'm really interested to know what is going on with "norecover" as the
cluster seems to break if it is set. Unsetting the "norecover" flag causes
some degraded objects to recover, but not all. Writing to new blocks in an
RBD causes the number of degraded objects to increase, but works just fine
otherwise. Here is an example after taking down one host and removing the
OSDs from the CRUSH map (I'm reformatting all the drives in the host
currently).
# ceph status
    cluster 146c4fe8-7c85-46dc-b8b3-69072d658287
     health HEALTH_WARN
            1345 pgs backfill
            10 pgs backfilling
            2016 pgs degraded
            661 pgs recovery_wait
            2016 pgs stuck degraded
            2016 pgs stuck unclean
            1356 pgs stuck undersized
            1356 pgs undersized
            recovery 40642/167785 objects degraded (24.223%)
            recovery 31481/167785 objects misplaced (18.763%)
            too many PGs per OSD (665 > max 300)
            nobackfill flag(s) set
     monmap e5: 3 mons at {nodea=10.8.6.227:6789/0,nodeb=10.8.6.228:6789/0,nodec=10.8.6.229:6789/0}
            election epoch 2576, quorum 0,1,2 nodea,nodeb,nodec
     osdmap e59031: 30 osds: 30 up, 30 in; 1356 remapped pgs
            flags nobackfill
      pgmap v4723208: 6656 pgs, 4 pools, 330 GB data, 53235 objects
            863 GB used, 55000 GB / 55863 GB avail
            40642/167785 objects degraded (24.223%)
            31481/167785 objects misplaced (18.763%)
                4640 active+clean
                1345 active+undersized+degraded+remapped+wait_backfill
                 660 active+recovery_wait+degraded
                  10 active+undersized+degraded+remapped+backfilling
                   1 active+recovery_wait+undersized+degraded+remapped
  client io 1864 kB/s rd, 8853 kB/s wr, 65 op/s
Any help understanding these flags would be very helpful.
Thanks,
Robert
On Mon, Apr 13, 2015 at 1:40 PM, Robert LeBlanc <[email protected]>
wrote:
> I'm looking for documentation about what exactly each of these do and
> I can't find it. Can someone point me in the right direction?
>
> The names seem too ambiguous to come to any conclusion about what
> exactly they do.
>
> Thanks,
> Robert
>