Re: [ceph-users] Cephfs unaccessible

Marco Aroldi Tue, 23 Apr 2013 00:55:49 -0700

Hi,
this morning I have this situation:
   health HEALTH_WARN 1540 pgs backfill; 30 pgs backfill_toofull; 113
pgs backfilling; 43 pgs degraded; 38 pgs peering; 5 pgs recovering;
484 pgs recovery_wait; 38 pgs stuck inactive; 2180 pgs stuck unclean;
recovery 2153828/21551430 degraded (9.994%); noup,nodown flag(s) set
   monmap e1: 3 mons at
{m1=192.168.21.11:6789/0,m2=192.168.21.12:6789/0,m3=192.168.21.13:6789/0},
election epoch 50, quorum 0,1,2 m1,m2,m3
   osdmap e34624: 62 osds: 62 up, 62 in
   pgmap v1496556: 17280 pgs: 15098 active+clean, 1471
active+remapped+wait_backfill, 9 active+degraded+wait_backfill, 30
active+remapped+wait_backfill+backfill_toofull, 462
active+recovery_wait, 18 peering, 109 active+remapped+backfilling, 1
active+clean+scrubbing, 30 active+degraded+remapped+wait_backfill, 22
active+recovery_wait+remapped, 20 remapped+peering, 4
active+degraded+remapped+backfilling, 1 active+clean+scrubbing+deep, 5
active+recovering; 50432 GB data, 76489 GB used, 36942 GB / 110 TB
avail; 2153828/21551430 degraded (9.994%)
   mdsmap e52: 1/1/1 up {0=m1=up:active}, 2 up:standby
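
The per-state counts in the pgmap line can be cross-checked against the health summary. The following is only a rough Python sketch, assuming the pgmap text is pasted in as a string with the line wraps removed; the helper name tally_pg_states is made up for illustration and is not part of any Ceph tool.

```python
import re
from collections import Counter

# PG state list copied from the pgmap line above (line wraps removed).
PGMAP = (
    "15098 active+clean, 1471 active+remapped+wait_backfill, "
    "9 active+degraded+wait_backfill, "
    "30 active+remapped+wait_backfill+backfill_toofull, "
    "462 active+recovery_wait, 18 peering, 109 active+remapped+backfilling, "
    "1 active+clean+scrubbing, 30 active+degraded+remapped+wait_backfill, "
    "22 active+recovery_wait+remapped, 20 remapped+peering, "
    "4 active+degraded+remapped+backfilling, 1 active+clean+scrubbing+deep, "
    "5 active+recovering"
)


def tally_pg_states(pgmap):
    """Return (total PG count, Counter of PGs per individual state flag)."""
    total = 0
    per_flag = Counter()
    for count, state in re.findall(r"(\d+) ([a-z_+]+)", pgmap):
        total += int(count)
        # A combined state such as active+remapped counts toward each flag.
        for flag in state.split("+"):
            per_flag[flag] += int(count)
    return total, per_flag


total, flags = tally_pg_states(PGMAP)
print(total)                                   # 17280
print(flags["wait_backfill"])                  # 1540
print(flags["backfill_toofull"])               # 30
print(total - flags["clean"])                  # 2180
```

The totals agree with the health line: 1540 pgs backfill, 30 backfill_toofull, and 17280 minus the 15100 clean PGs gives the 2180 stuck unclean.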


No data movement
The cephfs mount works, but many directories are inaccessible:
the clients hang on a simple "ls"

ceph -w keeps logging these lines: http://pastebin.com/AN01wgfV

What can I do to improve the situation?
Thanks for your help

--
Marco Aroldi

2013/4/22 Marco Aroldi <[email protected]>:
> Hey,
> Cephfs has become available!
> I didn't change the rules
>
> Do you guys see something "lost" or "absolutely screwed" from these messages?
> Do I only have to wait?
> I see backfill_toofull: it sounds strange because I have set the option
> "osd  backfill tooful ratio = 0.91" in conf and not one of my osds is now
> over that percentage
>
> Thanks
>
> The health now is
> HEALTH_WARN 2038 pgs backfill; 43 pgs backfill_toofull; 134 pgs
> backfilling; 62 pgs degraded; 590 pgs recovery_wait; 2765 pgs stuck
> unclean; recovery 2780119/22308143 degraded (12.462%);  recovering 42
> o/s, 197MB/s; 5 near full osd(s); noup,nodown flag(s) set
>
> 2013-04-22 11:33:31.288690 mon.0 [INF] pgmap v1459630: 17280 pgs:
> 14512 active+clean, 1945 active+remapped+wait_backfill, 14
> active+degraded+wait_backfill, 36
> active+remapped+wait_backfill+backfill_toofull, 565
> active+recovery_wait, 128 active+remapped+backfilling, 4
> active+remapped+backfill_toofull, 4 active+degraded+backfilling, 3
> active+clean+scrubbing, 37 active+degraded+remapped+wait_backfill, 25
> active+recovery_wait+remapped, 3
> active+degraded+remapped+wait_backfill+backfill_toofull, 4
> active+degraded+remapped+backfilling; 50432 GB data, 76416 GB used,
> 37015 GB / 110 TB avail; 2777977/22308143 degraded (12.453%);
> recovering 15 o/s, 68099KB/s
>
> 2013/4/22 Marco Aroldi <[email protected]>:
>> In the original design,
>> I changed the rules because I wanted data placed with replica 2 across 2
>> identical rooms (named p1 and p2).
>> Now that 1 room has 4 osds out of the cluster, do I have to change the
>> rules and use a "type host" rule instead of "type room"?
>> Could this help?
>>
>> root default {
>>         id -1           # do not change unnecessarily
>>         # weight 122.500
>>         alg straw
>>         hash 0  # rjenkins1
>>         item p1 weight 57.500
>>         item p2 weight 65.000
>> }
>>
>> # rules
>> rule data {
>>         ruleset 0
>>         type replicated
>>         min_size 1
>>         max_size 10
>>         step take default
>>         step chooseleaf firstn 0 type room
>>         step emit
>> }
>> rule metadata {
>>         ruleset 1
>>         type replicated
>>         min_size 1
>>         max_size 10
>>         step take default
>>         step chooseleaf firstn 0 type room
>>         step emit
>> }
>> rule rbd {
>>         ruleset 2
>>         type replicated
>>         min_size 1
>>         max_size 10
>>         step take default
>>         step chooseleaf firstn 0 type room
>>         step emit
>> }
>>
>> # end crush map
>>
>>
>> ceph health:
>>
>> HEALTH_WARN 2072 pgs backfill; 43 pgs backfill_toofull; 131 pgs
>> backfilling; 68 pgs degraded; 594 pgs recovery_wait; 2802 pgs stuck
>> unclean; recovery 2811952/22351845 degraded (12.580%);  recovering 35
>> o/s, 197MB/s; 4 near full osd(s); noup,nodown flag(s) set
>>
>>
>> 2013-04-22 10:53:26.800014 mon.0 [INF] pgmap v1457213: 17280 pgs:
>> 14474 active+clean, 1975 active+remapped+wait_backfill, 18
>> active+degraded+wait_backfill, 37
>> active+remapped+wait_backfill+backfill_toofull, 569
>> active+recovery_wait, 123 active+remapped+backfilling, 3
>> active+remapped+backfill_toofull, 3 active+degraded+backfilling, 6
>> active+clean+scrubbing, 39 active+degraded+remapped+wait_backfill, 25
>> active+recovery_wait+remapped, 3
>> active+degraded+remapped+wait_backfill+backfill_toofull, 5
>> active+degraded+remapped+backfilling; 50432 GB data, 76277 GB used,
>> 37154 GB / 110 TB avail; 2811241/22350671 degraded (12.578%);
>> recovering 29 o/s, 119MB/s
>>
>> 2013/4/22 Marco Aroldi <[email protected]>:
>>> The rebalance is still going
>>> and the mounts are still refused
>>>
>>> I've re-set the nodown and noup flags because the osds are flapping continuously
>>> and added "osd backfill tooful ratio = 0.91" to ceph.conf, trying to
>>> get rid of all that "backfill_toofull"
>>>
>>> What do I have to do now to regain access?
>>>
>>> I can provide you any logs or whatever you need
>>> Thanks for the support
>>>
>>> in ceph -w I see this:
>>> 2013-04-22 09:25:46.601721 osd.8 [WRN] 1 slow requests, 1 included
>>> below; oldest blocked for > 5404.500806 secs
>>> 2013-04-22 09:25:46.601727 osd.8 [WRN] slow request 5404.500806
>>> seconds old, received at 2013-04-22 07:55:42.100886:
>>> osd_op(mds.0.9:177037 10000025d80.000017b3 [stat] 0.300279a9 RETRY
>>> rwordered) v4 currently reached pgosd
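
The age in the slow request warning can be reproduced from the two timestamps in the log line. A small Python sketch, assuming both timestamps are in the same time zone; request_age_seconds is a hypothetical helper name used only for illustration.

```python
from datetime import datetime

# Timestamp layout used in the ceph -w lines quoted above.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def request_age_seconds(logged_at, received_at):
    """Seconds between receipt of the op and the moment the warning was logged."""
    delta = datetime.strptime(logged_at, FMT) - datetime.strptime(received_at, FMT)
    return delta.total_seconds()


age = request_age_seconds(
    "2013-04-22 09:25:46.601721",  # when osd.8 logged the warning
    "2013-04-22 07:55:42.100886",  # "received at" value from the same line
)
print(round(age / 60))  # age in whole minutes
```

The result is within a fraction of a millisecond of the 5404.500806 seconds in the warning, i.e. the stat request from mds.0 had been blocked for roughly an hour and a half.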
>>>
>>> this is the ceph mds dump:
>>>
>>> dumped mdsmap epoch 52
>>> epoch    52
>>> flags    0
>>> created    2013-03-18 14:42:29.330548
>>> modified    2013-04-22 09:08:45.599613
>>> tableserver    0
>>> root    0
>>> session_timeout    60
>>> session_autoclose    300
>>> last_failure    49
>>> last_failure_osd_epoch    33152
>>> compat    compat={},rocompat={},incompat={1=base v0.20,2=client
>>> writeable ranges,3=default file layouts on dirs,4=dir inode in
>>> separate object}
>>> max_mds    1
>>> in    0
>>> up    {0=6957}
>>> failed
>>> stopped
>>> data_pools    [0]
>>> metadata_pool    1
>>> 6957:    192.168.21.11:6800/5844 'm1' mds.0.10 up:active seq 23
>>> 5945:    192.168.21.13:6800/12999 'm3' mds.-1.0 up:standby seq 1
>>> 5963:    192.168.21.12:6800/22454 'm2' mds.-1.0 up:standby seq 1
>>>
>>> ceph health:
>>>
>>> HEALTH_WARN 2133 pgs backfill; 47 pgs backfill_toofull; 136 pgs
>>> backfilling; 74 pgs degraded; 1 pgs recovering; 599 pgs recovery_wait;
>>> 2877 pgs stuck unclean; recovery 2910416/22449672 degraded (12.964%);
>>> recovering 10 o/s, 48850KB/s; 7 near full osd(s); noup,nodown flag(s)
>>> set
>>>
>>> 2013-04-22 09:34:11.436514 mon.0 [INF] pgmap v1452450: 17280 pgs:
>>> 14403 active+clean, 2032 active+remapped+wait_backfill, 19
>>> active+degraded+wait_backfill, 35
>>> active+remapped+wait_backfill+backfill_toofull, 574
>>> active+recovery_wait, 126 active+remapped+backfilling, 9
>>> active+remapped+backfill_toofull, 3 active+degraded+backfilling, 2
>>> active+clean+scrubbing, 41 active+degraded+remapped+wait_backfill, 25
>>> active+recovery_wait+remapped, 3
>>> active+degraded+remapped+wait_backfill+backfill_toofull, 8
>>> active+degraded+remapped+backfilling; 50432 GB data, 76229 GB used,
>>> 37202 GB / 110 TB avail; 2908837/22447349 degraded (12.958%);
>>> recovering 6 o/s, 20408KB/s
>>>
>>> 2013/4/21 Marco Aroldi <[email protected]>:
>>>> Greg, your supposition about the small amount of data to be written is
>>>> right, but the rebalance is writing an insane amount of data to the new
>>>> nodes and the mount is not working again
>>>>
>>>> this is the node S203 (the os is on /dev/sdl, not listed)
>>>>
>>>> /dev/sda1       1.9T  467G  1.4T  26% /var/lib/ceph/osd/ceph-44
>>>> /dev/sdb1       1.9T  595G  1.3T  33% /var/lib/ceph/osd/ceph-45
>>>> /dev/sdc1       1.9T  396G  1.5T  22% /var/lib/ceph/osd/ceph-46
>>>> /dev/sdd1       1.9T  401G  1.5T  22% /var/lib/ceph/osd/ceph-47
>>>> /dev/sde1       1.9T  337G  1.5T  19% /var/lib/ceph/osd/ceph-48
>>>> /dev/sdf1       1.9T  441G  1.4T  24% /var/lib/ceph/osd/ceph-49
>>>> /dev/sdg1       1.9T  338G  1.5T  19% /var/lib/ceph/osd/ceph-50
>>>> /dev/sdh1       1.9T  359G  1.5T  20% /var/lib/ceph/osd/ceph-51
>>>> /dev/sdi1       1.4T  281G  1.1T  21% /var/lib/ceph/osd/ceph-52
>>>> /dev/sdj1       1.4T  423G  964G  31% /var/lib/ceph/osd/ceph-53
>>>> /dev/sdk1       1.9T  421G  1.4T  23% /var/lib/ceph/osd/ceph-54
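
A df listing like this one can be checked against a fullness threshold mechanically. The Python sketch below is an illustration only: osds_over is a made-up helper, the sample is a three-line excerpt of the listing, and df's rounded Use% is merely an approximation of what the OSDs themselves report. The 0.91 value is the ratio mentioned elsewhere in this thread.

```python
# Three lines excerpted from the df listing above.
DF_LINES = """\
/dev/sda1       1.9T  467G  1.4T  26% /var/lib/ceph/osd/ceph-44
/dev/sdb1       1.9T  595G  1.3T  33% /var/lib/ceph/osd/ceph-45
/dev/sdj1       1.4T  423G  964G  31% /var/lib/ceph/osd/ceph-53
"""


def osds_over(df_text, ratio):
    """Return {osd_name: used_fraction} for OSD mounts whose Use% exceeds ratio."""
    over = {}
    for line in df_text.splitlines():
        fields = line.split()
        # Expect: device, size, used, avail, use%, mount point.
        if len(fields) != 6 or "/ceph/osd/" not in fields[5]:
            continue
        used = int(fields[4].rstrip("%")) / 100.0
        if used > ratio:
            over[fields[5].rsplit("/", 1)[-1]] = used
    return over


print(osds_over(DF_LINES, 0.91))          # nothing on this node is near 91%
print(sorted(osds_over(DF_LINES, 0.30)))  # ['ceph-45', 'ceph-53']
```

On this node nothing is close to the threshold, so any backfill_toofull PGs would have to be targeting OSDs on other hosts.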
>>>>
>>>> 2013/4/21 Marco Aroldi <[email protected]>:
>>>>> What can I try to do/delete to regain access?
>>>>> Those osds are crazy, flapping up and down. I think the situation
>>>>> is out of control
>>>>>
>>>>>
>>>>> HEALTH_WARN 2735 pgs backfill; 13 pgs backfill_toofull; 157 pgs
>>>>> backfilling; 188 pgs degraded; 251 pgs peering; 13 pgs recovering;
>>>>> 1159 pgs recovery_wait; 159 pgs stuck inactive; 4641 pgs stuck
>>>>> unclean; recovery 4007916/23007073 degraded (17.420%);  recovering 4
>>>>> o/s, 31927KB/s; 19 near full osd(s)
>>>>>
>>>>> 2013-04-21 18:56:46.839851 mon.0 [INF] pgmap v1399007: 17280 pgs: 276
>>>>> active, 12791 active+clean, 2575 active+remapped+wait_backfill, 71
>>>>> active+degraded+wait_backfill, 6
>>>>> active+remapped+wait_backfill+backfill_toofull, 1121
>>>>> active+recovery_wait, 90 peering, 3 remapped, 1 active+remapped, 127
>>>>> active+remapped+backfilling, 1 active+degraded, 5
>>>>> active+remapped+backfill_toofull, 19 active+degraded+backfilling, 1
>>>>> active+clean+scrubbing, 79 active+degraded+remapped+wait_backfill, 36
>>>>> active+recovery_wait+remapped, 1
>>>>> active+degraded+remapped+wait_backfill+backfill_toofull, 46
>>>>> remapped+peering, 16 active+degraded+remapped+backfilling, 1
>>>>> active+recovery_wait+degraded+remapped, 14 active+recovering; 50435 GB
>>>>> data, 74790 GB used, 38642 GB / 110 TB avail; 4018849/23025448
>>>>> degraded (17.454%);  recovering 14 o/s, 54732KB/s
>>>>>
>>>>> # id    weight    type name    up/down    reweight
>>>>> -1    130    root default
>>>>> -9    65        room p1
>>>>> -3    44            rack r14
>>>>> -4    22                host s101
>>>>> 11    2                    osd.11    up    1
>>>>> 12    2                    osd.12    up    1
>>>>> 13    2                    osd.13    up    1
>>>>> 14    2                    osd.14    up    1
>>>>> 15    2                    osd.15    up    1
>>>>> 16    2                    osd.16    up    1
>>>>> 17    2                    osd.17    up    1
>>>>> 18    2                    osd.18    up    1
>>>>> 19    2                    osd.19    up    1
>>>>> 20    2                    osd.20    up    1
>>>>> 21    2                    osd.21    up    1
>>>>> -6    22                host s102
>>>>> 33    2                    osd.33    up    1
>>>>> 34    2                    osd.34    up    1
>>>>> 35    2                    osd.35    up    1
>>>>> 36    2                    osd.36    up    1
>>>>> 37    2                    osd.37    up    1
>>>>> 38    2                    osd.38    up    1
>>>>> 39    2                    osd.39    up    1
>>>>> 40    2                    osd.40    up    1
>>>>> 41    2                    osd.41    up    1
>>>>> 42    2                    osd.42    up    1
>>>>> 43    2                    osd.43    up    1
>>>>> -13    21            rack r10
>>>>> -12    21                host s103
>>>>> 55    2                    osd.55    up    1
>>>>> 56    2                    osd.56    up    1
>>>>> 57    2                    osd.57    up    1
>>>>> 58    2                    osd.58    up    1
>>>>> 59    2                    osd.59    down    0
>>>>> 60    2                    osd.60    down    0
>>>>> 61    2                    osd.61    down    0
>>>>> 62    2                    osd.62    up    1
>>>>> 63    2                    osd.63    up    1
>>>>> 64    1.5                    osd.64    up    1
>>>>> 65    1.5                    osd.65    down    0
>>>>> -10    65        room p2
>>>>> -7    22            rack r20
>>>>> -5    22                host s202
>>>>> 22    2                    osd.22    up    1
>>>>> 23    2                    osd.23    up    1
>>>>> 24    2                    osd.24    up    1
>>>>> 25    2                    osd.25    up    1
>>>>> 26    2                    osd.26    up    1
>>>>> 27    2                    osd.27    up    1
>>>>> 28    2                    osd.28    up    1
>>>>> 29    2                    osd.29    up    1
>>>>> 30    2                    osd.30    up    1
>>>>> 31    2                    osd.31    up    1
>>>>> 32    2                    osd.32    up    1
>>>>> -8    22            rack r22
>>>>> -2    22                host s201
>>>>> 0    2                    osd.0    up    1
>>>>> 1    2                    osd.1    up    1
>>>>> 2    2                    osd.2    up    1
>>>>> 3    2                    osd.3    up    1
>>>>> 4    2                    osd.4    up    1
>>>>> 5    2                    osd.5    up    1
>>>>> 6    2                    osd.6    up    1
>>>>> 7    2                    osd.7    up    1
>>>>> 8    2                    osd.8    up    1
>>>>> 9    2                    osd.9    up    1
>>>>> 10    2                    osd.10    up    1
>>>>> -14    21            rack r21
>>>>> -11    21                host s203
>>>>> 44    2                    osd.44    up    1
>>>>> 45    2                    osd.45    up    1
>>>>> 46    2                    osd.46    up    1
>>>>> 47    2                    osd.47    up    1
>>>>> 48    2                    osd.48    up    1
>>>>> 49    2                    osd.49    up    1
>>>>> 50    2                    osd.50    up    1
>>>>> 51    2                    osd.51    up    1
>>>>> 52    1.5                    osd.52    up    1
>>>>> 53    1.5                    osd.53    up    1
>>>>> 54    2                    osd.54    up    1
>>>>>
>>>>>
>>>>> 2013/4/21 Marco Aroldi <[email protected]>:
>>>>>> So, I've restarted as many of the new osds as possible and the cluster
>>>>>> started to move data to the 2 new nodes overnight.
>>>>>> This morning there was no network traffic and the health was
>>>>>>
>>>>>> HEALTH_ERR 1323 pgs backfill; 150 pgs backfill_toofull; 100 pgs
>>>>>> backfilling; 114 pgs degraded; 3374 pgs peering; 36 pgs recovering;
>>>>>> 949 pgs recovery_wait; 3374 pgs stuck inactive; 6289 pgs stuck
>>>>>> unclean; recovery 2130652/20890113 degraded (10.199%); 58/8914654
>>>>>> unfound (0.001%); 1 full osd(s); 22 near full osd(s); full,noup,nodown
>>>>>> flag(s) set
>>>>>>
>>>>>> So I have unset the noup and nodown flags and the data started moving
>>>>>> again.
>>>>>> I've increased the full ratio to 97% so now there's no "official" full
>>>>>> osd and the HEALTH_ERR became HEALTH_WARN
>>>>>>
>>>>>> However, still no access to the filesystem
>>>>>>
>>>>>> HEALTH_WARN 1906 pgs backfill; 21 pgs backfill_toofull; 52 pgs
>>>>>> backfilling; 707 pgs degraded; 371 pgs down; 97 pgs incomplete; 3385
>>>>>> pgs peering; 35 pgs recovering; 1002 pgs recovery_wait; 4 pgs stale;
>>>>>> 683 pgs stuck inactive; 5898 pgs stuck unclean; recovery
>>>>>> 3081499/22208859 degraded (13.875%); 487/9433642 unfound (0.005%);
>>>>>> recovering 11722 o/s, 57040MB/s; 17 near full osd(s)
>>>>>>
>>>>>> The osds are flapping in/out again...
>>>>>>
>>>>>> I'm willing to start deleting some portion of the data.
>>>>>> What can I try to do now?
>>>>>>
>>>>>> 2013/4/21 Gregory Farnum <[email protected]>:
>>>>>>> It's not entirely clear from your description and the output you've
>>>>>>> given us, but it looks like maybe you've managed to bring up all your
>>>>>>> OSDs correctly at this point? Or are they just not reporting down
>>>>>>> because you set the "no down" flag...
>>>>>>>
>>>>>>> In any case, CephFS isn't going to come up while the underlying RADOS
>>>>>>> cluster is this unhealthy, so you're going to need to get that going
>>>>>>> again. Since your OSDs have managed to get themselves so full it's
>>>>>>> going to be trickier than normal, but if all the rebalancing that's
>>>>>>> happening is only because you sort-of-didn't-really lose nodes, and
>>>>>>> you can bring them all back up, you should be able to sort it out by
>>>>>>> getting all the nodes back up, and then changing your full percentages
>>>>>>> (by a *very small* amount); since you haven't been doing any writes to
>>>>>>> the cluster it shouldn't take many data writes to get everything back
>>>>>>> where it was, although if this has been continuing to backfill in the
>>>>>>> meanwhile that will need to unwind.
>>>>>>> -Greg
>>>>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Apr 20, 2013 at 12:21 PM, John Wilkins 
>>>>>>> <[email protected]> wrote:
>>>>>>>> I don't see anything related to lost objects in your output. I just see
>>>>>>>> waiting on backfill, backfill_toofull, remapped, and so forth. You can read
>>>>>>>> a bit about what is going on here:
>>>>>>>> http://ceph.com/docs/next/rados/operations/monitoring-osd-pg/
>>>>>>>>
>>>>>>>> Keep us posted as to the recovery, and let me know what I can do to improve
>>>>>>>> the docs for scenarios like this.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, Apr 20, 2013 at 10:52 AM, Marco Aroldi <[email protected]>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> John,
>>>>>>>>> thanks for the quick reply.
>>>>>>>>> Below you can see my ceph osd tree
>>>>>>>>> The problem is caused not by the failure itself, but by the "renamed"
>>>>>>>>> bunch of devices.
>>>>>>>>> It was like a deadly 15-puzzle
>>>>>>>>> I think the solution would have been to mount the devices in fstab using UUID
>>>>>>>>> (/dev/disk/by-uuid) instead of /dev/sdX
>>>>>>>>>
>>>>>>>>> However, yes, I have an entry in my ceph.conf (devs = /dev/sdX1 --
>>>>>>>>> osd_journal = /dev/sdX2) *and* an entry in my fstab for each OSD
>>>>>>>>>
>>>>>>>>> The node with the failed disk is s103 (osd.59)
>>>>>>>>>
>>>>>>>>> Now I have 5 osds from s203 up and in to try to let ceph rebalance
>>>>>>>>> data... but it is still a bloody mess.
>>>>>>>>> Look at the ceph -w output: it reports a total of 110TB, which is wrong... all
>>>>>>>>> drives are 2TB and I have 49 drives up and in -- total 98TB
>>>>>>>>> I think that 110TB (55 osd) was the size before the cluster became
>>>>>>>>> inaccessible
>>>>>>>>>
>>>>>>>>> # id    weight    type name    up/down    reweight
>>>>>>>>> -1    130    root default
>>>>>>>>> -9    65        room p1
>>>>>>>>> -3    44            rack r14
>>>>>>>>> -4    22                host s101
>>>>>>>>> 11    2                    osd.11    up    1
>>>>>>>>> 12    2                    osd.12    up    1
>>>>>>>>> 13    2                    osd.13    up    1
>>>>>>>>> 14    2                    osd.14    up    1
>>>>>>>>> 15    2                    osd.15    up    1
>>>>>>>>> 16    2                    osd.16    up    1
>>>>>>>>> 17    2                    osd.17    up    1
>>>>>>>>> 18    2                    osd.18    up    1
>>>>>>>>> 19    2                    osd.19    up    1
>>>>>>>>> 20    2                    osd.20    up    1
>>>>>>>>> 21    2                    osd.21    up    1
>>>>>>>>> -6    22                host s102
>>>>>>>>> 33    2                    osd.33    up    1
>>>>>>>>> 34    2                    osd.34    up    1
>>>>>>>>> 35    2                    osd.35    up    1
>>>>>>>>> 36    2                    osd.36    up    1
>>>>>>>>> 37    2                    osd.37    up    1
>>>>>>>>> 38    2                    osd.38    up    1
>>>>>>>>> 39    2                    osd.39    up    1
>>>>>>>>> 40    2                    osd.40    up    1
>>>>>>>>> 41    2                    osd.41    up    1
>>>>>>>>> 42    2                    osd.42    up    1
>>>>>>>>> 43    2                    osd.43    up    1
>>>>>>>>> -13    21            rack r10
>>>>>>>>> -12    21                host s103
>>>>>>>>> 55    2                    osd.55    up    0
>>>>>>>>> 56    2                    osd.56    up    0
>>>>>>>>> 57    2                    osd.57    up    0
>>>>>>>>> 58    2                    osd.58    up    0
>>>>>>>>> 59    2                    osd.59    down    0
>>>>>>>>> 60    2                    osd.60    down    0
>>>>>>>>> 61    2                    osd.61    down    0
>>>>>>>>> 62    2                    osd.62    up    0
>>>>>>>>> 63    2                    osd.63    up    0
>>>>>>>>> 64    1.5                    osd.64    up    0
>>>>>>>>> 65    1.5                    osd.65    down    0
>>>>>>>>> -10    65        room p2
>>>>>>>>> -7    22            rack r20
>>>>>>>>> -5    22                host s202
>>>>>>>>> 22    2                    osd.22    up    1
>>>>>>>>> 23    2                    osd.23    up    1
>>>>>>>>> 24    2                    osd.24    up    1
>>>>>>>>> 25    2                    osd.25    up    1
>>>>>>>>> 26    2                    osd.26    up    1
>>>>>>>>> 27    2                    osd.27    up    1
>>>>>>>>> 28    2                    osd.28    up    1
>>>>>>>>> 29    2                    osd.29    up    1
>>>>>>>>> 30    2                    osd.30    up    1
>>>>>>>>> 31    2                    osd.31    up    1
>>>>>>>>> 32    2                    osd.32    up    1
>>>>>>>>> -8    22            rack r22
>>>>>>>>> -2    22                host s201
>>>>>>>>> 0    2                    osd.0    up    1
>>>>>>>>> 1    2                    osd.1    up    1
>>>>>>>>> 2    2                    osd.2    up    1
>>>>>>>>> 3    2                    osd.3    up    1
>>>>>>>>> 4    2                    osd.4    up    1
>>>>>>>>> 5    2                    osd.5    up    1
>>>>>>>>> 6    2                    osd.6    up    1
>>>>>>>>> 7    2                    osd.7    up    1
>>>>>>>>> 8    2                    osd.8    up    1
>>>>>>>>> 9    2                    osd.9    up    1
>>>>>>>>> 10    2                    osd.10    up    1
>>>>>>>>> -14    21            rack r21
>>>>>>>>> -11    21                host s203
>>>>>>>>> 44    2                    osd.44    up    1
>>>>>>>>> 45    2                    osd.45    up    1
>>>>>>>>> 46    2                    osd.46    up    1
>>>>>>>>> 47    2                    osd.47    up    1
>>>>>>>>> 48    2                    osd.48    up    1
>>>>>>>>> 49    2                    osd.49    up    0
>>>>>>>>> 50    2                    osd.50    up    0
>>>>>>>>> 51    2                    osd.51    up    0
>>>>>>>>> 52    1.5                    osd.52    up    0
>>>>>>>>> 53    1.5                    osd.53    up    0
>>>>>>>>> 54    2                    osd.54    up    0
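
The "49 drives up and in" figure can be recomputed from the tree output. The Python sketch below is illustrative only: up_and_in is a made-up helper, the sample is just the s203 part of the tree with the quote prefix stripped, and treating a CRUSH weight of 2 as a 2TB drive is an assumption.

```python
# The s203 rows from the tree above, quote prefix removed.
TREE_EXCERPT = """\
-11    21                host s203
44    2                    osd.44    up    1
45    2                    osd.45    up    1
46    2                    osd.46    up    1
47    2                    osd.47    up    1
48    2                    osd.48    up    1
49    2                    osd.49    up    0
50    2                    osd.50    up    0
51    2                    osd.51    up    0
52    1.5                    osd.52    up    0
53    1.5                    osd.53    up    0
54    2                    osd.54    up    0
"""


def up_and_in(tree_text):
    """Return (count, summed CRUSH weight) of OSDs that are up with reweight > 0."""
    count, weight = 0, 0.0
    for line in tree_text.splitlines():
        fields = line.split()
        # OSD rows have five columns: id, weight, osd.N, up/down, reweight.
        if len(fields) == 5 and fields[2].startswith("osd."):
            if fields[3] == "up" and float(fields[4]) > 0:
                count += 1
                weight += float(fields[1])
    return count, weight


print(up_and_in(TREE_EXCERPT))  # (5, 10.0)
```

Applied to the whole tree, the same rule counts 11 OSDs each on s101, s102, s202 and s201, none on s103, and 5 on s203: 49 OSDs with a summed weight of 98, which matches the 98TB estimate.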
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ceph -w
>>>>>>>>>
>>>>>>>>> 2013-04-20 19:46:48.608988 mon.0 [INF] pgmap v1352767: 17280 pgs: 58
>>>>>>>>> active, 12581 active+clean, 1686 active+remapped+wait_backfill, 24
>>>>>>>>> active+degraded+wait_backfill, 224
>>>>>>>>> active+remapped+wait_backfill+backfill_toofull, 1061
>>>>>>>>> active+recovery_wait, 4
>>>>>>>>> active+degraded+wait_backfill+backfill_toofull, 629 peering, 626
>>>>>>>>> active+remapped, 72 active+remapped+backfilling, 89 active+degraded,
>>>>>>>>> 14 active+remapped+backfill_toofull, 1 active+clean+scrubbing, 8
>>>>>>>>> active+degraded+remapped+wait_backfill, 20
>>>>>>>>> active+recovery_wait+remapped, 5
>>>>>>>>> active+degraded+remapped+wait_backfill+backfill_toofull, 162
>>>>>>>>> remapped+peering, 1 active+degraded+remapped+backfilling, 2
>>>>>>>>> active+degraded+remapped+backfill_toofull, 13 active+recovering; 49777
>>>>>>>>> GB data, 72863 GB used, 40568 GB / 110 TB avail; 2965687/21848501
>>>>>>>>> degraded (13.574%);  recovering 5 o/s, 16363B/s
>>>>>>>>>
>>>>>>>>> 2013/4/20 John Wilkins <[email protected]>:
>>>>>>>>> > Marco,
>>>>>>>>> >
>>>>>>>>> > If you do a "ceph osd tree" can you see if your OSDs are all up? You seem to
>>>>>>>>> > have at least one problem related to the backfill OSDs being too full, and
>>>>>>>>> > some which are near full or full for the purposes of storage. See the
>>>>>>>>> > following in the documentation to see if this helps:
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > http://ceph.com/docs/master/rados/configuration/mon-config-ref/#storage-capacity
>>>>>>>>> >
>>>>>>>>> > http://ceph.com/docs/master/rados/configuration/osd-config-ref/#backfilling
>>>>>>>>> >
>>>>>>>>> > http://ceph.com/docs/master/rados/operations/troubleshooting-osd/#no-free-drive-space
>>>>>>>>> >
>>>>>>>>> > Before you start deleting data as a remedy, you'd want to at least try to
>>>>>>>>> > get the OSDs back up and running first.
>>>>>>>>> >
>>>>>>>>> > If rebooting changed the drive names, you might look here:
>>>>>>>>> >
>>>>>>>>> > http://ceph.com/docs/master/rados/configuration/osd-config-ref/#general-settings
>>>>>>>>> >
>>>>>>>>> > We have default settings for OSD and journal paths, which you could override
>>>>>>>>> > if you can locate the data and journal sources on the renamed drives. If you
>>>>>>>>> > mounted them, but didn't add them to the fstab, that might be the source of
>>>>>>>>> > the problem. I'd rather see you use the default paths, as it would be easier
>>>>>>>>> > to troubleshoot later. So did you mount the drives, but not add the mount
>>>>>>>>> > points to fstab?
>>>>>>>>> >
>>>>>>>>> > John
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > On Sat, Apr 20, 2013 at 8:46 AM, Marco Aroldi 
>>>>>>>>> > <[email protected]>
>>>>>>>>> > wrote:
>>>>>>>>> >>
>>>>>>>>> >> Hi,
>>>>>>>>> >> due to a hardware failure while expanding ceph, I'm in big trouble
>>>>>>>>> >> because the cephfs doesn't mount anymore.
>>>>>>>>> >> I was adding a couple of storage nodes, but a disk failed and after a
>>>>>>>>> >> reboot the OS (ubuntu 12.04) renamed the remaining devices, so the
>>>>>>>>> >> entire node has been screwed up.
>>>>>>>>> >>
>>>>>>>>> >> Now, from the "sane new node", I'm taking some new osds up and in
>>>>>>>>> >> because the cluster is near full and I can't completely revert the
>>>>>>>>> >> situation to how it was before
>>>>>>>>> >>
>>>>>>>>> >> *I can* afford data loss, but I need to regain access to the filesystem
>>>>>>>>> >>
>>>>>>>>> >> My setup:
>>>>>>>>> >> 3 mon + 3 mds
>>>>>>>>> >> 4 storage nodes (i was adding no. 5 and 6)
>>>>>>>>> >>
>>>>>>>>> >> Ceph 0.56.4
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> ceph health:
>>>>>>>>> >> HEALTH_ERR 2008 pgs backfill; 246 pgs backfill_toofull; 74 pgs
>>>>>>>>> >> backfilling; 134 pgs degraded; 790 pgs peering; 10 pgs recovering;
>>>>>>>>> >> 1116 pgs recovery_wait; 790 pgs stuck inactive; 4782 pgs stuck
>>>>>>>>> >> unclean; recovery 3049459/21926624 degraded (13.908%);  recovering 6
>>>>>>>>> >> o/s, 16316KB/s; 4 full osd(s); 30 near full osd(s); full,noup,nodown
>>>>>>>>> >> flag(s) set
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> ceph mds dump:
>>>>>>>>> >> dumped mdsmap epoch 44
>>>>>>>>> >> epoch    44
>>>>>>>>> >> flags    0
>>>>>>>>> >> created    2013-03-18 14:42:29.330548
>>>>>>>>> >> modified    2013-04-20 17:14:32.969332
>>>>>>>>> >> tableserver    0
>>>>>>>>> >> root    0
>>>>>>>>> >> session_timeout    60
>>>>>>>>> >> session_autoclose    300
>>>>>>>>> >> last_failure    43
>>>>>>>>> >> last_failure_osd_epoch    18160
>>>>>>>>> >> compat    compat={},rocompat={},incompat={1=base v0.20,2=client
>>>>>>>>> >> writeable ranges,3=default file layouts on dirs,4=dir inode in
>>>>>>>>> >> separate object}
>>>>>>>>> >> max_mds    1
>>>>>>>>> >> in    0
>>>>>>>>> >> up    {0=6376}
>>>>>>>>> >> failed
>>>>>>>>> >> stopped
>>>>>>>>> >> data_pools    [0]
>>>>>>>>> >> metadata_pool    1
>>>>>>>>> >> 6376:    192.168.21.11:6800/13457 'm1' mds.0.9 up:replay seq 1
>>>>>>>>> >> 5945:    192.168.21.13:6800/12999 'm3' mds.-1.0 up:standby seq 1
>>>>>>>>> >> 5963:    192.168.21.12:6800/22454 'm2' mds.-1.0 up:standby seq 1
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> ceph mon dump:
>>>>>>>>> >> epoch 1
>>>>>>>>> >> fsid d634f7b3-8a8a-4893-bdfb-a95ccca7fddd
>>>>>>>>> >> last_changed 2013-03-18 14:39:42.253923
>>>>>>>>> >> created 2013-03-18 14:39:42.253923
>>>>>>>>> >> 0: 192.168.21.11:6789/0 mon.m1
>>>>>>>>> >> 1: 192.168.21.12:6789/0 mon.m2
>>>>>>>>> >> 2: 192.168.21.13:6789/0 mon.m3
>>>>>>>>> >> _______________________________________________
>>>>>>>>> >> ceph-users mailing list
>>>>>>>>> >> [email protected]
>>>>>>>>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > --
>>>>>>>>> > John Wilkins
>>>>>>>>> > Senior Technical Writer
>>>>>>>>> > Inktank
>>>>>>>>> > [email protected]
>>>>>>>>> > (415) 425-9599
>>>>>>>>> > http://inktank.com
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> John Wilkins
>>>>>>>> Senior Technical Writer
>>>>>>>> Inktank
>>>>>>>> [email protected]
>>>>>>>> (415) 425-9599
>>>>>>>> http://inktank.com
>>>>>>>>