Hello Guys

My Ceph cluster lost data and now it's not recovering. This problem occurred 
when Ceph performed recovery while one of the nodes was down. 
Now all the nodes are up, but Ceph is showing PGs as incomplete, unclean, and 
recovering.


I have tried several things to recover them, like scrub, deep-scrub, pg 
repair, changing primary affinity and then scrubbing, 
osd_pool_default_size, etc. BUT NO LUCK.
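
For reference, the per-PG attempts listed above can be sketched as a dry run for a single PG. The `run` wrapper and the `DRY_RUN` variable are hypothetical helpers, and lowering the primary affinity of osd.194 (the current primary of 7.25b) to 0 is only an example value. Note that `osd_pool_default_size` only applies to pools created after it is set; the replica count of an existing pool is changed with `ceph osd pool set <pool> size <n>`.

```shell
#!/bin/sh
# Dry-run sketch of the recovery attempts described above for one PG.
# DRY_RUN and run() are hypothetical helpers: with DRY_RUN=1 (the default)
# the commands are only printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

retry_pg() {
    pg=$1    # PG id, e.g. 7.25b
    osd=$2   # OSD whose primary affinity is lowered (example value)
    run ceph pg scrub "$pg"
    run ceph pg deep-scrub "$pg"
    run ceph pg repair "$pg"
    # Lower primary affinity so another OSD in the acting set becomes
    # primary, then scrub again.
    run ceph osd primary-affinity "osd.$osd" 0
    run ceph pg scrub "$pg"
}

# Usage: prints the five commands for PG 7.25b without running them.
retry_pg 7.25b 194
```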

Could you please advise how to recover the PGs and get back to HEALTH_OK?

# ceph -s
    cluster 2bd3283d-67ef-4316-8b7e-d8f4747eae33
     health HEALTH_WARN 19 pgs incomplete; 3 pgs recovering; 20 pgs stuck inactive; 23 pgs stuck unclean; 2 requests are blocked > 32 sec; recovery 531/980676 objects degraded (0.054%); 243/326892 unfound (0.074%)
     monmap e3: 3 mons at {xxx=xxxx:6789/0,xxx=xxxx:6789:6789/0,xxx=xxxx:6789:6789/0}, election epoch 1474, quorum 0,1,2 xx,xx,xx
     osdmap e261536: 239 osds: 239 up, 238 in
      pgmap v415790: 18432 pgs, 13 pools, 2330 GB data, 319 kobjects
            20316 GB used, 844 TB / 864 TB avail
            531/980676 objects degraded (0.054%); 243/326892 unfound (0.074%)
                   1 creating
               18409 active+clean
                   3 active+recovering
                  19 incomplete
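
As a quick consistency check on the status above, the per-state PG counts add up to the total on the pgmap line (1 + 18409 + 3 + 19 = 18432), so no PG is missing from the state breakdown. A small sketch that does this check with awk on `ceph -s` output read from stdin is below; the function name is hypothetical, and the parsing assumes the state lines look like the ones shown here (a count followed by a single state name).

```shell
#!/bin/sh
# Sketch: check that the per-state PG counts in `ceph -s` output add up to
# the total PG count on the pgmap line. Reads the status text on stdin and
# exits non-zero if the numbers disagree.
check_pg_counts() {
    awk '
        # pgmap line, e.g. "pgmap v415790: 18432 pgs, 13 pools, ..."
        /pgmap/ { for (i = 1; i < NF; i++) if ($(i+1) == "pgs,") total = $i }
        # state lines, e.g. "18409 active+clean"
        NF == 2 && $1 ~ /^[0-9]+$/ && $2 ~ /^[a-z+_]+$/ { sum += $1 }
        END { printf "states=%d total=%d\n", sum, total; exit (sum == total ? 0 : 1) }
    '
}
```

Usage: `ceph -s | check_pg_counts`.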




# ceph pg dump_stuck unclean
ok
pg_stat objects mip     degr    unf     bytes   log     disklog state   state_stamp     v       reported        up      up_primary      acting  acting_primary  last_scrub      scrub_stamp     last_deep_scrub deep_scrub_stamp
10.70   0       0       0       0       0       0       0       incomplete      2015-03-20 12:19:49.534911      0'0     261536:1015     [153,140,80]    153     [153,140,80]    153     0'0     2015-03-12 17:59:43.275049      0'0     2015-03-09 17:55:58.745662
3.dde   68      66      0       66      552861709       297     297     incomplete      2015-03-20 12:19:49.584839      33547'297       261536:228352   [174,5,179]     174     [174,5,179]     174     33547'297       2015-03-12 14:19:15.261595      28522'43        2015-03-11 14:19:13.894538
5.a2    0       0       0       0       0       0       0       incomplete      2015-03-20 12:19:49.560756      0'0     261536:897      [214,191,170]   214     [214,191,170]   214     0'0     2015-03-12 17:58:29.257085      0'0     2015-03-09 17:55:07.684377
13.1b6  0       0       0       0       0       0       0       incomplete      2015-03-20 12:19:49.846253      0'0     261536:1050     [0,176,131]     0       [0,176,131]     0       0'0     2015-03-12 18:00:13.286920      0'0     2015-03-09 17:56:18.715208
7.25b   16      0       0       0       67108864        16      16      incomplete      2015-03-20 12:19:49.639102      27666'16        261536:4777     [194,145,45]    194     [194,145,45]    194     27666'16        2015-03-12 17:59:06.357864      2330'3  2015-03-09 17:55:30.754522
5.19    0       0       0       0       0       0       0       incomplete      2015-03-20 12:19:49.742698      0'0     261536:25410    [212,43,131]    212     [212,43,131]    212     0'0     2015-03-12 13:51:37.777026      0'0     2015-03-11 13:51:35.406246
3.a2f   0       0       0       0       0       0       0       creating        2015-03-20 12:42:15.586372      0'0     0:0     []      -1      []      -1      0'0     0.000000        0'0     0.000000
7.298   0       0       0       0       0       0       0       incomplete      2015-03-20 12:19:49.566966      0'0     261536:900      [187,95,225]    187     [187,95,225]    187     27666'13        2015-03-12 17:59:10.308423      2330'4  2015-03-09 17:55:35.750109
3.a5a   77      87      261     87      623902741       325     325     active+recovering       2015-03-20 10:54:57.443670      33569'325       261536:182464   [150,149,181]   150     [150,149,181]   150     33569'325       2015-03-12 13:58:05.813966      28433'44        2015-03-11 13:57:53.909795
1.1e7   0       0       0       0       0       0       0       incomplete      2015-03-20 12:19:49.610547      0'0     261536:772      [175,182]       175     [175,182]       175     0'0     2015-03-12 17:55:45.203232      0'0     2015-03-09 17:53:49.694822
3.774   79      0       0       0       645136397       339     339     incomplete      2015-03-20 12:19:49.821708      33570'339       261536:166857   [162,39,161]    162     [162,39,161]    162     33570'339       2015-03-12 14:49:03.869447      2226'2  2015-03-09 13:46:49.783950
3.7d0   78      0       0       0       609222686       376     376     incomplete      2015-03-20 12:19:49.534004      33538'376       261536:182810   [117,118,177]   117     [117,118,177]   117     33538'376       2015-03-12 13:51:03.984454      28394'62        2015-03-11 13:50:58.196288
3.d60   0       0       0       0       0       0       0       incomplete      2015-03-20 12:19:49.647196      0'0     261536:833      [154,172,1]     154     [154,172,1]     154     33552'321       2015-03-12 13:44:43.502907      28356'39        2015-03-11 13:44:41.663482
4.1fc   0       0       0       0       0       0       0       incomplete      2015-03-20 12:19:49.610103      0'0     261536:1069     [70,179,58]     70      [70,179,58]     70      0'0     2015-03-12 17:58:19.254170      0'0     2015-03-09 17:54:55.720479
3.e02   72      0       0       0       585105425       304     304     incomplete      2015-03-20 12:19:49.564768      33568'304       261536:167428   [15,102,147]    15      [15,102,147]    15      33568'304       2015-03-16 10:04:19.894789      2246'4  2015-03-09 11:43:44.176331
8.1d4   0       0       0       0       0       0       0       incomplete      2015-03-20 12:19:49.614727      0'0     261536:19611    [126,43,174]    126     [126,43,174]    126     0'0     2015-03-12 14:34:35.258338      0'0     2015-03-12 14:34:35.258338
4.2f4   0       0       0       0       0       0       0       incomplete      2015-03-20 12:19:49.595109      0'0     261536:113791   [181,186,13]    181     [181,186,13]    181     0'0     2015-03-12 14:59:03.529264      0'0     2015-03-09 13:46:40.601301
3.52c   65      23      69      23      543162368       290     290     active+recovering       2015-03-20 10:51:43.664734      33553'290       261536:8431     [212,100,219]   212     [212,100,219]   212     33553'290       2015-03-13 11:44:26.396514      29686'103       2015-03-11 17:18:33.452616
3.e5a   76      70      0       0       623902741       325     325     incomplete      2015-03-20 12:19:49.552071      33569'325       261536:71248    [97,22,62]      97      [97,22,62]      97      33569'325       2015-03-12 13:58:05.813966      28433'44        2015-03-11 13:57:53.909795
8.3a0   0       0       0       0       0       0       0       incomplete      2015-03-20 12:19:49.615728      0'0     261536:173184   [62,14,178]     62      [62,14,178]     62      0'0     2015-03-12 13:52:44.546418      0'0     2015-03-12 13:52:44.546418
3.24e   0       0       0       0       0       0       0       incomplete      2015-03-20 12:19:49.591282      0'0     261536:1026     [103,14,90]     103     [103,14,90]     103     33556'272       2015-03-13 11:44:41.263725      2327'4  2015-03-09 17:54:43.675552
5.f7    0       0       0       0       0       0       0       incomplete      2015-03-20 12:19:49.667823      0'0     261536:853      [73,44,123]     73      [73,44,123]     73      0'0     2015-03-12 17:58:30.257371      0'0     2015-03-09 17:55:11.725629
3.ae8   77      67      201     67      624427024       342     342     active+recovering       2015-03-20 10:50:01.693979      33516'342       261536:149258   [122,144,218]   122     [122,144,218]   122     33516'342       2015-03-12 17:11:01.899062      29638'134       2015-03-11 17:10:59.966372
#
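
To go through the incomplete PGs one by one, their IDs can be pulled out of the raw (unwrapped) `ceph pg dump_stuck` output and passed to `ceph pg <pgid> query`, which reports the PG's recovery_state and is usually the place to look for why peering is blocked. This is a sketch: it assumes the column layout of the header above, where the state is the ninth field, and the function names and output file names are hypothetical.

```shell
#!/bin/sh
# Sketch: print the IDs of PGs whose state contains "incomplete" in
# `ceph pg dump_stuck` output read on stdin. Assumes one PG per line with
# the state in column 9, as in the header above.
incomplete_pgs() {
    awk '$1 ~ /^[0-9]+\.[0-9a-f]+$/ && $9 ~ /incomplete/ { print $1 }'
}

# Hypothetical helper: save `ceph pg <pgid> query` output for every
# incomplete PG so their recovery_state sections can be compared.
query_incomplete_pgs() {
    ceph pg dump_stuck unclean 2>/dev/null | incomplete_pgs |
    while read -r pg; do
        ceph pg "$pg" query > "pg-$pg-query.json"
    done
}
```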


The PG data is present on multiple OSDs, but Ceph is not recovering the PG. 
For example:

# ceph pg map 7.25b
osdmap e261536 pg 7.25b (7.25b) -> up [194,145,45] acting [194,145,45]


# ls -l /var/lib/ceph/osd/ceph-194/current/7.25b_head | wc -l
17

# ls -l /var/lib/ceph/osd/ceph-145/current/7.25b_head | wc -l
0
#

# ls -l /var/lib/ceph/osd/ceph-45/current/7.25b_head | wc -l
17
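
One caveat with these counts: `ls -l` prints a leading `total` line, so `ls -l | wc -l` reports one more than the number of entries (17 means 16 entries), and for an existing but empty directory it prints 1, not 0. A result of 0 on osd.145 therefore suggests that the `7.25b_head` directory does not exist there or is not readable (the `ls` error goes to stderr). The sketch below distinguishes the two cases. It assumes the FileStore layout used in the paths above and counts regular files recursively, since FileStore can split large PG directories into subdirectories; `OSD_ROOT` and `count_pg_files` are hypothetical names, and it has to run on the host that holds each OSD.

```shell
#!/bin/sh
# Sketch: count the regular files in a FileStore PG directory on this host.
# OSD_ROOT and count_pg_files are hypothetical; the default path matches the
# /var/lib/ceph/osd/ceph-<id>/current/<pgid>_head layout used above.
OSD_ROOT=${OSD_ROOT:-/var/lib/ceph/osd}

count_pg_files() {
    osd=$1
    pg=$2
    dir="$OSD_ROOT/ceph-$osd/current/${pg}_head"
    if [ ! -d "$dir" ]; then
        echo "osd.$osd $pg: missing"
        return 1
    fi
    # find descends into split subdirectories; tr strips any wc padding.
    n=$(find "$dir" -type f | wc -l | tr -d ' ')
    echo "osd.$osd $pg: $n files"
}
```

Usage, on each OSD host: `for osd in 194 145 45; do count_pg_files "$osd" 7.25b; done`.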





Some of the PGs are completely lost, i.e. they don’t have any data. For example:

# ceph pg map 10.70
osdmap e261536 pg 10.70 (10.70) -> up [153,140,80] acting [153,140,80]


# ls -l /var/lib/ceph/osd/ceph-140/current/10.70_head | wc -l
0

# ls -l /var/lib/ceph/osd/ceph-153/current/10.70_head | wc -l
0

# ls -l /var/lib/ceph/osd/ceph-80/current/10.70_head | wc -l
0
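
Checking every member of the acting set by hand gets tedious across 19 incomplete PGs. A small sketch that pulls the acting OSD IDs out of a `ceph pg map` line like the ones above is below; the function name is hypothetical and the parsing assumes the `... acting [a,b,c]` format shown here. The resulting IDs can then be checked on their hosts with `ls` as above.

```shell
#!/bin/sh
# Sketch: print the acting OSD IDs, one per line, from `ceph pg map` output
# such as
#   osdmap e261536 pg 10.70 (10.70) -> up [153,140,80] acting [153,140,80]
acting_osds() {
    sed -n 's/.*acting \[\([0-9,]*\)\].*/\1/p' | tr ',' '\n'
}
```

Usage: `ceph pg map 10.70 | acting_osds`.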



- Karan -



_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
