Hi Craig,

Sorry for the late response; I somehow missed this mail.
All OSDs are up and running, and there were no specific logs related to this
activity. No I/Os are running right now. A few OSDs were marked in and out, and
some were removed fully and recreated, before these PGs reached this state.
I have tried restarting the OSDs; it didn't work.

Thanks
Sahana Lokeshappa
Test Development Engineer I
SanDisk Corporation
3rd Floor, Bagmane Laurel, Bagmane Tech Park
C V Raman nagar, Bangalore 560093
T: +918042422283
[email protected]

From: Craig Lewis [mailto:[email protected]]
Sent: Wednesday, September 24, 2014 5:44 AM
To: Sahana Lokeshappa
Cc: [email protected]
Subject: Re: [ceph-users] [Ceph-community] Pgs are in stale+down+peering state

Is osd.12 doing anything strange? Is it consuming lots of CPU or IO? Is it
flapping? Writing any interesting logs? Have you tried restarting it?

If that doesn't help, try the other involved osds: 56, 27, 6, 25, 23.  I doubt 
that it will help, but it won't hurt.
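A quick way to enumerate every OSD worth checking is to collect the acting sets of the stuck PGs; a minimal sketch in Python, with the acting sets transcribed from the `ceph health detail` output quoted below in this thread:

```python
# Acting sets of the three stuck PGs, transcribed from the
# `ceph health detail` output quoted later in this thread.
stuck_pgs = {
    "0.4d": [12, 56, 27],
    "0.49": [12, 6, 25],
    "0.1c": [12, 25, 23],
}

# Every OSD that appears in any acting set -- the candidates to check/restart.
involved = sorted({osd for acting in stuck_pgs.values() for osd in acting})
print(involved)  # -> [6, 12, 23, 25, 27, 56]
```

Note that osd.12 appears in all three acting sets, while the replicas differ per PG.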



On Mon, Sep 22, 2014 at 11:21 AM, Varada Kari
<[email protected]> wrote:
Hi Sage,

To give more context on this problem:

This cluster has two pools: rbd and a user-created pool (pool1).

osd.12 is the primary for some other PGs as well, but the problem occurs only
for these three PGs.

$ sudo ceph osd lspools
0 rbd,2 pool1,

$ sudo ceph -s
    cluster 99ffc4a5-2811-4547-bd65-34c7d4c58758
     health HEALTH_WARN 3 pgs down; 3 pgs peering; 3 pgs stale; 3 pgs stuck 
inactive; 3 pgs stuck stale; 3 pgs stuck unclean; 1 requests are blocked > 32 
sec
    monmap e1: 3 mons at {rack2-ram-1=10.242.42.180:6789/0,rack2-ram-2=10.242.42.184:6789/0,rack2-ram-3=10.242.42.188:6789/0}, election epoch 2008, quorum 0,1,2 rack2-ram-1,rack2-ram-2,rack2-ram-3
     osdmap e17842: 64 osds: 64 up, 64 in
      pgmap v79729: 2148 pgs, 2 pools, 4135 GB data, 1033 kobjects
            12504 GB used, 10971 GB / 23476 GB avail
                2145 active+clean
                   3 stale+down+peering

Snippet from pg dump:

2.a9    518     0       0       0       0       2172649472      3001    3001    active+clean    2014-09-22 17:49:35.357586      6826'35762      17842:72706     [12,7,28]       12      [12,7,28]       12      6826'35762      2014-09-22 11:33:55.985449      0'0     2014-09-16 20:11:32.693864
0.59    0       0       0       0       0       0       0       0       active+clean    2014-09-22 17:50:00.751218      0'0     17842:4472      [12,41,2]       12      [12,41,2]       12      0'0     2014-09-22 16:47:09.315499      0'0     2014-09-16 12:20:48.618726
0.4d    0       0       0       0       0       0       4       4       stale+down+peering      2014-09-18 17:51:10.038247      186'4   11134:498       [12,56,27]      12      [12,56,27]      12      186'4   2014-09-18 17:30:32.393188      0'0     2014-09-16 12:20:48.615322
0.49    0       0       0       0       0       0       0       0       stale+down+peering      2014-09-18 17:44:52.681513      0'0     11134:498       [12,6,25]       12      [12,6,25]       12      0'0     2014-09-18 17:16:12.986658      0'0     2014-09-16 12:20:48.614192
0.1c    0       0       0       0       0       0       12      12      stale+down+peering      2014-09-18 17:51:16.735549      186'12  11134:522       [12,25,23]      12      [12,25,23]      12      186'12  2014-09-18 17:16:04.457863      186'10  2014-09-16 14:23:58.731465
2.17    510     0       0       0       0       2139095040      3001    3001    active+clean    2014-09-22 17:52:20.364754      6784'30742      17842:72033     [12,27,23]      12      [12,27,23]      12      6784'30742      2014-09-22 00:19:39.905291      0'0     2014-09-16 20:11:17.016299
2.7e8   508     0       0       0       0       2130706432      3433    3433    active+clean    2014-09-22 17:52:20.365083      6702'21132      17842:64769     [12,25,23]      12      [12,25,23]      12      6702'21132      2014-09-22 17:01:20.546126      0'0     2014-09-16 14:42:32.079187
2.6a5   528     0       0       0       0       2214592512      2840    2840    active+clean    2014-09-22 22:50:38.092084      6775'34416      17842:83221     [12,58,0]       12      [12,58,0]       12      6775'34416      2014-09-22 22:50:38.091989      0'0     2014-09-16 20:11:32.703368
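The observation that osd.12 is the primary for healthy PGs as well can be checked mechanically; a minimal sketch, with state and primary transcribed from the pg dump snippet above:

```python
# (pg id) -> (state, primary OSD), transcribed from the pg dump snippet above.
pgs = {
    "2.a9": ("active+clean", 12),
    "0.59": ("active+clean", 12),
    "0.4d": ("stale+down+peering", 12),
    "0.49": ("stale+down+peering", 12),
    "0.1c": ("stale+down+peering", 12),
    "2.17": ("active+clean", 12),
    "2.7e8": ("active+clean", 12),
    "2.6a5": ("active+clean", 12),
}

stale = sorted(pg for pg, (state, _) in pgs.items() if state == "stale+down+peering")
primaries = {p for _, p in pgs.values()}

# osd.12 serves active+clean PGs too, so the daemon itself is alive;
# only these three PGs are stuck.
print(stale)      # -> ['0.1c', '0.49', '0.4d']
print(primaries)  # -> {12}
```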

And we could not observe any peering events happening on the primary OSD.

$ sudo ceph pg 0.49 query
Error ENOENT: i don't have pgid 0.49
$ sudo ceph pg 0.4d query
Error ENOENT: i don't have pgid 0.4d
$ sudo ceph pg 0.1c query
Error ENOENT: i don't have pgid 0.1c

We are not able to explain why peering is stuck. BTW, the rbd pool doesn't
contain any data.

Varada

From: Ceph-community [mailto:[email protected]] On Behalf Of Sage Weil
Sent: Monday, September 22, 2014 10:44 PM
To: Sahana Lokeshappa; [email protected]; [email protected]; [email protected]
Subject: Re: [Ceph-community] Pgs are in stale+down+peering state


Stale means that the primary OSD for the PG went down and the status has not
been updated since. They all seem to be from osd.12... Seems like something is
preventing that OSD from reporting to the mon?

sage

On September 22, 2014 7:51:48 AM EDT, Sahana Lokeshappa
<[email protected]> wrote:
Hi all,


I ran the 'ceph osd thrash' command, and after all OSDs came back up and in,
3 PGs were left in the stale+down+peering state.


sudo ceph -s
    cluster 99ffc4a5-2811-4547-bd65-34c7d4c58758
     health HEALTH_WARN 3 pgs down; 3 pgs peering; 3 pgs stale; 3 pgs stuck 
inactive; 3 pgs stuck stale; 3 pgs stuck unclean
     monmap e1: 3 mons at {rack2-ram-1=10.242.42.180:6789/0,rack2-ram-2=10.242.42.184:6789/0,rack2-ram-3=10.242.42.188:6789/0}, election epoch 2008, quorum 0,1,2 rack2-ram-1,rack2-ram-2,rack2-ram-3
     osdmap e17031: 64 osds: 64 up, 64 in
      pgmap v76728: 2148 pgs, 2 pools, 4135 GB data, 1033 kobjects
            12501 GB used, 10975 GB / 23476 GB avail
                2145 active+clean
                   3 stale+down+peering


sudo ceph health detail
HEALTH_WARN 3 pgs down; 3 pgs peering; 3 pgs stale; 3 pgs stuck inactive; 3 pgs 
stuck stale; 3 pgs stuck unclean
pg 0.4d is stuck inactive for 341048.948643, current state stale+down+peering, 
last acting [12,56,27]
pg 0.49 is stuck inactive for 341048.948667, current state stale+down+peering, 
last acting [12,6,25]
pg 0.1c is stuck inactive for 341048.949362, current state stale+down+peering, 
last acting [12,25,23]
pg 0.4d is stuck unclean for 341048.948665, current state stale+down+peering, 
last acting [12,56,27]
pg 0.49 is stuck unclean for 341048.948687, current state stale+down+peering, 
last acting [12,6,25]
pg 0.1c is stuck unclean for 341048.949382, current state stale+down+peering, 
last acting [12,25,23]
pg 0.4d is stuck stale for 339823.956929, current state stale+down+peering, 
last acting [12,56,27]
pg 0.49 is stuck stale for 339823.956930, current state stale+down+peering, 
last acting [12,6,25]
pg 0.1c is stuck stale for 339823.956925, current state stale+down+peering, 
last acting [12,25,23]
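The `last acting` sets can be pulled out of the health detail output with a short parser, confirming that every stuck PG has osd.12 as its primary (the first entry in the acting set); a minimal sketch, assuming the output format shown above:

```python
import re

# A few lines of `ceph health detail` output, as shown above.
health_detail = """\
pg 0.4d is stuck inactive for 341048.948643, current state stale+down+peering, last acting [12,56,27]
pg 0.49 is stuck inactive for 341048.948667, current state stale+down+peering, last acting [12,6,25]
pg 0.1c is stuck inactive for 341048.949362, current state stale+down+peering, last acting [12,25,23]
"""

pat = re.compile(
    r"pg (\S+) is stuck \w+ for [\d.]+, "
    r"current state (\S+), last acting \[([\d,]+)\]"
)

acting = {}
for line in health_detail.splitlines():
    m = pat.match(line)
    if m:
        acting[m.group(1)] = [int(x) for x in m.group(3).split(",")]

# The first OSD in the acting set is the primary -- osd.12 in every case.
primaries = {pg: osds[0] for pg, osds in acting.items()}
print(primaries)  # -> {'0.4d': 12, '0.49': 12, '0.1c': 12}
```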




Please, can anyone explain why the PGs are in this state?
Sahana Lokeshappa



________________________________

Ceph-community mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-community-ceph.com

--
Sent from Kaiten Mail. Please excuse my brevity.

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

