Hi Sam,

Thank you for your precise inspection.

I reviewed the logs from that time, and I discovered that the cluster marked an 
OSD as failed just after I shut the first unit down. So, as you said, the PG 
can't finish peering because the second unit was then shut off suddenly.

I much appreciate your advice, but I aim to keep my cluster working when 2 
storage nodes are down. The unexpectedly failed OSD produced the following log 
entry just as I shut the first unit down:

2017-01-10 12:30:07.905562 mon.1 172.20.1.3:6789/0 28484 : cluster [INF] 
osd.153 172.20.3.2:6810/26796 failed (2 reporters from different host after 
20.072026 >= grace 20.000000)

But that OSD was not actually dead; more likely it was just slow to respond to 
heartbeats. I think increasing osd_heartbeat_grace may mitigate the issue.
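For reference, raising the grace period could look like the fragment below in ceph.conf. The 60-second value is only an illustration (the default, as the log above shows, is 20 s); the right value depends on how long your network and disks can stall under load:

```ini
[global]
# Hypothetical tuning, not a recommendation: give OSDs more time to
# answer heartbeats before the monitors mark them down (default: 20 s).
osd heartbeat grace = 60
```

Note that this option is read by both the OSDs and the monitors, so it should be set where both can see it (e.g. the [global] section) and the daemons restarted or the value injected at runtime.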

Sincerely,
Craig Chi

On 2017-01-11 00:08, Samuel Just <[email protected]> wrote:
> { "name": "Started\/Primary\/Peering",
>   "enter_time": "2017-01-10 13:43:34.933074",
>   "past_intervals": [
>     { "first": 75858,
>       "last": 75860,
>       "maybe_went_rw": 1,
>       "up": [ 345, 622, 685, 183, 792, 2147483647, 2147483647, 401, 516 ],
>       "acting": [ 345, 622, 685, 183, 792, 2147483647, 2147483647, 401, 516 ],
>       "primary": 345,
>       "up_primary": 345 },
>
> Between 75858 and 75860, [345, 622, 685, 183, 792, 2147483647, 2147483647, 
> 401, 516] was the acting set. The current acting set [345, 622, 685, 183, 
> 2147483647, 2147483647, 153, 401, 516] needs *all 7* of the osds from epochs 
> 75858 through 75860 to ensure that it has any writes completed during that 
> time. You can make transient situations like that less of a problem by 
> setting min_size to 8 (though it'll prevent writes with 2 failures until 
> backfill completes). A possible enhancement for an EC pool would be to 
> gather the infos from those osds anyway and use that to rule out writes (if 
> they actually happened, you'd still be stuck).
> -Sam
>
> On Tue, Jan 10, 2017 at 5:36 AM, Craig Chi <[email protected]> wrote:
> > Hi List,
> >
> > I am testing the stability of my Ceph cluster with power failure.
> >
> > I brutally powered off 2 Ceph units, each with 90 OSDs on it, while the
> > client I/O was continuing.
> >
> > Since then, some of the pgs of my cluster have been stuck in peering:
> >
> >     pgmap v3261136: 17408 pgs, 4 pools, 176 TB data, 5082 kobjects
> >     236 TB used, 5652 TB / 5889 TB avail
> >     8563455/38919024 objects degraded (22.003%)
> >     13526 active+undersized+degraded
> >     3769 active+clean
> >     104 down+remapped+peering
> >     9 down+peering
> >
> > I queried the peering pg (all on EC pool with 7+2) and got blocked
> > information (full query: http://pastebin.com/pRkaMG2h ):
> >
> >     "probing_osds": [
> >         "153(6)",
> >         "183(3)",
> >         "345(0)",
> >         "401(7)",
> >         "516(8)",
> >         "622(1)",
> >         "685(2)"
> >     ],
> >     "blocked": "peering is blocked due to down osds",
> >     "down_osds_we_would_probe": [
> >         792
> >     ],
> >     "peering_blocked_by": [
> >         {
> >             "osd": 792,
> >             "current_lost_at": 0,
> >             "comment": "starting or marking this osd lost may let us
> >                         proceed"
> >         }
> >     ]
> >
> > osd.792 is exactly on one of the units I powered off. And I think the I/O
> > associated with this pg is paused too.
> >
> > I have checked the troubleshooting page on the Ceph website
> > (http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/),
> > and it says that starting the OSD or marking it lost can make the
> > procedure continue.
> >
> > I am sure that my cluster was healthy before the power outage occurred. I
> > am wondering: if a power outage really happens in a production
> > environment, will it also freeze my client I/O if I don't do anything?
> > Since I just lost 2 redundancies (I have erasure code with 7+2), I think
> > it should still serve normal functionality.
> >
> > Or am I doing something wrong? Please give me some suggestions, thanks.
> >
> > Sincerely,
> > Craig Chi
> >
> > _______________________________________________
> > ceph-users mailing list
> > [email protected]
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
