Thanks! I tried restarting osd.11 (the primary osd for the incomplete pg) and
that helped a LOT. We went from 0/1 op/s to 10-800+ op/s!
We still have "HEALTH_WARN 1 pgs incomplete; 1 pgs stuck inactive; 1 pgs
stuck unclean", but at least we can use our cluster :-)
$ ceph pg dump_stuck inactive
ok
pg_stat objects mip degr unf bytes log disklog state state_stamp v reported up acting last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
2.1f6 118 0 0 0 403118080 0 0 incomplete 2013-07-30 06:08:18.883179 11127'11658123 12914'1506 [11,9] [11,9] 10321'11641837 2013-07-28 00:59:09.552640 10321'11641837
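For anyone hitting the same thing, here's a sketch of how to pull the primary
osd out of that dump_stuck output and restart it. This assumes the 2013-era
sysvinit service syntax (upstart/systemd clusters use "restart ceph-osd id=N"
or "systemctl restart ceph-osd@N" instead), and the sample line is just the
one pg from the output above:

```shell
# One stuck-pg line as printed by "ceph pg dump_stuck inactive".
# The first bracketed list is the "up" set, the second is the "acting"
# set; the primary osd is the first entry of the acting set.
line="2.1f6 118 0 0 0 403118080 0 0 incomplete 2013-07-30 06:08:18.883179 11127'11658123 12914'1506 [11,9] [11,9] 10321'11641837 2013-07-28 00:59:09.552640 10321'11641837"

# Grab the second bracketed set (acting), strip the brackets, take the
# first comma-separated entry.
acting=$(echo "$line" | grep -o '\[[0-9,]*\]' | sed -n 2p | tr -d '[]')
primary=${acting%%,*}
echo "primary is osd.$primary"

# Then restart that osd and watch client io recover:
#   sudo service ceph restart "osd.$primary"
#   ceph -w
```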
Thanks again!
Jeff
On Tue, Jul 30, 2013 at 11:44:58AM +0200, Jens Kristian Søgaard wrote:
> Hi,
>
>> This is the same issue as yesterday, but I'm still searching for a
>> solution. We have a lot of data on the cluster that we need and can't
>> health HEALTH_WARN 1 pgs incomplete; 1 pgs stuck inactive; 1 pgs
>
> I'm not claiming to have an answer, but I have a suggestion you can try.
>
> Try running "ceph pg dump" to list all the pgs. Grep for the ones that are
> inactive / incomplete, and note which osds they are on - the acting set is
> listed in square brackets, with the primary first in the list.
>
> Now try restarting the primary osd for the stuck pg and see if that
> could possibly shift things into place.
>
> --
> Jens Kristian Søgaard, Mermaid Consulting ApS,
> [email protected],
> http://www.mermaidconsulting.com/
--
===============================================================================
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com