On Oct 22, 2014, at 7:51 PM, Craig Lewis wrote:
> On Wed, Oct 22, 2014 at 3:09 PM, Chris Kitzmiller wrote:
>> On Oct 22, 2014, at 1:50 PM, Craig Lewis wrote:
>>> Incomplete means "Ceph detects that a placement group is missing a
>>> necessary period of history from its log. If you see this state, report a
>>> bug, and try to start any failed OSDs that may contain the needed
>>> information".
>>>
>>> In the PG query, it lists some OSDs that it's trying to probe:
>>> "probing_osds": [
>>> "10",
>>> "13",
>>> "15",
>>> "25"],
>>> "down_osds_we_would_probe": [],
>>>
>>> Is one of those the OSD you replaced? If so, you might try ceph pg {pg-id}
>>> mark_unfound_lost revert|delete. That command will lose data; it tells
>>> Ceph to give up looking for data that it can't find, so you might want to
>>> wait a bit.
>>
>> Yes. osd.10 was the OSD I replaced. :( I suspect that I didn't actually have
>> any writes during this time and that a revert might leave me in an OK place.
>>
>> Looking at the query more closely I see that all of the peers (except
>> osd.10) have the same value for
>> last_update/last_complete/last_scrub/last_deep_scrub except that the peer
>> entry on osd.10 has 0 values for everything. It's as if all my OSDs are
>> believing in the ghost of this PG on osd.10. I'd like to revert; I just want
>> to make sure that I'm going to revert to the sane value and not the 0 value.
>
> I've never (successfully) used mark_unfound_lost, so I can't say exactly
> what'll happen. revert should be what you need, but I don't know if it's
> going to revert to the point in time before whatever hole in the history
> happened, or if it will just give up on the portions of history that it
> doesn't have.
Huh. So I tried `ceph pg 3.222 mark_unfound_lost revert` and it told me "pg has
no unfound objects", and indeed the query shows "num_objects_unfound": 0.

One of the peers, osd.25 (which isn't in the acting set now and was up+in
the whole time), reports:
"stat_sum": {
    "num_bytes": 7080120320,
    "num_objects": 1697,
    "num_object_clones": 0,
    "num_object_copies": 3394,
    "num_objects_missing_on_primary": 0,
    "num_objects_degraded": 0,
    "num_objects_unfound": 0,
    "num_objects_dirty": 1697,
    "num_whiteouts": 0,
    "num_read": 72828,
    "num_read_kb": 8794722,
    "num_write": 32405,
    "num_write_kb": 11424120,
    "num_scrub_errors": 0,
    "num_shallow_scrub_errors": 0,
    "num_deep_scrub_errors": 0,
    "num_objects_recovered": 1687,
    "num_bytes_recovered": 7038177280,
    "num_keys_recovered": 0,
    "num_objects_omap": 0,
    "num_objects_hit_set_archive": 0},

So, are the 10 objects that are dirty but not recovered (1697 dirty vs. 1687
recovered) the ones giving me trouble? What can be done to correct these PGs?
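
For anyone wanting to repeat the comparison above without reading the query
output by eye, here is a minimal Python sketch. It assumes the JSON from
`ceph pg 3.222 query` has a "peer_info" list whose entries carry "peer",
"last_update" and "stats" -> "stat_sum"; that layout is my assumption and may
differ between releases. The function names and the sample data are
hypothetical: only the osd.25 counters come from the output above, and the
last_update strings are placeholders. It only reproduces the arithmetic and
the "which peer disagrees" check; it does not establish that the dirty and
recovered counters are directly comparable.

```python
from collections import Counter

# Hypothetical sample shaped like the assumed "peer_info" section of
# `ceph pg <pgid> query`. Only the osd.25 counters are taken from this thread;
# the last_update strings are placeholders, not real values.
SAMPLE_PEER_INFO = [
    {"peer": "10", "last_update": "0'0",
     "stats": {"stat_sum": {"num_objects_dirty": 0,
                            "num_objects_recovered": 0}}},
    {"peer": "13", "last_update": "1234'5678", "stats": {"stat_sum": {}}},
    {"peer": "15", "last_update": "1234'5678", "stats": {"stat_sum": {}}},
    {"peer": "25", "last_update": "1234'5678",
     "stats": {"stat_sum": {"num_objects_dirty": 1697,
                            "num_objects_recovered": 1687}}},
]


def peer_summary(peer_info):
    """Return one row per peer: id, last_update, and dirty minus recovered."""
    rows = []
    for peer in peer_info:
        stat_sum = peer.get("stats", {}).get("stat_sum", {})
        dirty = stat_sum.get("num_objects_dirty")
        recovered = stat_sum.get("num_objects_recovered")
        # Leave the gap as None when either counter is absent.
        gap = None
        if dirty is not None and recovered is not None:
            gap = dirty - recovered
        rows.append({
            "peer": peer.get("peer"),
            "last_update": peer.get("last_update"),
            "dirty_minus_recovered": gap,
        })
    return rows


def odd_peers(rows):
    """Return peers whose last_update differs from the most common value."""
    if not rows:
        return []
    counts = Counter(row["last_update"] for row in rows)
    common_value, _ = counts.most_common(1)[0]
    return [row["peer"] for row in rows if row["last_update"] != common_value]


rows = peer_summary(SAMPLE_PEER_INFO)
for row in rows:
    print(row["peer"], row["last_update"], row["dirty_minus_recovered"])
print("peers that disagree on last_update:", odd_peers(rows))
```

To run it against a real PG, the sample list could be replaced with
`json.load(sys.stdin)["peer_info"]` and the query output piped in, again
assuming that key exists in your version's output.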