On Fri, May 26, 2017 at 3:05 AM Stuart Harland <s.harl...@livelinktechnology.net> wrote:

> Could you elaborate on what constitutes deleting the PG in this instance?
> Is a simple `rm` of the directories with the PG number in `current`
> sufficient, or does it need some poking of anything else?
>

No, you need to look at how to use ceph-objectstore-tool. Just removing the
directories will leave the associated PG metadata behind in leveldb/rocksdb.
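
Roughly something like the following (a sketch only -- flags vary a bit
between releases, so check ceph-objectstore-tool --help or the man page for
your version; the OSD id 12 and PG id 3.1a7 below are placeholders, and the
OSD has to be stopped before you touch its store):

  # stop the OSD that holds the PG copy you want to drop (systemd install)
  systemctl stop ceph-osd@12

  # confirm the pgid by listing the PGs the OSD actually holds
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
      --journal-path /var/lib/ceph/osd/ceph-12/journal --op list-pgs

  # remove the PG along with its metadata (the leveldb/rocksdb entries
  # that a plain rm of the directory would leave behind)
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
      --journal-path /var/lib/ceph/osd/ceph-12/journal \
      --pgid 3.1a7 --op remove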


>
> It is conceivable that there is a fault with the disks; they are known to
> be ‘faulty’ in the general sense that they suffer a cliff-edge performance
> issue. However, I’m somewhat confused about why this would suddenly happen
> in the way it has been detected.
>

Yeah, not sure. It might just be that the restarts are newly exposing old
issues, but I don't see how. I gather from skimming that ticket that it was
an on-disk state bug from earlier on that went undetected until Jewel, which
is why I was wondering about the upgrades.
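
If it helps, one quick way to double-check what each daemon is actually
running (a sketch; the admin keyring is assumed to be available, and the
second form has to run on the OSD's own host -- osd.12 is a placeholder):

  ceph tell osd.* version      # ask every running OSD which version it is
  ceph daemon osd.12 version   # or query a single OSD over its admin socket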
-Greg


>
> We are past early-life failures; most of these disks don’t show any
> significant issues in their SMART data that would indicate write failures
> are occurring, and I hadn’t seen this error even once until a couple of
> weeks ago (we’ve been operating this cluster for over 2 years now).
>
> The only versions I’m seeing running (I just double-checked) are currently
> 10.2.5, 10.2.6, and 10.2.7. There was one node that ran Hammer a while
> back, but it’s been on Jewel for months now, so I doubt it’s related to
> that.
>