Where exactly in the timeline did the I/O error happen? If the primary
OSD was dead but not yet marked down in the cluster, then the cluster
would sit there and expect that OSD to respond. If this definitely
happened after the primary OSD was marked down, then it's a different story.
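
It would help to line up the client-side errors against the moment the
cluster actually marked that OSD down. A rough sketch (the log path and
osd id are assumptions, adjust to your setup):

    # on a monitor host: when did the cluster log the osd going down/out/up?
    grep -E 'osd\.6 .*(down|out|boot)' /var/log/ceph/ceph.log

    # on the client: timestamps of the rbd/libceph errors
    dmesg -T | grep -E 'rbd|libceph'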

I'm confused about you saying 1 OSD was down/out and 2 other OSDs were
down but not out. Were these on the same host while you were replacing the
disk? Is your failure domain host or osd? What version of Ceph are you
running?
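
If you can share it, something like this would show the failure domain
directly (a minimal sketch, assuming a default replicated ruleset):

    # which bucket type the CRUSH rule spreads replicas across (host vs osd)
    ceph osd crush rule dump | grep -A3 chooseleaf

    # host/osd layout and which osds are currently down or out
    ceph osd tree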

On Wed, Aug 9, 2017, 7:32 AM Peter Gervai <grin...@gmail.com> wrote:

> Hello,
>
> ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af)
>
> We had a few problems related to the simple operation of replacing a
> failed OSD, and some clarification would be appreciated. It is not
> very simple to observe what specifically happened (the timeline was
> gathered from half a dozen logs), so apologies for any vagueness,
> which could be fixed by looking at further specific logs on request.
>
> The main problem was that we had to replace a failed OSD. There were 2
> down+out but otherwise known (not deleted) OSDs, and we removed
> (deleted) one of them. That changes the CRUSH map and rebalancing
> starts (noout makes no difference since the OSD was already out; it
> could only have been stopped by the scary norecover, which wasn't set
> at the time; next time I will try nobackfill/norebalance instead,
> which looks safer, as sketched below). Rebalancing finished fine (25%
> of the objects were reported misplaced, which is a PITA, but there
> weren't many objects on that cluster). This is the prologue; so far
> everything is fine.
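>
> Something along these lines is what I have in mind for next time (just
> a sketch, untested here; osd.6 stands for whichever id is being
> replaced):
>
>    # stop data movement before removing the dead osd from CRUSH
>    ceph osd set norebalance
>    ceph osd set nobackfill
>
>    ceph osd crush remove osd.6
>    ceph auth del osd.6
>    ceph osd rm 6
>
>    # ... create and start the replacement osd ...
>
>    # then let the data move once, straight to its final place
>    ceph osd unset nobackfill
>    ceph osd unset norebalance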
>
> We plugged in (and created) the new OSD, but due to the environment
> and some admin errors [wasn't me! :)] the OSD at startup was not able
> to unmount its temporary filesystem, which seems to be used for the
> initial creation. What I observed [from the logs] is:
>
> - 14:12:00, osd6 created, enters the osdmap, down+out
> - 14:12:02, replaced osd6 started, boots, tries to create initial osd
> layout
> - 14:12:03, osd6 crashed due to the failed umount / file not found
> - 14:12:07, some other OSDs log warnings like (may not be important):
>    misdirected client (some saying that OSD is not in the set, others
>    just logging the PG)
> - 14:12:07, one of the clients gets an I/O error (this one was actually
> pretty fatal):
>    rbd: rbd1: write 1000 at 40779000 (379000)
>    rbd: rbd1:   result -6 xferred 1000
>    blk_update_request: I/O error, dev rbd1, sector 2112456
>    EXT4-fs warning (device rbd1): ext4_end_bio:329: I/O error -6
> writing to inode 399502 (offset 0 size 0 starting block 264058)
>   Buffer I/O error on device rbd1, logical block 264057
> - 14:12:17, another client gets an I/O error (this one was lucky):
>    rbd: rbd1: write 1000 at c84795000 (395000)
>    rbd: rbd1:   result -6 xferred 1000
>    blk_update_request: I/O error, dev rbd1, sector 105004200
> - 14:12:27, libceph: osd6 weight 0x10000 (in); in+down: osd6 is still
> crashed at that point and hasn't been restarted yet
>
> - 14:13:19, osd6 started again
> - 14:13:22, libceph: osd6 up
> - from this point on everything's fine, apart from the crashed VM :-/
>
> The main problem is of course the I/O error, which reached the
> client and knocked out the FS while there were 2 replica OSDs still
> active. I haven't found the specifics of how writes are handled when
> the primary fails, or acts funny, which is my guess as to what
> happened here.
>
> I would like to understand why the I/O error happened, how to
> prevent it if possible, and whether this is something that has
> already been taken care of in later Ceph versions.
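>
> I can of course post more details; for instance (a sketch, with
> placeholder names and assuming the default 'rbd' pool) I could check
> the pool and the affected PG like this:
>
>    # replica count and the minimum replicas needed to accept writes
>    ceph osd pool get rbd size
>    ceph osd pool get rbd min_size
>
>    # which OSDs the failed write mapped to (object name is a placeholder)
>    ceph osd map rbd <object-name>
>    ceph pg <pgid> query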
>
> Your shared wisdom would be appreciated.
>
> Thanks,
> Peter
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
