ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af)

We had a few problems related to the simple operation of replacing a
failed OSD, and some clarification would be appreciated. It is not
easy to see what specifically happened (the timeline below was
gathered from half a dozen logs), so apologies for any vagueness; it
could be fixed by looking at further specific logs on request.

The main problem was that we had to replace a failed OSD. There were 2
down+out but otherwise known (not deleted) OSDs, and we removed
(deleted) one of them. That changes the CRUSH map, so rebalancing
starts (noout doesn't matter here since the OSD had been out anyway;
it could only have been stopped by the scary norecover, which wasn't
set at the time; next time I will try nobackfill/norebalance, which
look safer). Rebalancing finished fine (25% of the objects were
reported misplaced, which is a PITA, but there weren't many objects on
that cluster). This is the prologue; so far it's all fine.
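
For reference, this is roughly the sequence I mean (a sketch from
memory, not the exact commands we ran; osd.6 and /dev/sdX are just
placeholders):

    # keep data movement quiet while the CRUSH map changes
    ceph osd set nobackfill
    ceph osd set norebalance

    # drop the dead OSD from CRUSH, auth and the osdmap
    ceph osd crush remove osd.6
    ceph auth del osd.6
    ceph osd rm 6

    # prepare and start the replacement (ceph-disk was the usual way on hammer)
    ceph-disk prepare /dev/sdX
    ceph-disk activate /dev/sdX1

    # let backfill/rebalance run once the new OSD is up
    ceph osd unset nobackfill
    ceph osd unset norebalance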

We plugged in (and created) the new OSD, but due to the environment
and some admin errors [wasn't me! :)] the OSD at startup was not able
to umount its temporary filesystem, which seems to be used for initial
creation. Here is what I observed [from the logs]:

- 14:12:00, osd6 created, enters the osdmap, down+out
- 14:12:02, the replacement osd6 started, boots, tries to create the initial OSD layout
- 14:12:03, osd6 crashes due to a failed umount / file not found
- 14:12:07, some other OSDs log warnings like (may not be important):
   misdirected client (some say that the OSD is not in the set, others just log the PG)
- 14:12:07, one of the clients gets an IO error (this one was actually
pretty fatal):
   rbd: rbd1: write 1000 at 40779000 (379000)
   rbd: rbd1:   result -6 xferred 1000
   blk_update_request: I/O error, dev rbd1, sector 2112456
   EXT4-fs warning (device rbd1): ext4_end_bio:329: I/O error -6 writing to inode 399502 (offset 0 size 0 starting block 264058)
   Buffer I/O error on device rbd1, logical block 264057
- 14:12:17, another client gets an IO error (this one was lucky):
   rbd: rbd1: write 1000 at c84795000 (395000)
   rbd: rbd1:   result -6 xferred 1000
   blk_update_request: I/O error, dev rbd1, sector 105004200
- 14:12:27, libceph: osd6 weight 0x10000 (in); so now in+down: osd6 had
crashed at that point and hadn't been restarted yet

- 14:13:19, osd6 started again
- 14:13:22, libceph: osd6 up
- from this point on everything's fine, apart from the crashed VM :-/
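
For what it's worth, these are the checks I'd run after such an event
to see whether the half-created osd6 ended up primary for the affected
PGs (osd.6 and the PG id 2.3f below are placeholders, I don't have the
actual PG at hand):

    # is the replacement OSD up/in and placed where expected?
    ceph osd tree

    # which OSDs serve a given PG, and which of them is primary?
    ceph pg map 2.3f
    ceph pg 2.3f query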

The main problem is of course the IO error which reached the client
and knocked out the FS, while there were 2 replica OSDs active. I
haven't found the specifics of how IO is handled when the primary
fails, or acts funky, which is my guess for what may have happened here.
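
For completeness, the replication settings I would double-check in
this situation (the pool name "rbd" is just a guess on my part):

    ceph osd pool get rbd size       # number of replicas
    ceph osd pool get rbd min_size   # minimum replicas needed to serve IO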

I would like to understand why the IO error happened, how to prevent
it, if possible, and whether this is something that has already been
taken care of in later Ceph versions.

Your shared wisdom would be appreciated.
