This looks new to me. Can you try starting up the OSD with "debug osd = 20"
and "debug filestore = 20" in your conf, then put the log somewhere
accessible? (You can also use ceph-post-file if it's too large for
pastebin or something.)
Also, check dmesg to see if btrfs is complaining, and look at the
contents of the OSD data directory (in particular, which snapshot
subvolumes are present).
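
Roughly what I mean -- the log path below is just the usual default, and
/ceph/osd.1 is the data dir from your log, so adjust both to your setup.
In ceph.conf, under [osd] (or [osd.1] to limit it to this daemon):

    debug osd = 20
    debug filestore = 20

and then something like:

    ceph-osd -i 1 -f                              # retry the mount in the foreground
    ceph-post-file /var/log/ceph/ceph-osd.1.log   # prints a tag you can paste back here
    dmesg | grep -i btrfs                         # kernel-side btrfs complaints, if any
    btrfs subvolume list /ceph/osd.1              # lists 'current' and any snap_* subvolumes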

Since you *are* on btrfs, this is probably reasonably recoverable, but
we'll have to see what's going on first.
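
To give a sense of why btrfs helps here: the OSD keeps consistent
snap_<seq> subvolumes next to 'current' (your log shows snap_6009082 and
snap_6009083), and recovery usually comes down to rolling 'current' back
to the newest snapshot. Please don't run anything like the following
until we've seen the log, though -- it's only a sketch:

    # SKETCH ONLY -- do not run until we know why the mount is failing
    btrfs subvolume delete /ceph/osd.1/current    # the subvol the OSD failed to remove
    btrfs subvolume snapshot /ceph/osd.1/snap_6009083 /ceph/osd.1/current
    ceph-osd -i 1 -f                              # retry the mount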
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Tue, Aug 26, 2014 at 10:18 PM, John Morris <j...@zultron.com> wrote:
> During reorganization of the Ceph system, including an updated CRUSH
> map and moving to btrfs, some PGs became stuck incomplete+remapped.
> Before that was resolved, a restart of osd.1 failed while creating a
> btrfs snapshot.  A 'ceph-osd -i 1 --flush-journal' fails with the same
> error.  See the log pasted below.
>
> This is a Bad Thing, because two PGs are now stuck down+peering.  A
> 'ceph pg 2.74 query' shows they had been stuck on osd.1 before the
> btrfs problem, despite what the 'last acting' field shows in the
> 'ceph health detail' output below.
>
> Is there any way to recover from this?  Judging from Google searches
> on the list archives, nobody has run into this problem before, so I'm
> quite worried that this spells backup recovery exercises for the next
> few days.
>
> Related question:  Are outright OSD crashes the reason btrfs is
> discouraged for production use?
>
> Thanks-
>
>         John
>
>
>
> pg 2.74 is stuck inactive since forever, current state down+peering, last acting [3,7,0,6]
> pg 3.73 is stuck inactive since forever, current state down+peering, last acting [3,7,0,6]
> pg 2.74 is stuck unclean since forever, current state down+peering, last acting [3,7,0,6]
> pg 3.73 is stuck unclean since forever, current state down+peering, last acting [3,7,0,6]
> pg 2.74 is down+peering, acting [3,7,0,6]
> pg 3.73 is down+peering, acting [3,7,0,6]
>
>
> 2014-08-26 22:36:12.641585 7f5b38e507a0  0 ceph version 0.67.10 (9d446bd416c52cd785ccf048ca67737ceafcdd7f), process ceph-osd, pid 10281
> 2014-08-26 22:36:12.717100 7f5b38e507a0  0 filestore(/ceph/osd.1) mount FIEMAP ioctl is supported and appears to work
> 2014-08-26 22:36:12.717121 7f5b38e507a0  0 filestore(/ceph/osd.1) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
> 2014-08-26 22:36:12.717434 7f5b38e507a0  0 filestore(/ceph/osd.1) mount detected btrfs
> 2014-08-26 22:36:12.717471 7f5b38e507a0  0 filestore(/ceph/osd.1) mount btrfs CLONE_RANGE ioctl is supported
> 2014-08-26 22:36:12.765009 7f5b38e507a0  0 filestore(/ceph/osd.1) mount btrfs SNAP_CREATE is supported
> 2014-08-26 22:36:12.765335 7f5b38e507a0  0 filestore(/ceph/osd.1) mount btrfs SNAP_DESTROY is supported
> 2014-08-26 22:36:12.765541 7f5b38e507a0  0 filestore(/ceph/osd.1) mount btrfs START_SYNC is supported (transid 3118)
> 2014-08-26 22:36:12.789600 7f5b38e507a0  0 filestore(/ceph/osd.1) mount btrfs WAIT_SYNC is supported
> 2014-08-26 22:36:12.808287 7f5b38e507a0  0 filestore(/ceph/osd.1) mount btrfs SNAP_CREATE_V2 is supported
> 2014-08-26 22:36:12.834144 7f5b38e507a0  0 filestore(/ceph/osd.1) mount syscall(SYS_syncfs, fd) fully supported
> 2014-08-26 22:36:12.834377 7f5b38e507a0  0 filestore(/ceph/osd.1) mount found snaps <6009082,6009083>
> 2014-08-26 22:36:12.834427 7f5b38e507a0 -1 filestore(/ceph/osd.1) FileStore::mount: error removing old current subvol: (22) Invalid argument
> 2014-08-26 22:36:12.861045 7f5b38e507a0 -1 filestore(/ceph/osd.1) mount initial op seq is 0; something is wrong
> 2014-08-26 22:36:12.861428 7f5b38e507a0 -1 ** ERROR: error converting store /ceph/osd.1: (22) Invalid argument