I also just got my new 480GB SSDs, in case they could be used to move the
PGs to. Thank you for your help.
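If moving the PGs off of the full OSDs manually is the way to go, I'm
assuming it would look something along these lines with
ceph-objectstore-tool (the PG ID and the destination OSD number below are
just placeholders):

    # with the source OSD stopped, export a PG from the full BlueStore OSD
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-9 \
        --pgid 20.0 --op export --file /root/pg-20.0.export

    # then import it into one of the new (also stopped) 480GB SSD OSDs
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --op import --file /root/pg-20.0.export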
On Fri, Jan 26, 2018 at 8:33 AM David Turner <[email protected]> wrote:

> If I could get it started, I could flush-evict the cache, but that's not
> seeming likely.
>
> On Fri, Jan 26, 2018 at 8:33 AM David Turner <[email protected]>
> wrote:
>
>> I wouldn't be shocked if they were out of space, but `ceph osd df` only
>> showed them as 45% full when I was first diagnosing this. Now they are
>> showing completely full with the same command. I'm thinking the cache
>> tier behavior might have changed in Luminous, because I was keeping my
>> cache completely empty before with target_max_objects set to 0, which
>> flushed things out consistently after my minimum flush age. I noticed
>> it wasn't keeping up with the flushing as well as it had in Jewel, but
>> didn't think too much of it. Anyway, that's something I can tinker with
>> after the pools are back up and running.
>>
>> If they are full and on Bluestore, what can I do to clean them up? I
>> assume that I need to keep the metadata pool intact, but I don't need
>> to maintain any data in the cache pool. I have a copy of everything
>> written in the last 24 hours prior to this incident, and nothing is
>> modified after it is in cephfs.
>>
>> On Fri, Jan 26, 2018 at 8:23 AM Nick Fisk <[email protected]> wrote:
>>
>>> I can see this in the logs:
>>>
>>> 2018-01-25 06:05:56.292124 7f37fa6ea700 -1 log_channel(cluster) log [ERR] : full status failsafe engaged, dropping updates, now 101% full
>>> 2018-01-25 06:05:56.325404 7f3803f9c700 -1 bluestore(/var/lib/ceph/osd/ceph-9) _do_alloc_write failed to reserve 0x4000
>>> 2018-01-25 06:05:56.325434 7f3803f9c700 -1 bluestore(/var/lib/ceph/osd/ceph-9) _do_write _do_alloc_write failed with (28) No space left on device
>>> 2018-01-25 06:05:56.325462 7f3803f9c700 -1 bluestore(/var/lib/ceph/osd/ceph-9) _txc_add_transaction error (28) No space left on device not handled on operation 10 (op 0, counting from 0)
>>>
>>> Are they out of space, or is something mis-reporting?
>>>
>>> Nick
>>>
>>> From: ceph-users [mailto:[email protected]] On Behalf
>>> Of David Turner
>>> Sent: 26 January 2018 13:03
>>> To: ceph-users <[email protected]>
>>> Subject: [ceph-users] BlueStore.cc: 9363: FAILED assert(0 ==
>>> "unexpected error")
>>>
>>> http://tracker.ceph.com/issues/22796
>>>
>>> I was curious if anyone here had any ideas or experience with this
>>> problem. I created the tracker for this yesterday when I woke up to
>>> find all 3 of my SSD OSDs not running and unable to start due to this
>>> segfault. These OSDs are in my small home cluster and hold the
>>> cephfs_cache and cephfs_metadata pools.
>>>
>>> To recap: I upgraded from 10.2.10 to 12.2.2, successfully swapped out
>>> my 9 OSDs to Bluestore, reconfigured my crush rules to utilize OSD
>>> classes, failed to remove the CephFS cache tier due to
>>> http://tracker.ceph.com/issues/22754, created these 3 SSD OSDs, and
>>> updated the cephfs_cache and cephfs_metadata pools to use the
>>> replicated_ssd crush rule... fast forward 2 days of this working great
>>> to me waking up with all 3 of them crashed and unable to start. There
>>> is an OSD log with debug bluestore = 5 attached to the tracker linked
>>> at the top of this email.
>>>
>>> My CephFS is completely down while these 2 pools are inaccessible. The
>>> OSDs themselves are intact if I need to move the data out manually to
>>> the HDDs or something. Any help is appreciated.
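P.S. For reference, the cache-tier settings I had been relying on to keep
the tier empty look roughly like this. This is only a sketch of my
approach; the 600-second minimum flush age is just an example value:

    # flush/evict everything currently held in the cache tier
    rados -p cephfs_cache cache-flush-evict-all

    # target zero objects so the tier flushes continuously...
    ceph osd pool set cephfs_cache target_max_objects 0
    # ...but never flush objects younger than the minimum flush age
    ceph osd pool set cephfs_cache cache_min_flush_age 600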
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
