I can cope with single-FS failures, within reason. It's the coordinated failures across multiple servers that really freak me out.
On Mon, Jun 2, 2014 at 8:47 AM, Thorwald Lundqvist <[email protected]> wrote:

> I'm sorry to hear about that.
>
> I'd say don't use btrfs at all; it has proven unstable for us in
> production even without a cache. It's just not ready for production use.
>
>
> On Mon, Jun 2, 2014 at 5:20 PM, Scott Laird <[email protected]> wrote:
>
>> I found a fun failure mode this weekend.
>>
>> I have 6 SSDs in my 6-node Ceph cluster at home. The SSDs are
>> partitioned: about half of each SSD is used as journal space for other
>> OSDs, and the other half holds an OSD for a cache tier. I finally turned
>> on the cache late last week, and everything was great until yesterday
>> morning, when my whole cluster was down, hard.
>>
>> Apparently I mis-set target_max_bytes, because 5 of the 6 SSD partitions
>> were 100% full. On the 5 full machines (running Ubuntu's 3.14.1 kernel),
>> the cache filesystem was unreadable; any attempt to access it threw
>> kernel errors. Rebooting cleared up 2 of those, leaving me with 3 of 6
>> devices alive in the pool and 3 devices with corrupt filesystems.
>>
>> Apparently btrfs really, *REALLY* doesn't like full filesystems:
>> filling them 100% full seems to have fatally corrupted them. No power
>> loss, etc. involved.
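[Since a mis-set target_max_bytes is what filled the partitions, here's a sketch of sizing it with headroom so the backing filesystem can never hit 100%. The 100 GiB partition size, the 80% figure, and the pool name "ssd-cache" are illustrative assumptions, not values from this thread.]

```shell
# Sketch: cap the cache pool at ~80% of the raw cache partition so btrfs
# always keeps free space. All concrete numbers/names here are examples.
part_bytes=$((100 * 1024 * 1024 * 1024))    # raw cache partition size (100 GiB, assumed)
target_max_bytes=$((part_bytes * 80 / 100)) # leave ~20% headroom for the FS
echo "$target_max_bytes"

# Then apply it to the cache pool (pool name is hypothetical):
# ceph osd pool set ssd-cache target_max_bytes "$target_max_bytes"
```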
>>
>> Trying to mount the filesystems fails, giving btrfs messages like this:
>>
>> [81720.111053] BTRFS: device fsid 319cbd8a-71ac-4b42-9d5c-b02658e75cdc devid 1 transid 61429 /dev/sde9
>> [81720.113074] BTRFS info (device sde9): disk space caching is enabled
>> [81720.188759] BTRFS: detected SSD devices, enabling SSD mode
>> [81720.195442] BTRFS error (device sde9): block group 36528193536 has wrong amount of free space
>> [81720.195488] BTRFS error (device sde9): failed to load free space cache for block group 36528193536
>> [81720.205248] btree_readpage_end_io_hook: 69 callbacks suppressed
>> [81720.205252] BTRFS: bad tree block start 0 395247616
>> [81720.205622] BTRFS: bad tree block start 0 395247616
>> [81720.212772] BTRFS: bad tree block start 0 39714816
>> [81720.213152] BTRFS: bad tree block start 0 39714816
>> [81720.213551] BTRFS: bad tree block start 0 39714816
>> [81720.213925] BTRFS: bad tree block start 0 39714816
>> [81720.214324] BTRFS: bad tree block start 0 39714816
>> [81720.214697] BTRFS: bad tree block start 0 39714816
>> [81720.215070] BTRFS: bad tree block start 0 39714816
>> [81720.215441] BTRFS: bad tree block start 0 39714816
>> [81720.246457] BTRFS: error (device sde9) in open_ctree:2839: errno=-5 IO failure (Failed to recover log tree)
>> [81720.277276] BTRFS: open_ctree failed
>>
>> btrfsck wasn't helpful on the one system that I tried it on. Nor was
>> mounting with -o ro,recovery.
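[For anyone hitting the same wall, the recovery attempts described here can be sketched as the sequence below, run against a dd image so that nothing destructive touches the original device. The device path, image path, and mount point are placeholders, not values from this incident.]

```shell
# Image the device first; every later step is then reversible.
# /dev/sde9, /backup/sde9.img, and /mnt/rescue are placeholder names.
dd if=/dev/sde9 of=/backup/sde9.img bs=4M conv=sync,noerror

# Gentlest option first: read-only recovery mount of the copy.
mount -o loop,ro,recovery /backup/sde9.img /mnt/rescue

# If that still fails with "Failed to recover log tree", discard the
# btrfs log tree on the copy (loses the last seconds of writes) and retry.
btrfs-zero-log /backup/sde9.img
mount -o loop /backup/sde9.img /mnt/rescue
```

This requires root and a real block device or image, so treat it as a procedure outline rather than something to paste verbatim.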
>> I can mount the filesystems if I run btrfs-zero-log (after dd'ing a FS
>> image first), but Ceph is unhappy:
>>
>> # ceph-osd -i 9 -d
>> 2014-06-02 08:10:49.217019 7f9873cc4800 0 ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74), process ceph-osd, pid 17600
>> starting osd.9 at :/0 osd_data /var/lib/ceph/osd/ceph-9 /var/lib/ceph/osd/ceph-9/journal
>> 2014-06-02 08:10:49.219400 7f9873cc4800 0 filestore(/var/lib/ceph/osd/ceph-9) mount detected btrfs
>> 2014-06-02 08:10:49.232826 7f9873cc4800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-9) detect_features: FIEMAP ioctl is supported and appears to work
>> 2014-06-02 08:10:49.232838 7f9873cc4800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-9) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
>> 2014-06-02 08:10:49.247357 7f9873cc4800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-9) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
>> 2014-06-02 08:10:49.247677 7f9873cc4800 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-9) detect_feature: CLONE_RANGE ioctl is supported
>> 2014-06-02 08:10:49.261718 7f9873cc4800 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-9) detect_feature: SNAP_CREATE is supported
>> 2014-06-02 08:10:49.262442 7f9873cc4800 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-9) detect_feature: SNAP_DESTROY is supported
>> 2014-06-02 08:10:49.263020 7f9873cc4800 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-9) detect_feature: START_SYNC is supported (transid 71371)
>> 2014-06-02 08:10:49.269221 7f9873cc4800 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-9) detect_feature: WAIT_SYNC is supported
>> 2014-06-02 08:10:49.270902 7f9873cc4800 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-9) detect_feature: SNAP_CREATE_V2 is supported
>> 2014-06-02 08:10:49.275792 7f9873cc4800 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-9) list_checkpoints: stat '/var/lib/ceph/osd/ceph-9/snap_3415219' failed: (12) Cannot allocate memory
>> 2014-06-02 08:10:49.275900 7f9873cc4800 -1 filestore(/var/lib/ceph/osd/ceph-9) FileStore::mount : error in _list_snaps: (12) Cannot allocate memory
>> 2014-06-02 08:10:49.275936 7f9873cc4800 -1 ** ERROR: error converting store /var/lib/ceph/osd/ceph-9: (12) Cannot allocate memory
>>
>> Similarly, I can recover most of the data via 'btrfs restore', but Ceph
>> has a different failure mode:
>>
>> # ceph-osd -i 16 -d
>> 2014-06-02 08:12:41.590122 7fdfda65e800 0 ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74), process ceph-osd, pid 5094
>> starting osd.16 at :/0 osd_data /var/lib/ceph/osd/ceph-16 /var/lib/ceph/osd/ceph-16/journal
>> 2014-06-02 08:12:41.621624 7fdfda65e800 0 filestore(/var/lib/ceph/osd/ceph-16) mount detected btrfs
>> 2014-06-02 08:12:41.693025 7fdfda65e800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-16) detect_features: FIEMAP ioctl is supported and appears to work
>> 2014-06-02 08:12:41.693035 7fdfda65e800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-16) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
>> 2014-06-02 08:12:41.794817 7fdfda65e800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-16) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
>> 2014-06-02 08:12:41.795263 7fdfda65e800 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-16) detect_feature: CLONE_RANGE ioctl is supported
>> 2014-06-02 08:12:42.019636 7fdfda65e800 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-16) detect_feature: SNAP_CREATE is supported
>> 2014-06-02 08:12:42.020809 7fdfda65e800 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-16) detect_feature: SNAP_DESTROY is supported
>> 2014-06-02 08:12:42.020961 7fdfda65e800 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-16) detect_feature: START_SYNC is supported (transid 68342)
>> 2014-06-02 08:12:42.136140 7fdfda65e800 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-16) detect_feature: WAIT_SYNC is supported
>> 2014-06-02 08:12:42.146701 7fdfda65e800 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-16) detect_feature: SNAP_CREATE_V2 is supported
>> 2014-06-02 08:12:42.147929 7fdfda65e800 0 filestore(/var/lib/ceph/osd/ceph-16) mount WARNING: no consistent snaps found, store may be in inconsistent state
>> 2014-06-02 08:12:42.453012 7fdfda65e800 0 filestore(/var/lib/ceph/osd/ceph-16) mount: enabling PARALLEL journal mode: fs, checkpoint is enabled
>> 2014-06-02 08:12:42.484983 7fdfda65e800 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
>> 2014-06-02 08:12:42.485018 7fdfda65e800 1 journal _open /var/lib/ceph/osd/ceph-16/journal fd 19: 5368709120 bytes, block size 4096 bytes, directio = 1, aio = 0
>> 2014-06-02 08:12:42.506080 7fdfda65e800 -1 journal FileJournal::open: ondisk fsid 00000000-0000-0000-0000-000000000000 doesn't match expected 259bb594-f316-44ab-a721-8e742d8c1c18, invalid (someone else's?) journal
>> 2014-06-02 08:12:42.506122 7fdfda65e800 -1 filestore(/var/lib/ceph/osd/ceph-16) mount failed to open journal /var/lib/ceph/osd/ceph-16/journal: (22) Invalid argument
>> 2014-06-02 08:12:42.506271 7fdfda65e800 -1 ** ERROR: error converting store /var/lib/ceph/osd/ceph-16: (22) Invalid argument
>>
>> Running --mkjournal (it's just a copy, so I don't mind blowing things
>> away) doesn't help much:
>>
>> # ceph-osd -i 16 -d
>> 2014-06-02 08:12:52.848067 7f8d748b9800 0 ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74), process ceph-osd, pid 5106
>> starting osd.16 at :/0 osd_data /var/lib/ceph/osd/ceph-16 /var/lib/ceph/osd/ceph-16/journal
>> 2014-06-02 08:12:52.850669 7f8d748b9800 0 filestore(/var/lib/ceph/osd/ceph-16) mount detected btrfs
>> 2014-06-02 08:12:52.881762 7f8d748b9800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-16) detect_features: FIEMAP ioctl is supported and appears to work
>> 2014-06-02 08:12:52.881772 7f8d748b9800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-16) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
>> 2014-06-02 08:12:53.025174 7f8d748b9800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-16) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
>> 2014-06-02 08:12:53.025644 7f8d748b9800 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-16) detect_feature: CLONE_RANGE ioctl is supported
>> 2014-06-02 08:12:53.233272 7f8d748b9800 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-16) detect_feature: SNAP_CREATE is supported
>> 2014-06-02 08:12:53.233952 7f8d748b9800 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-16) detect_feature: SNAP_DESTROY is supported
>> 2014-06-02 08:12:53.234088 7f8d748b9800 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-16) detect_feature: START_SYNC is supported (transid 68347)
>> 2014-06-02 08:12:53.341491 7f8d748b9800 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-16) detect_feature: WAIT_SYNC is supported
>> 2014-06-02 08:12:53.352080 7f8d748b9800 0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-16) detect_feature: SNAP_CREATE_V2 is supported
>> 2014-06-02 08:12:53.353056 7f8d748b9800 0 filestore(/var/lib/ceph/osd/ceph-16) mount WARNING: no consistent snaps found, store may be in inconsistent state
>> 2014-06-02 08:12:53.499770 7f8d748b9800 0 filestore(/var/lib/ceph/osd/ceph-16) mount: enabling PARALLEL journal mode: fs, checkpoint is enabled
>> 2014-06-02 08:12:53.502813 7f8d748b9800 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
>> 2014-06-02 08:12:53.502844 7f8d748b9800 1 journal _open /var/lib/ceph/osd/ceph-16/journal fd 19: 5368709120 bytes, block size 4096 bytes, directio = 1, aio = 0
>> 2014-06-02 08:12:53.503313 7f8d748b9800 1 journal _open /var/lib/ceph/osd/ceph-16/journal fd 19: 5368709120 bytes, block size 4096 bytes, directio = 1, aio = 0
>> 2014-06-02 08:12:53.503754 7f8d748b9800 1 journal close /var/lib/ceph/osd/ceph-16/journal
>> Aborted (core dumped)
>>
>> Any suggestions? I'd like to recover the ~900 objects with writeback
>> data still sitting on the SSDs.
>>
>>
>> Anyway, the moral of the story: don't use btrfs for your cache devices.
>>
>> Lost filesystem count, after about 4 weeks and ~30 OSDs:
>>
>> xfs: 1 (power loss -> directory structure trashed)
>> btrfs: 3
>>
>> I'm starting to miss ext3.
>>
>>
>> Scott
>>
>> _______________________________________________
>> ceph-users mailing list
>> [email protected]
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
