I'm sorry to hear about that.

I'd say don't use btrfs at all; it has proven unstable for us in production
even without a cache tier. It's just not ready for production use.


On Mon, Jun 2, 2014 at 5:20 PM, Scott Laird <[email protected]> wrote:

> I found a fun failure mode this weekend.
>
> I have 6 SSDs in my 6-node Ceph cluster at home.  The SSDs are
> partitioned; about half of the SSD is used for journal space for other
> OSDs, and half holds an OSD for a cache tier.  I finally turned on the
> cache late last week, and everything was great, until yesterday morning,
> when my whole cluster was down, hard.
>
> Apparently, I mis-set target_max_bytes, because 5 of the 6 SSD partitions
> were 100% full.  On the 5 full machines (running Ubuntu's 3.14.1 kernel),
> the cache filesystem was unreadable; any attempt to access it threw kernel
> errors.  Rebooting cleared up 2 of those, leaving me with 3 of 6 devices
> alive in the pool, and 3 devices with corrupt filesystems.
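
On sizing: here is a rough sketch of how one might pick target_max_bytes so
the cache partitions never fill. It assumes the limit counts pool data before
replication, so raw usage is roughly the limit times the replica count, and it
budgets against the smallest OSD because placement is never perfectly even.
The function name, the 50% headroom, and the example sizes are my own
placeholders, not values from your cluster.

```python
def safe_target_max_bytes(osd_bytes, replicas, headroom=0.5):
    """Suggest a cache-pool byte limit that keeps every OSD well below full.

    osd_bytes: usable size in bytes of each cache OSD partition.
    replicas:  replica count ("size") of the cache pool.
    headroom:  fraction of raw capacity the cache may occupy (assumed 0.5).
    """
    if not osd_bytes:
        raise ValueError("need at least one OSD size")
    if replicas < 1:
        raise ValueError("replicas must be >= 1")
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    # Budget against the smallest OSD: data is spread unevenly, so the
    # smallest (or fullest) device is the one that runs out first.
    raw_budget = min(osd_bytes) * len(osd_bytes) * headroom
    # Assumption: the pool limit is counted before replication.
    return int(raw_budget // replicas)


# Hypothetical example: six 100 GiB partitions, 2 replicas.
GIB = 2**30
limit = safe_target_max_bytes([100 * GIB] * 6, replicas=2)
print(limit, "bytes =", limit // GIB, "GiB")
```

The result would then be applied with
`ceph osd pool set <cachepool> target_max_bytes <value>`.
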
>
> Apparently btrfs really, *REALLY* doesn't like full filesystems, because
> filling them 100% full seems to have fatally corrupted them.  No power
> loss, etc. involved.
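
Since filling to 100% is what killed these, a cheap guard is to alarm well
before that point. A minimal sketch, assuming `shutil.disk_usage` is a good
enough signal (on btrfs the reported free space can be optimistic, so the
threshold should be conservative). The function name and the 85% threshold
are my own choices.

```python
import shutil


def overfull_paths(paths, threshold=0.85, usage=shutil.disk_usage):
    """Return (path, used_ratio) for every path at or above the threshold.

    usage is any callable returning (total, used, free) in bytes for a path;
    it defaults to shutil.disk_usage and can be swapped out for testing.
    """
    flagged = []
    for path in paths:
        total, used, _free = usage(path)
        if total <= 0:
            # Skip anything that reports no capacity at all.
            continue
        ratio = used / total
        if ratio >= threshold:
            flagged.append((path, ratio))
    return flagged
```

Run from cron against the OSD data directories (for example
/var/lib/ceph/osd/ceph-9), a non-empty result would be the cue to lower the
cache limits or flush before the filesystem is actually full.
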
>
> Trying to mount the filesystems fails, giving btrfs messages like this:
>
> [81720.111053] BTRFS: device fsid 319cbd8a-71ac-4b42-9d5c-b02658e75cdc devid 1 transid 61429 /dev/sde9
> [81720.113074] BTRFS info (device sde9): disk space caching is enabled
> [81720.188759] BTRFS: detected SSD devices, enabling SSD mode
> [81720.195442] BTRFS error (device sde9): block group 36528193536 has wrong amount of free space
> [81720.195488] BTRFS error (device sde9): failed to load free space cache for block group 36528193536
> [81720.205248] btree_readpage_end_io_hook: 69 callbacks suppressed
> [81720.205252] BTRFS: bad tree block start 0 395247616
> [81720.205622] BTRFS: bad tree block start 0 395247616
> [81720.212772] BTRFS: bad tree block start 0 39714816
> [81720.213152] BTRFS: bad tree block start 0 39714816
> [81720.213551] BTRFS: bad tree block start 0 39714816
> [81720.213925] BTRFS: bad tree block start 0 39714816
> [81720.214324] BTRFS: bad tree block start 0 39714816
> [81720.214697] BTRFS: bad tree block start 0 39714816
> [81720.215070] BTRFS: bad tree block start 0 39714816
> [81720.215441] BTRFS: bad tree block start 0 39714816
> [81720.246457] BTRFS: error (device sde9) in open_ctree:2839: errno=-5 IO failure (Failed to recover log tree)
> [81720.277276] BTRFS: open_ctree failed
>
> btrfsck wasn't helpful on the one system that I tried it on.  Nor was
> mounting with -o ro,recovery.  I can mount the filesystems if I run
> btrfs-zero-log (after dding a FS image), but Ceph is unhappy:
>
>
> # ceph-osd -i 9 -d
> 2014-06-02 08:10:49.217019 7f9873cc4800  0 ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74), process ceph-osd, pid 17600
> starting osd.9 at :/0 osd_data /var/lib/ceph/osd/ceph-9 /var/lib/ceph/osd/ceph-9/journal
> 2014-06-02 08:10:49.219400 7f9873cc4800  0 filestore(/var/lib/ceph/osd/ceph-9) mount detected btrfs
> 2014-06-02 08:10:49.232826 7f9873cc4800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-9) detect_features: FIEMAP ioctl is supported and appears to work
> 2014-06-02 08:10:49.232838 7f9873cc4800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-9) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
> 2014-06-02 08:10:49.247357 7f9873cc4800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-9) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
> 2014-06-02 08:10:49.247677 7f9873cc4800  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-9) detect_feature: CLONE_RANGE ioctl is supported
> 2014-06-02 08:10:49.261718 7f9873cc4800  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-9) detect_feature: SNAP_CREATE is supported
> 2014-06-02 08:10:49.262442 7f9873cc4800  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-9) detect_feature: SNAP_DESTROY is supported
> 2014-06-02 08:10:49.263020 7f9873cc4800  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-9) detect_feature: START_SYNC is supported (transid 71371)
> 2014-06-02 08:10:49.269221 7f9873cc4800  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-9) detect_feature: WAIT_SYNC is supported
> 2014-06-02 08:10:49.270902 7f9873cc4800  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-9) detect_feature: SNAP_CREATE_V2 is supported
> 2014-06-02 08:10:49.275792 7f9873cc4800  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-9) list_checkpoints: stat '/var/lib/ceph/osd/ceph-9/snap_3415219' failed: (12) Cannot allocate memory
> 2014-06-02 08:10:49.275900 7f9873cc4800 -1 filestore(/var/lib/ceph/osd/ceph-9) FileStore::mount : error in _list_snaps: (12) Cannot allocate memory
> 2014-06-02 08:10:49.275936 7f9873cc4800 -1  ** ERROR: error converting store /var/lib/ceph/osd/ceph-9: (12) Cannot allocate memory
>
>
> Similarly, I can recover most of the data via 'btrfs restore', but Ceph
> has a different failure mode:
>
> # ceph-osd -i 16 -d
> 2014-06-02 08:12:41.590122 7fdfda65e800  0 ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74), process ceph-osd, pid 5094
> starting osd.16 at :/0 osd_data /var/lib/ceph/osd/ceph-16 /var/lib/ceph/osd/ceph-16/journal
> 2014-06-02 08:12:41.621624 7fdfda65e800  0 filestore(/var/lib/ceph/osd/ceph-16) mount detected btrfs
> 2014-06-02 08:12:41.693025 7fdfda65e800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-16) detect_features: FIEMAP ioctl is supported and appears to work
> 2014-06-02 08:12:41.693035 7fdfda65e800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-16) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
> 2014-06-02 08:12:41.794817 7fdfda65e800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-16) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
> 2014-06-02 08:12:41.795263 7fdfda65e800  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-16) detect_feature: CLONE_RANGE ioctl is supported
> 2014-06-02 08:12:42.019636 7fdfda65e800  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-16) detect_feature: SNAP_CREATE is supported
> 2014-06-02 08:12:42.020809 7fdfda65e800  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-16) detect_feature: SNAP_DESTROY is supported
> 2014-06-02 08:12:42.020961 7fdfda65e800  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-16) detect_feature: START_SYNC is supported (transid 68342)
> 2014-06-02 08:12:42.136140 7fdfda65e800  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-16) detect_feature: WAIT_SYNC is supported
> 2014-06-02 08:12:42.146701 7fdfda65e800  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-16) detect_feature: SNAP_CREATE_V2 is supported
> 2014-06-02 08:12:42.147929 7fdfda65e800  0 filestore(/var/lib/ceph/osd/ceph-16) mount WARNING: no consistent snaps found, store may be in inconsistent state
> 2014-06-02 08:12:42.453012 7fdfda65e800  0 filestore(/var/lib/ceph/osd/ceph-16) mount: enabling PARALLEL journal mode: fs, checkpoint is enabled
> 2014-06-02 08:12:42.484983 7fdfda65e800 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
> 2014-06-02 08:12:42.485018 7fdfda65e800  1 journal _open /var/lib/ceph/osd/ceph-16/journal fd 19: 5368709120 bytes, block size 4096 bytes, directio = 1, aio = 0
> 2014-06-02 08:12:42.506080 7fdfda65e800 -1 journal FileJournal::open: ondisk fsid 00000000-0000-0000-0000-000000000000 doesn't match expected 259bb594-f316-44ab-a721-8e742d8c1c18, invalid (someone else's?) journal
> 2014-06-02 08:12:42.506122 7fdfda65e800 -1 filestore(/var/lib/ceph/osd/ceph-16) mount failed to open journal /var/lib/ceph/osd/ceph-16/journal: (22) Invalid argument
> 2014-06-02 08:12:42.506271 7fdfda65e800 -1  ** ERROR: error converting store /var/lib/ceph/osd/ceph-16: (22) Invalid argument
>
>
> Running --mkjournal (it's just a copy, I don't mind blowing things away)
> doesn't help much:
>
> # ceph-osd -i 16 -d
> 2014-06-02 08:12:52.848067 7f8d748b9800  0 ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74), process ceph-osd, pid 5106
> starting osd.16 at :/0 osd_data /var/lib/ceph/osd/ceph-16 /var/lib/ceph/osd/ceph-16/journal
> 2014-06-02 08:12:52.850669 7f8d748b9800  0 filestore(/var/lib/ceph/osd/ceph-16) mount detected btrfs
> 2014-06-02 08:12:52.881762 7f8d748b9800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-16) detect_features: FIEMAP ioctl is supported and appears to work
> 2014-06-02 08:12:52.881772 7f8d748b9800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-16) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
> 2014-06-02 08:12:53.025174 7f8d748b9800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-16) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
> 2014-06-02 08:12:53.025644 7f8d748b9800  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-16) detect_feature: CLONE_RANGE ioctl is supported
> 2014-06-02 08:12:53.233272 7f8d748b9800  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-16) detect_feature: SNAP_CREATE is supported
> 2014-06-02 08:12:53.233952 7f8d748b9800  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-16) detect_feature: SNAP_DESTROY is supported
> 2014-06-02 08:12:53.234088 7f8d748b9800  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-16) detect_feature: START_SYNC is supported (transid 68347)
> 2014-06-02 08:12:53.341491 7f8d748b9800  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-16) detect_feature: WAIT_SYNC is supported
> 2014-06-02 08:12:53.352080 7f8d748b9800  0 btrfsfilestorebackend(/var/lib/ceph/osd/ceph-16) detect_feature: SNAP_CREATE_V2 is supported
> 2014-06-02 08:12:53.353056 7f8d748b9800  0 filestore(/var/lib/ceph/osd/ceph-16) mount WARNING: no consistent snaps found, store may be in inconsistent state
> 2014-06-02 08:12:53.499770 7f8d748b9800  0 filestore(/var/lib/ceph/osd/ceph-16) mount: enabling PARALLEL journal mode: fs, checkpoint is enabled
> 2014-06-02 08:12:53.502813 7f8d748b9800 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
> 2014-06-02 08:12:53.502844 7f8d748b9800  1 journal _open /var/lib/ceph/osd/ceph-16/journal fd 19: 5368709120 bytes, block size 4096 bytes, directio = 1, aio = 0
> 2014-06-02 08:12:53.503313 7f8d748b9800  1 journal _open /var/lib/ceph/osd/ceph-16/journal fd 19: 5368709120 bytes, block size 4096 bytes, directio = 1, aio = 0
> 2014-06-02 08:12:53.503754 7f8d748b9800  1 journal close /var/lib/ceph/osd/ceph-16/journal
> Aborted (core dumped)
>
> Any suggestions?  I'd like to recover the ~900 objects with writeback data
> left sitting on the SSDs.
>
>
> Anyway, the moral of the story: don't use btrfs for your cache devices.
>
> Lost filesystem count, after about 4 weeks and ~30 OSDs:
>
>   xfs: 1  (power loss -> directory structure trashed)
>   btrfs: 3
>
> I'm starting to miss ext3.
>
>
> Scott
>
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
