Hi Paul! I installed ceph-debuginfo and set these: debug bluestore = 20/20 debug osd = 20/20 debug bluefs = 20/20 debug bdev = 20/20
V: ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable) *LOGS* *OSD 29:* 2018-10-08 10:29:06.001 7f810511a1c0 20 bluefs _read left 0x4d000 len 0x1000 2018-10-08 10:29:06.001 7f810511a1c0 20 bluefs _read got 4096 2018-10-08 10:29:06.001 7f810511a1c0 10 bluefs _replay 0x12b3000: stop: uuid e510614a-7ca4-eb59-0383-010189889f01 != super.uuid 4df25e30-4769-47b5-b569-01b3f83de70c 2018-10-08 10:29:06.001 7f810511a1c0 10 bluefs _replay log file size was 0x12b3000 2018-10-08 10:29:06.001 7f810511a1c0 -1 bluefs _replay file with link count 0: file(ino 519 size 0x31e2f42 mtime 2018-10-02 12:24:22.632397 bdev 1 allocated 3200000 extents [1:0x7008200000+100000,1:0x7009000000+100000,1:0x7009100000+100000,1:0x7009200000+100000,1:0x7009300000+100000,1:0x7009400000+100000,1:0x7009500000+100000,1:0x7009600000+100000,1:0x7009700000+100000,1:0x7009800000+100000,1:0x7009900000+100000,1:0x7009a00000+100000,1:0x7009b00000+100000,1:0x7009c00000+100000,1:0x7009d00000+100000,1:0x7009e00000+100000,1:0x7009f00000+100000,1:0x700a000000+100000,1:0x700a100000+100000,1:0x700a200000+100000,1:0x700a300000+100000,1:0x700a400000+100000,1:0x700a500000+100000,1:0x700a600000+100000,1:0x700a700000+100000,1:0x700a800000+100000,1:0x700a900000+100000,1:0x700aa00000+100000,1:0x700ab00000+100000,1:0x700ac00000+100000,1:0x700ad00000+100000,1:0x700ae00000+100000,1:0x700af00000+100000,1:0x700b000000+100000,1:0x700b100000+100000,1:0x700b200000+100000,1:0x700b300000+100000,1:0x700b400000+100000,1:0x700b500000+100000,1:0x700b600000+100000,1:0x700b700000+100000,1:0x700b800000+100000,1:0x700b900000+100000,1:0x700ba00000+100000,1:0x700bb00000+100000,1:0x700bc00000+100000,1:0x700bd00000+100000,1:0x700be00000+100000,1:0x700bf00000+100000,1:0x700c000000+100000]) 2018-10-08 10:29:06.001 7f810511a1c0 -1 bluefs mount failed to replay log: (5) Input/output error 2018-10-08 10:29:06.001 7f810511a1c0 20 bluefs _stop_alloc 2018-10-08 10:29:06.001 7f810511a1c0 10 bdev(0x558b8f1dea80 /var/lib/ceph/osd/ceph-29/block) discard_drain 2018-10-08 10:29:06.001 7f810511a1c0 1 stupidalloc 0x0x558b8f34d0a0 shutdown 2018-10-08 10:29:06.001 7f810511a1c0 -1 bluestore(/var/lib/ceph/osd/ceph-29) _open_db failed bluefs mount: (5) Input/output error 2018-10-08 10:29:06.001 7f810511a1c0 20 bdev aio_wait 0x558b8f34f440 done 2018-10-08 10:29:06.001 7f810511a1c0 1 bdev(0x558b8f1dea80 /var/lib/ceph/osd/ceph-29/block) close 2018-10-08 10:29:06.001 7f810511a1c0 10 bdev(0x558b8f1dea80 /var/lib/ceph/osd/ceph-29/block) _aio_stop 2018-10-08 10:29:06.066 7f80ed75f700 10 bdev(0x558b8f1dea80 /var/lib/ceph/osd/ceph-29/block) _aio_thread end 2018-10-08 10:29:06.073 7f810511a1c0 10 bdev(0x558b8f1dea80 /var/lib/ceph/osd/ceph-29/block) _discard_stop 2018-10-08 10:29:06.073 7f80ecf5e700 20 bdev(0x558b8f1dea80 /var/lib/ceph/osd/ceph-29/block) _discard_thread wake 2018-10-08 10:29:06.073 7f80ecf5e700 10 bdev(0x558b8f1dea80 /var/lib/ceph/osd/ceph-29/block) _discard_thread finish 2018-10-08 10:29:06.073 7f810511a1c0 10 bdev(0x558b8f1dea80 /var/lib/ceph/osd/ceph-29/block) _discard_stop stopped 2018-10-08 10:29:06.073 7f810511a1c0 1 bdev(0x558b8f1de000 /var/lib/ceph/osd/ceph-29/block) close 2018-10-08 10:29:06.073 7f810511a1c0 10 bdev(0x558b8f1de000 /var/lib/ceph/osd/ceph-29/block) _aio_stop 2018-10-08 10:29:06.315 7f80ee761700 10 bdev(0x558b8f1de000 /var/lib/ceph/osd/ceph-29/block) _aio_thread end 2018-10-08 10:29:06.321 7f810511a1c0 10 bdev(0x558b8f1de000 /var/lib/ceph/osd/ceph-29/block) _discard_stop 2018-10-08 10:29:06.321 7f80edf60700 20 bdev(0x558b8f1de000 /var/lib/ceph/osd/ceph-29/block) _discard_thread wake 2018-10-08 10:29:06.321 7f80edf60700 10 bdev(0x558b8f1de000 /var/lib/ceph/osd/ceph-29/block) _discard_thread finish 2018-10-08 10:29:06.321 7f810511a1c0 10 bdev(0x558b8f1de000 /var/lib/ceph/osd/ceph-29/block) _discard_stop stopped 2018-10-08 10:29:06.322 7f810511a1c0 -1 osd.29 0 OSD:init: unable to mount object store 2018-10-08 10:29:06.322 7f810511a1c0 -1 ** ERROR: osd init failed: (5) Input/output error *OSD 40 (keeps getting restarted by systemd):* 2018-10-08 10:33:01.867 7fbdd21441c0 20 read_log_and_missing 4754'11872 (4754'11871) modify 5:fd843365:::10000000229.000029b3:head by client.1109026.0:2115960 2018-09-23 02:48:36.736842 0 2018-10-08 10:33:01.867 7fbdd21441c0 10 bluefs _read_random h 0x5566a75fb480 0x4a2a19~1036 from file(ino 539 size 0x3fa66ff mtime 2018-10-02 12:19:02.174614 bdev 1 allocated 4000000 extents [1:0x7004e00000+4000000]) 2018-10-08 10:33:01.867 7fbdd21441c0 20 bluefs _read_random read buffered 0x4a2a19~1036 of 1:0x7004e00000+4000000 2018-10-08 10:33:01.867 7fbdd21441c0 5 bdev(0x5566a70dea80 /var/lib/ceph/osd/ceph-40/block) read_random 0x70052a2a19~1036 2018-10-08 10:33:01.867 7fbdd21441c0 20 bluefs _read_random got 4150 2018-10-08 10:33:01.867 7fbdd21441c0 20 bluestore.OmapIteratorImpl(0x5566bddf5f80) valid is at 0x0000000000000769'.0000004754.00000000000000011873' 2018-10-08 10:33:01.867 7fbdd21441c0 20 read_log_and_missing 4754'11873 (0'0) modify 5:fde05091:::10000000229.00002a55:head by client.1109026.0:2116301 2018-09-23 02:49:12.650261 0 2018-10-08 10:33:01.867 7fbdd21441c0 20 bluestore.OmapIteratorImpl(0x5566bddf5f80) valid is at 0x0000000000000769'.0000004754.00000000000000011874' 2018-10-08 10:33:01.867 7fbdd21441c0 20 read_log_and_missing 4754'11874 (4754'11873) modify 5:fde05091:::10000000229.00002a55:head by client.1109026.0:2116303 2018-09-23 02:49:12.862775 0 2018-10-08 10:33:01.867 7fbdd21441c0 20 bluestore.OmapIteratorImpl(0x5566bddf5f80) valid is at 0x0000000000000769'.0000004754.00000000000000011875' 2018-10-08 10:33:01.867 7fbdd21441c0 20 read_log_and_missing 4754'11875 (0'0) modify 5:fddd113d:::10000000229.00002b8a:head by client.1109026.0:2116962 2018-09-23 02:50:31.402401 0 2018-10-08 10:33:01.867 7fbdd21441c0 20 bluestore.OmapIteratorImpl(0x5566bddf5f80) valid is at 0x0000000000000769'.0000004754.00000000000000011876' 2018-10-08 10:33:01.867 7fbdd21441c0 20 read_log_and_missing 4754'11876 (4754'11875) modify 5:fddd113d:::10000000229.00002b8a:head by client.1109026.0:2116964 2018-09-23 02:50:31.593404 0 2018-10-08 10:33:01.867 7fbdd21441c0 20 bluestore.OmapIteratorImpl(0x5566bddf5f80) valid is at 0x0000000000000769'.0000004754.00000000000000011877' 2018-10-08 10:33:01.867 7fbdd21441c0 20 read_log_and_missing 4754'11877 (0'0) modify 5:fdfed313:::10000000229.00002c06:head by client.1109026.0:2117222 2018-09-23 02:51:04.498945 0 2018-10-08 10:33:01.867 7fbdd21441c0 20 bluestore.OmapIteratorImpl(0x5566bddf5f80) valid is at 0x0000000000000769'.0000004754.00000000000000011878' 2018-10-08 10:33:01.867 7fbdd21441c0 20 read_log_and_missing 4754'11878 (4754'11877) modify 5:fdfed313:::10000000229.00002c06:head by client.1109026.0:2117224 2018-09-23 02:51:04.699086 0 2018-10-08 10:33:01.867 7fbdd21441c0 20 bluestore.OmapIteratorImpl(0x5566bddf5f80) valid is at 0x0000000000000769'.0000004754.00000000000000011879' 2018-10-08 10:33:01.867 7fbdd21441c0 20 read_log_and_missing 4754'11879 (0'0) modify 5:fda592f5:::10000000229.00002d5d:head by client.1109026.0:2117943 2018-09-23 02:52:25.074663 0 2018-10-08 10:33:01.867 7fbdd21441c0 20 bluestore.OmapIteratorImpl(0x5566bddf5f80) valid is at 0x0000000000000769'.0000004754.00000000000000011880' 2018-10-08 10:33:01.867 7fbdd21441c0 20 read_log_and_missing 4754'11880 (4754'11879) modify 5:fda592f5:::10000000229.00002d5d:head by client.1109026.0:2117945 2018-09-23 02:52:25.378019 0 2018-10-08 10:33:01.867 7fbdd21441c0 20 bluestore.OmapIteratorImpl(0x5566bddf5f80) valid is at 0x0000000000000769'.0000004754.00000000000000011881' 2018-10-08 10:33:01.867 7fbdd21441c0 20 read_log_and_missing 4754'11881 (0'0) modify 5:fdec7025:::10000000229.00002e24:head by client.1109026.0:2118374 2018-09-23 02:53:24.086509 0 2018-10-08 10:33:01.867 7fbdd21441c0 20 bluestore.OmapIteratorImpl(0x5566bddf5f80) valid is at 0x0000000000000769'.0000004754.00000000000000011882' 2018-10-08 10:33:01.867 7fbdd21441c0 20 read_log_and_missing 4754'11882 (4754'11881) modify 5:fdec7025:::10000000229.00002e24:head by client.1109026.0:2118376 2018-09-23 02:53:24.294116 0 2018-10-08 10:33:01.867 7fbdd21441c0 10 bluefs _read_random h 0x5566a75fb480 0x4a3a4f~f2c from file(ino 539 size 0x3fa66ff mtime 2018-10-02 12:19:02.174614 bdev 1 allocated 4000000 extents [1:0x7004e00000+4000000]) 2018-10-08 10:33:01.867 7fbdd21441c0 20 bluefs _read_random read buffered 0x4a3a4f~f2c of 1:0x7004e00000+4000000 2018-10-08 10:33:01.867 7fbdd21441c0 5 bdev(0x5566a70dea80 /var/lib/ceph/osd/ceph-40/block) read_random 0x70052a3a4f~f2c 2018-10-08 10:33:01.867 7fbdd21441c0 20 bluefs _read_random got 3884 2018-10-08 10:33:01.867 7fbdd21441c0 20 bluestore.OmapIteratorImpl(0x5566bddf5f80) valid is at 0x0000000000000769'.0000004754.00000000000000011883' 2018-10-08 10:33:01.867 7fbdd21441c0 20 read_log_and_missing 4754'11883 (0'0) modify 5:fdf4aaf7:::10000000229.0000319a:head by client.1109026.0:2120361 2018-09-23 03:00:28.877386 0 [...] 2018-10-08 10:36:48.349 7f4948e99700 20 trim shard target 0 B ratio 0 (0 B + 0 B), current 12 KiB (12 KiB + 0 B), need_to_free 12 KiB (12 KiB + 0 B) -> max 0 onodes + 0 buffer 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x559036804fc0) _trim onodes 12 / 0 buffers 0 / 0 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x559036804fc0) _trim considering 0x55903bb1bd40 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x559036804fc0) _trim 3#5:b0000000::::head# has 2 refs; skipping 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x559036804fc0) _trim considering 0x559043ac86c0 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x559036804fc0) _trim 5#5:a1000000::::head# has 2 refs; skipping 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x559036804fc0) _trim considering 0x55903b879680 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x559036804fc0) _trim 1#5:01000000::::head# has 2 refs; skipping 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x559036804fc0) _trim considering 0x559044390240 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x559036804fc0) _trim 3#5:21800000::::head# has 2 refs; skipping 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x559036804fc0) _trim considering 0x5590447eb200 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x559036804fc0) _trim 2#5:9b800000::::head# has 2 refs; skipping 2018-10-08 10:36:48.349 7f49628581c0 20 read_log_and_missing 4738'10907 (0'0) modify 5:f895d9a8:::100000001e3.00008f27:head by client.1109026.0:42051 2018-09-21 20:00:14.298602 0 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x559036804fc0) _trim considering 0x559045067d40 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x559036804fc0) _trim 1#5:86800000::::head# has 2 refs; skipping 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x559036804fc0) _trim considering 0x559047d43b00 2018-10-08 10:36:48.349 7f49628581c0 20 bluestore.OmapIteratorImpl(0x559046345900) valid is at 0x00000000000591c6'.0000004738.00000000000000010908' 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x559036804fc0) _trim 5#5:51000000::::head# has 2 refs; skipping 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x559036804fc0) _trim considering 0x559048892d80 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x559036804fc0) _trim 0#5:c9800000::::head# has 2 refs; skipping 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x559036804fc0) _trim considering 0x5590483f38c0 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x559036804fc0) _trim 7#5:88800000::::head# has 2 refs; skipping 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x559036804fc0) _trim considering 0x559038c68b40 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x559036804fc0) _trim 2#5:46000000::::head# has 2 refs; skipping 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x559036804fc0) _trim considering 0x55904c1706c0 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x559036804fc0) _trim 0#5:e0800000::::head# has 2 refs; skipping 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x559036804fc0) _trim considering 0x55904c9dd680 2018-10-08 10:36:48.349 7f49628581c0 20 read_log_and_missing 4738'10908 (4738'10907) modify 5:f895d9a8:::100000001e3.00008f27:head by client.1109026.0:42070 2018-09-21 20:00:17.441237 0 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x559036804fc0) _trim 4#5:d3000000::::head# has 2 refs; skipping 2018-10-08 10:36:48.349 7f49628581c0 20 bluestore.OmapIteratorImpl(0x559046345900) valid is at 0x00000000000591c6'.0000004738.00000000000000010909' 2018-10-08 10:36:48.349 7f4948e99700 20 trim shard target 0 B ratio 0 (0 B + 0 B), current 10 KiB (10 KiB + 0 B), need_to_free 10 KiB (10 KiB + 0 B) -> max 0 onodes + 0 buffer 2018-10-08 10:36:48.349 7f49628581c0 20 read_log_and_missing 4738'10909 (4728'10886) modify 5:f8b6efe8:::10000000215.000036de:head by mds.1.1257:3493034 0.000000 0 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x5590368050a0) _trim onodes 10 / 0 buffers 0 / 0 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x5590368050a0) _trim considering 0x55903dcb4b40 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x5590368050a0) _trim 6#5:59000000::::head# has 2 refs; skipping 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x5590368050a0) _trim considering 0x55903e414000 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x5590368050a0) _trim 7#5:a9000000::::head# has 2 refs; skipping 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x5590368050a0) _trim considering 0x559046187200 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x5590368050a0) _trim 1#5:ba800000::::head# has 2 refs; skipping 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x5590368050a0) _trim considering 0x559047928000 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x5590368050a0) _trim 2#5:46800000::::head# has 2 refs; skipping 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x5590368050a0) _trim considering 0x55904a6f8fc0 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x5590368050a0) _trim 3#5:a2000000::::head# has 2 refs; skipping 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x5590368050a0) _trim considering 0x55904ab36b40 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x5590368050a0) _trim 1#5:e6800000::::head# has 2 refs; skipping 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x5590368050a0) _trim considering 0x55904afce900 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x5590368050a0) _trim 4#5:79800000::::head# has 2 refs; skipping 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x5590368050a0) _trim considering 0x55904b881200 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x5590368050a0) _trim 2#5:61000000::::head# has 2 refs; skipping 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x5590368050a0) _trim considering 0x55904c59ed80 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x5590368050a0) _trim 3#5:34000000::::head# has 2 refs; skipping 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x5590368050a0) _trim considering 0x55904e0e7440 2018-10-08 10:36:48.349 7f4948e99700 20 bluestore.2QCache(0x5590368050a0) _trim 2#5:6e800000::::head# has 2 refs; skipping 2018-10-08 10:36:48.349 7f49628581c0 10 bluefs _read_random h 0x559036e27280 0x1635da3~1035 from file(ino 539 size 0x3fa66ff mtime 2018-10-02 12:19:02.174614 bdev 1 allocated 4000000 extents [1:0x7004e00000+4000000]) 2018-10-08 10:36:48.349 7f49628581c0 20 bluefs _read_random read buffered 0x1635da3~1035 of 1:0x7004e00000+4000000 2018-10-08 10:36:48.349 7f49628581c0 5 bdev(0x559036b76a80 /var/lib/ceph/osd/ceph-40/block) read_random 0x7006435da3~1035 *OSD 42:* no log yet, maybe disk died, investigating *OSD 46:* 2018-10-08 10:32:17.434 7f6af518e1c0 20 bluefs _read left 0x25000 len 0x1000 2018-10-08 10:32:17.434 7f6af518e1c0 20 bluefs _read got 4096 2018-10-08 10:32:17.434 7f6af518e1c0 10 bluefs _replay 0x8db000: txn(seq 496420 len 0x145 crc 0xb79f17b8) 2018-10-08 10:32:17.434 7f6af518e1c0 20 bluefs _replay 0x8db000: op_file_update file(ino 450 size 0x169924e mtime 2018-10-02 12:24:22.550352 bdev 1 allocated 1700000 extents [1:0x6fd9500000+100000,1:0x6fd9600000+100000,1:0x6fd9700000+100000,1:0x6fd9800000+100000,1:0x6fd9900000+100000,1:0x6fd9a00000+100000,1:0x6fd9b00000+100000,1:0x6fd9c00000+100000,1:0x6fd9d00000+100000,1:0x6fd9e00000+100000,1:0x6fd9f00000+100000,1:0x6fda000000+100000,1:0x6fda100000+100000,1:0x6fda200000+100000,1:0x6fda300000+100000,1:0x6fda400000+100000,1:0x6fda500000+100000,1:0x6fda600000+100000,1:0x6fda700000+100000,1:0x6fda800000+100000,1:0x6fda900000+100000,1:0x6fdaa00000+100000,1:0x6fdab00000+100000]) 2018-10-08 10:32:17.434 7f6af518e1c0 10 bluefs _read h 0x55a3a1e21180 0x8dc000~1000 from file(ino 1 size 0x8dc000 mtime 0.000000 bdev 0 allocated d00000 extents [1:0x70c3b00000+100000,1:0x70c3700000+400000,1:0x70c4900000+400000,1:0x70c4d00000+400000]) 2018-10-08 10:32:17.434 7f6af518e1c0 20 bluefs _read left 0x24000 len 0x1000 2018-10-08 10:32:17.434 7f6af518e1c0 20 bluefs _read got 4096 2018-10-08 10:32:17.434 7f6af518e1c0 10 bluefs _replay 0x8dc000: txn(seq 496421 len 0x145 crc 0x48e83346) 2018-10-08 10:32:17.434 7f6af518e1c0 20 bluefs _replay 0x8dc000: op_file_update file(ino 450 size 0x169964c mtime 2018-10-02 12:24:22.602432 bdev 1 allocated 1700000 extents [1:0x6fd9500000+100000,1:0x6fd9600000+100000,1:0x6fd9700000+100000,1:0x6fd9800000+100000,1:0x6fd9900000+100000,1:0x6fd9a00000+100000,1:0x6fd9b00000+100000,1:0x6fd9c00000+100000,1:0x6fd9d00000+100000,1:0x6fd9e00000+100000,1:0x6fd9f00000+100000,1:0x6fda000000+100000,1:0x6fda100000+100000,1:0x6fda200000+100000,1:0x6fda300000+100000,1:0x6fda400000+100000,1:0x6fda500000+100000,1:0x6fda600000+100000,1:0x6fda700000+100000,1:0x6fda800000+100000,1:0x6fda900000+100000,1:0x6fdaa00000+100000,1:0x6fdab00000+100000]) 2018-10-08 10:32:17.434 7f6af518e1c0 10 bluefs _read h 0x55a3a1e21180 0x8dd000~1000 from file(ino 1 size 0x8dd000 mtime 0.000000 bdev 0 allocated d00000 extents [1:0x70c3b00000+100000,1:0x70c3700000+400000,1:0x70c4900000+400000,1:0x70c4d00000+400000]) 2018-10-08 10:32:17.434 7f6af518e1c0 20 bluefs _read left 0x23000 len 0x1000 2018-10-08 10:32:17.434 7f6af518e1c0 20 bluefs _read got 4096 2018-10-08 10:32:17.434 7f6af518e1c0 10 bluefs _replay 0x8dd000: stop: uuid 73657423-0000-0003-021d-000000010000 != super.uuid 7c5a6f99-6140-4fca-a2f3-158df761635a 2018-10-08 10:32:17.434 7f6af518e1c0 10 bluefs _replay log file size was 0x8dd000 2018-10-08 10:32:17.434 7f6af518e1c0 -1 bluefs _replay file with link count 0: file(ino 450 size 0x169964c mtime 2018-10-02 12:24:22.602432 bdev 1 allocated 1700000 extents [1:0x6fd9500000+100000,1:0x6fd9600000+100000,1:0x6fd9700000+100000,1:0x6fd9800000+100000,1:0x6fd9900000+100000,1:0x6fd9a00000+100000,1:0x6fd9b00000+100000,1:0x6fd9c00000+100000,1:0x6fd9d00000+100000,1:0x6fd9e00000+100000,1:0x6fd9f00000+100000,1:0x6fda000000+100000,1:0x6fda100000+100000,1:0x6fda200000+100000,1:0x6fda300000+100000,1:0x6fda400000+100000,1:0x6fda500000+100000,1:0x6fda600000+100000,1:0x6fda700000+100000,1:0x6fda800000+100000,1:0x6fda900000+100000,1:0x6fdaa00000+100000,1:0x6fdab00000+100000]) 2018-10-08 10:32:17.434 7f6af518e1c0 -1 bluefs mount failed to replay log: (5) Input/output error 2018-10-08 10:32:17.434 7f6af518e1c0 20 bluefs _stop_alloc 2018-10-08 10:32:17.434 7f6af518e1c0 10 bdev(0x55a3a1d62a80 /var/lib/ceph/osd/ceph-46/block) discard_drain 2018-10-08 10:32:17.434 7f6af518e1c0 1 stupidalloc 0x0x55a3a1ecf180 shutdown 2018-10-08 10:32:17.434 7f6af518e1c0 -1 bluestore(/var/lib/ceph/osd/ceph-46) _open_db failed bluefs mount: (5) Input/output error 2018-10-08 10:32:17.434 7f6af518e1c0 20 bdev aio_wait 0x55a3a1edb8c0 done 2018-10-08 10:32:17.434 7f6af518e1c0 1 bdev(0x55a3a1d62a80 /var/lib/ceph/osd/ceph-46/block) close 2018-10-08 10:32:17.434 7f6af518e1c0 10 bdev(0x55a3a1d62a80 /var/lib/ceph/osd/ceph-46/block) _aio_stop 2018-10-08 10:32:17.568 7f6add7d3700 10 bdev(0x55a3a1d62a80 /var/lib/ceph/osd/ceph-46/block) _aio_thread end 2018-10-08 10:32:17.573 7f6af518e1c0 10 bdev(0x55a3a1d62a80 /var/lib/ceph/osd/ceph-46/block) _discard_stop 2018-10-08 10:32:17.573 7f6adcfd2700 20 bdev(0x55a3a1d62a80 /var/lib/ceph/osd/ceph-46/block) _discard_thread wake 2018-10-08 10:32:17.573 7f6adcfd2700 10 bdev(0x55a3a1d62a80 /var/lib/ceph/osd/ceph-46/block) _discard_thread finish 2018-10-08 10:32:17.573 7f6af518e1c0 10 bdev(0x55a3a1d62a80 /var/lib/ceph/osd/ceph-46/block) _discard_stop stopped 2018-10-08 10:32:17.573 7f6af518e1c0 1 bdev(0x55a3a1d62000 /var/lib/ceph/osd/ceph-46/block) close 2018-10-08 10:32:17.573 7f6af518e1c0 10 bdev(0x55a3a1d62000 /var/lib/ceph/osd/ceph-46/block) _aio_stop 2018-10-08 10:32:17.817 7f6ade7d5700 10 bdev(0x55a3a1d62000 /var/lib/ceph/osd/ceph-46/block) _aio_thread end 2018-10-08 10:32:17.822 7f6af518e1c0 10 bdev(0x55a3a1d62000 /var/lib/ceph/osd/ceph-46/block) _discard_stop 2018-10-08 10:32:17.822 7f6addfd4700 20 bdev(0x55a3a1d62000 /var/lib/ceph/osd/ceph-46/block) _discard_thread wake 2018-10-08 10:32:17.822 7f6addfd4700 10 bdev(0x55a3a1d62000 /var/lib/ceph/osd/ceph-46/block) _discard_thread finish 2018-10-08 10:32:17.822 7f6af518e1c0 10 bdev(0x55a3a1d62000 /var/lib/ceph/osd/ceph-46/block) _discard_stop stopped 2018-10-08 10:32:17.823 7f6af518e1c0 -1 osd.46 0 OSD:init: unable to mount object store 2018-10-08 10:32:17.823 7f6af518e1c0 -1 ** ERROR: osd init failed: (5) Input/output error Anything interesting here? I will try to export the down PGs from the disks. I got a bunch of new disks to replace all. Most of current disks are of same age. Kevin Am Mi., 3. Okt. 2018 um 13:52 Uhr schrieb Paul Emmerich < [email protected]>: > There's "ceph-bluestore-tool repair/fsck" > > In your scenario, a few more log files would be interesting: try > setting debug bluefs to 20/20. And if that's not enough log try also > setting debug osd, debug bluestore, and debug bdev to 20/20. > > > > Paul > Am Mi., 3. Okt. 2018 um 13:48 Uhr schrieb Kevin Olbrich <[email protected]>: > > > > The disks were deployed with ceph-deploy / ceph-volume using the default > style (lvm) and not simple-mode. > > > > The disks were provisioned as a whole, no resizing. I never touched the > disks after deployment. > > > > It is very strange that this first happened after the update, never met > such an error before. > > > > I found a BUG in the tracker, that also shows such an error with count > 0. That was closed with „can’t reproduce“ (don’t have the link ready). For > me this seems like the data itself is fine and I just hit a bad transaction > in the replay (which maybe caused the crash in the first place). > > > > I need one of three disks back. Object corruption would not be a problem > (regarding drop of a journal), as this cluster hosts backups which will > fail validation and regenerate. Just marking the OSD lost does not seem to > be an option. > > > > Is there some sort of fsck for BlueFS? > > > > Kevin > > > > > > Igor Fedotov <[email protected]> schrieb am Mi. 3. Okt. 2018 um 13:01: > >> > >> I've seen somewhat similar behavior in a log from Sergey Malinin in > another thread ("mimic: 3/4 OSDs crashed...") > >> > >> He claimed it happened after LVM volume expansion. Isn't this the case > for you? > >> > >> Am I right that you use LVM volumes? > >> > >> > >> On 10/3/2018 11:22 AM, Kevin Olbrich wrote: > >> > >> Small addition: the failing disks are in the same host. > >> This is a two-host, failure-domain OSD cluster. > >> > >> > >> Am Mi., 3. Okt. 2018 um 10:13 Uhr schrieb Kevin Olbrich <[email protected]>: > >>> > >>> Hi! > >>> > >>> Yesterday one of our (non-priority) clusters failed when 3 OSDs went > down (EC 8+2) together. > >>> This is strange as we did an upgrade from 13.2.1 to 13.2.2 one or two > hours before. > >>> They failed exactly at the same moment, rendering the cluster unusable > (CephFS). > >>> We are using CentOS 7 with latest updates and ceph repo. No cache > SSDs, no external journal / wal / db. > >>> > >>> OSD 29 (no disk failure in dmesg): > >>> 2018-10-03 09:47:15.074 7fb8835ce1c0 0 set uid:gid to 167:167 > (ceph:ceph) > >>> 2018-10-03 09:47:15.074 7fb8835ce1c0 0 ceph version 13.2.2 > (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable), process > ceph-osd, pid 20899 > >>> 2018-10-03 09:47:15.074 7fb8835ce1c0 0 pidfile_write: ignore empty > --pid-file > >>> 2018-10-03 09:47:15.100 7fb8835ce1c0 0 load: jerasure load: lrc load: > isa > >>> 2018-10-03 09:47:15.100 7fb8835ce1c0 1 bdev create path > /var/lib/ceph/osd/ceph-29/block type kernel > >>> 2018-10-03 09:47:15.100 7fb8835ce1c0 1 bdev(0x561250a20000 > /var/lib/ceph/osd/ceph-29/block) open path /var/lib/ceph/osd/ceph-29/block > >>> 2018-10-03 09:47:15.100 7fb8835ce1c0 1 bdev(0x561250a20000 > /var/lib/ceph/osd/ceph-29/block) open size 1000198897664 (0xe8e0800000, 932 > GiB) block_size 4096 (4 KiB) rotational > >>> 2018-10-03 09:47:15.101 7fb8835ce1c0 1 > bluestore(/var/lib/ceph/osd/ceph-29) _set_cache_sizes kv_min_ratio 1 > > kv_ratio 0.5 > >>> 2018-10-03 09:47:15.101 7fb8835ce1c0 1 > bluestore(/var/lib/ceph/osd/ceph-29) _set_cache_sizes cache_size 536870912 > meta 0 kv 1 data 0 > >>> 2018-10-03 09:47:15.101 7fb8835ce1c0 1 bdev(0x561250a20000 > /var/lib/ceph/osd/ceph-29/block) close > >>> 2018-10-03 09:47:15.358 7fb8835ce1c0 1 > bluestore(/var/lib/ceph/osd/ceph-29) _mount path /var/lib/ceph/osd/ceph-29 > >>> 2018-10-03 09:47:15.358 7fb8835ce1c0 1 bdev create path > /var/lib/ceph/osd/ceph-29/block type kernel > >>> 2018-10-03 09:47:15.358 7fb8835ce1c0 1 bdev(0x561250a20000 > /var/lib/ceph/osd/ceph-29/block) open path /var/lib/ceph/osd/ceph-29/block > >>> 2018-10-03 09:47:15.359 7fb8835ce1c0 1 bdev(0x561250a20000 > /var/lib/ceph/osd/ceph-29/block) open size 1000198897664 (0xe8e0800000, 932 > GiB) block_size 4096 (4 KiB) rotational > >>> 2018-10-03 09:47:15.360 7fb8835ce1c0 1 > bluestore(/var/lib/ceph/osd/ceph-29) _set_cache_sizes kv_min_ratio 1 > > kv_ratio 0.5 > >>> 2018-10-03 09:47:15.360 7fb8835ce1c0 1 > bluestore(/var/lib/ceph/osd/ceph-29) _set_cache_sizes cache_size 536870912 > meta 0 kv 1 data 0 > >>> 2018-10-03 09:47:15.360 7fb8835ce1c0 1 bdev create path > /var/lib/ceph/osd/ceph-29/block type kernel > >>> 2018-10-03 09:47:15.360 7fb8835ce1c0 1 bdev(0x561250a20a80 > /var/lib/ceph/osd/ceph-29/block) open path /var/lib/ceph/osd/ceph-29/block > >>> 2018-10-03 09:47:15.360 7fb8835ce1c0 1 bdev(0x561250a20a80 > /var/lib/ceph/osd/ceph-29/block) open size 1000198897664 (0xe8e0800000, 932 > GiB) block_size 4096 (4 KiB) rotational > >>> 2018-10-03 09:47:15.360 7fb8835ce1c0 1 bluefs add_block_device bdev 1 > path /var/lib/ceph/osd/ceph-29/block size 932 GiB > >>> 2018-10-03 09:47:15.360 7fb8835ce1c0 1 bluefs mount > >>> 2018-10-03 09:47:15.538 7fb8835ce1c0 -1 bluefs _replay file with link > count 0: file(ino 519 size 0x31e2f42 mtime 2018-10-02 12:24:22.632397 bdev > 1 allocated 3200000 extents > [1:0x7008200000+100000,1:0x7009000000+100000,1:0x7009100000+100000,1:0x7009200000+100000,1:0x7009300000+100000,1:0x7009400000+100000,1:0x7009500000+100000,1:0x7009600000+100000,1:0x7009700000+100000,1:0x7009800000+100000,1:0x7009900000+100000,1:0x7009a00000+100000,1:0x7009b00000+100000,1:0x7009c00000+100000,1:0x7009d00000+100000,1:0x7009e00000+100000,1:0x7009f00000+100000,1:0x700a000000+100000,1:0x700a100000+100000,1:0x700a200000+100000,1:0x700a300000+100000,1:0x700a400000+100000,1:0x700a500000+100000,1:0x700a600000+100000,1:0x700a700000+100000,1:0x700a800000+100000,1:0x700a900000+100000,1:0x700aa00000+100000,1:0x700ab00000+100000,1:0x700ac00000+100000,1:0x700ad00000+100000,1:0x700ae00000+100000,1:0x700af00000+100000,1:0x700b000000+100000,1:0x700b100000+100000,1:0x700b200000+100000,1:0x700b300000+100000,1:0x700b400000+100000,1:0x700b500000+100000,1:0x700b600000+100000,1:0x700b700000+100000,1:0x700b800000+100000,1:0x700b900000+100000,1:0x700ba00000+100000,1:0x700bb00000+100000,1:0x700bc00000+100000,1:0x700bd00000+100000,1:0x700be00000+100000,1:0x700bf00000+100000,1:0x700c000000+100000]) > >>> 2018-10-03 09:47:15.538 7fb8835ce1c0 -1 bluefs mount failed to replay > log: (5) Input/output error > >>> 2018-10-03 09:47:15.538 7fb8835ce1c0 1 stupidalloc 0x0x561250b8d030 > shutdown > >>> 2018-10-03 09:47:15.538 7fb8835ce1c0 -1 > bluestore(/var/lib/ceph/osd/ceph-29) _open_db failed bluefs mount: (5) > Input/output error > >>> 2018-10-03 09:47:15.538 7fb8835ce1c0 1 bdev(0x561250a20a80 > /var/lib/ceph/osd/ceph-29/block) close > >>> 2018-10-03 09:47:15.616 7fb8835ce1c0 1 bdev(0x561250a20000 > /var/lib/ceph/osd/ceph-29/block) close > >>> 2018-10-03 09:47:15.870 7fb8835ce1c0 -1 osd.29 0 OSD:init: unable to > mount object store > >>> 2018-10-03 09:47:15.870 7fb8835ce1c0 -1 ** ERROR: osd init failed: > (5) Input/output error > >>> > >>> OSD 42: > >>> disk is found by lvm, tmpfs is created but service immediately dies on > start without log... > >>> This might be failed. > >>> > >>> OSD 47 (same as above, seems not be died, no dmesg trace): > >>> 2018-10-03 10:02:25.221 7f4d54b611c0 0 set uid:gid to 167:167 > (ceph:ceph) > >>> 2018-10-03 10:02:25.221 7f4d54b611c0 0 ceph version 13.2.2 > (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable), process > ceph-osd, pid 8993 > >>> 2018-10-03 10:02:25.221 7f4d54b611c0 0 pidfile_write: ignore empty > --pid-file > >>> 2018-10-03 10:02:25.247 7f4d54b611c0 0 load: jerasure load: lrc load: > isa > >>> 2018-10-03 10:02:25.248 7f4d54b611c0 1 bdev create path > /var/lib/ceph/osd/ceph-46/block type kernel > >>> 2018-10-03 10:02:25.248 7f4d54b611c0 1 bdev(0x564072f96000 > /var/lib/ceph/osd/ceph-46/block) open path /var/lib/ceph/osd/ceph-46/block > >>> 2018-10-03 10:02:25.248 7f4d54b611c0 1 bdev(0x564072f96000 > /var/lib/ceph/osd/ceph-46/block) open size 1000198897664 (0xe8e0800000, 932 > GiB) block_size 4096 (4 KiB) rotational > >>> 2018-10-03 10:02:25.249 7f4d54b611c0 1 > bluestore(/var/lib/ceph/osd/ceph-46) _set_cache_sizes kv_min_ratio 1 > > kv_ratio 0.5 > >>> 2018-10-03 10:02:25.249 7f4d54b611c0 1 > bluestore(/var/lib/ceph/osd/ceph-46) _set_cache_sizes cache_size 536870912 > meta 0 kv 1 data 0 > >>> 2018-10-03 10:02:25.249 7f4d54b611c0 1 bdev(0x564072f96000 > /var/lib/ceph/osd/ceph-46/block) close > >>> 2018-10-03 10:02:25.503 7f4d54b611c0 1 > bluestore(/var/lib/ceph/osd/ceph-46) _mount path /var/lib/ceph/osd/ceph-46 > >>> 2018-10-03 10:02:25.504 7f4d54b611c0 1 bdev create path > /var/lib/ceph/osd/ceph-46/block type kernel > >>> 2018-10-03 10:02:25.504 7f4d54b611c0 1 bdev(0x564072f96000 > /var/lib/ceph/osd/ceph-46/block) open path /var/lib/ceph/osd/ceph-46/block > >>> 2018-10-03 10:02:25.504 7f4d54b611c0 1 bdev(0x564072f96000 > /var/lib/ceph/osd/ceph-46/block) open size 1000198897664 (0xe8e0800000, 932 > GiB) block_size 4096 (4 KiB) rotational > >>> 2018-10-03 10:02:25.505 7f4d54b611c0 1 > bluestore(/var/lib/ceph/osd/ceph-46) _set_cache_sizes kv_min_ratio 1 > > kv_ratio 0.5 > >>> 2018-10-03 10:02:25.505 7f4d54b611c0 1 > bluestore(/var/lib/ceph/osd/ceph-46) _set_cache_sizes cache_size 536870912 > meta 0 kv 1 data 0 > >>> 2018-10-03 10:02:25.505 7f4d54b611c0 1 bdev create path > /var/lib/ceph/osd/ceph-46/block type kernel > >>> 2018-10-03 10:02:25.505 7f4d54b611c0 1 bdev(0x564072f96a80 > /var/lib/ceph/osd/ceph-46/block) open path /var/lib/ceph/osd/ceph-46/block > >>> 2018-10-03 10:02:25.505 7f4d54b611c0 1 bdev(0x564072f96a80 > /var/lib/ceph/osd/ceph-46/block) open size 1000198897664 (0xe8e0800000, 932 > GiB) block_size 4096 (4 KiB) rotational > >>> 2018-10-03 10:02:25.505 7f4d54b611c0 1 bluefs add_block_device bdev 1 > path /var/lib/ceph/osd/ceph-46/block size 932 GiB > >>> 2018-10-03 10:02:25.505 7f4d54b611c0 1 bluefs mount > >>> 2018-10-03 10:02:25.620 7f4d54b611c0 -1 bluefs _replay file with link > count 0: file(ino 450 size 0x169964c mtime 2018-10-02 12:24:22.602432 bdev > 1 allocated 1700000 extents > [1:0x6fd9500000+100000,1:0x6fd9600000+100000,1:0x6fd9700000+100000,1:0x6fd9800000+100000,1:0x6fd9900000+100000,1:0x6fd9a00000+100000,1:0x6fd9b00000+100000,1:0x6fd9c00000+100000,1:0x6fd9d00000+100000,1:0x6fd9e00000+100000,1:0x6fd9f00000+100000,1:0x6fda000000+100000,1:0x6fda100000+100000,1:0x6fda200000+100000,1:0x6fda300000+100000,1:0x6fda400000+100000,1:0x6fda500000+100000,1:0x6fda600000+100000,1:0x6fda700000+100000,1:0x6fda800000+100000,1:0x6fda900000+100000,1:0x6fdaa00000+100000,1:0x6fdab00000+100000]) > >>> 2018-10-03 10:02:25.620 7f4d54b611c0 -1 bluefs mount failed to replay > log: (5) Input/output error > >>> 2018-10-03 10:02:25.620 7f4d54b611c0 1 stupidalloc 0x0x564073102fc0 > shutdown > >>> 2018-10-03 10:02:25.620 7f4d54b611c0 -1 > bluestore(/var/lib/ceph/osd/ceph-46) _open_db failed bluefs mount: (5) > Input/output error > >>> 2018-10-03 10:02:25.620 7f4d54b611c0 1 bdev(0x564072f96a80 > /var/lib/ceph/osd/ceph-46/block) close > >>> 2018-10-03 10:02:25.763 7f4d54b611c0 1 bdev(0x564072f96000 > /var/lib/ceph/osd/ceph-46/block) close > >>> 2018-10-03 10:02:26.010 7f4d54b611c0 -1 osd.46 0 OSD:init: unable to > mount object store > >>> 2018-10-03 10:02:26.010 7f4d54b611c0 -1 ** ERROR: osd init failed: > (5) Input/output error > >>> > >>> We had failing disks in this cluster before but that was easily > recovered by out + rebalance. > >>> For me, it seems like one disk died (there was large I/O on the > cluster when this happened) and took two additional disks with it. > >>> It is very strange that this happened about two hours after the > upgrade + reboot. > >>> > >>> Any recommendations? > >>> I have 8 PGs down, the remeining are active and recovery / rebalance. > >>> > >>> Kind regards > >>> Kevin > >> > >> > >> > >> _______________________________________________ > >> ceph-users mailing list > >> [email protected] > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > >> > >> > > -- > > > > Mit freundlichen Grüßen / best regards, > > Kevin Olbrich. > > _______________________________________________ > > ceph-users mailing list > > [email protected] > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > > -- > Paul Emmerich > > Looking for help with your Ceph cluster? Contact us at https://croit.io > > croit GmbH > Freseniusstr. 31h > 81247 München > www.croit.io > Tel: +49 89 1896585 90 >
_______________________________________________ ceph-users mailing list [email protected] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
