List,
I had a failed disk behind an OSD in a Mimic (13.2.0) cluster, so I followed
the docs on removing an OSD.
I did:
# ceph osd crush reweight osd.19 0
waited for rebalancing to finish, and then continued:
# ceph osd out 19
# systemctl stop ceph-osd@19
# ceph osd purge 19 --yes-i-really-mean-it
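(As I understand it, purge is just shorthand for the older three-step removal,
i.e. roughly:
# ceph osd crush remove osd.19
# ceph auth del osd.19
# ceph osd rm 19
so all traces of osd.19 should be gone from the cluster map at this point.)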
and verified with ceph osd tree that osd.19 was out of the map.
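For completeness, that check was simply along these lines:
# ceph osd tree | grep -w osd.19
which returned nothing after the purge.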
To my surprise, I still found this tmpfs mounted:
tmpfs 7.8G 48K 7.8G 1% /var/lib/ceph/osd/ceph-19
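(I assume the leftover mount could simply have been dropped by hand, e.g.:
# umount /var/lib/ceph/osd/ceph-19
but I left it alone; as the create output below shows, ceph-volume just mounts
a fresh tmpfs over that path anyway.)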
I replaced the failed drive and then ran:
# ceph-volume lvm zap /dev/sdh
# ceph-volume lvm create --osd-id 19 --data /dev/sdh
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 5352d594-aa19-4147-a884-ca2c5775aa1b
Running command: /usr/sbin/vgcreate --force --yes ceph-a2ebf47b-fa4a-43ce-b087-12dbafb5796e /dev/sdh
 stderr: WARNING: Device for PV CdiFOZ-n89Z-G5EF-JBBV-GFfU-bDRV-VJQHho not found or rejected by a filter.
 stderr: WARNING: Device for PV CdiFOZ-n89Z-G5EF-JBBV-GFfU-bDRV-VJQHho not found or rejected by a filter.
 stderr: /dev/ceph-a6541e3f-0a7f-4268-823c-668c515b5edc/osd-block-efae9323-b934-408e-a4f9-1e1f62d88f2d: read failed after 0 of 4096 at 0: Input/output error
 /dev/ceph-a6541e3f-0a7f-4268-823c-668c515b5edc/osd-block-efae9323-b934-408e-a4f9-1e1f62d88f2d: read failed after 0 of 4096 at 146775408640: Input/output error
 /dev/ceph-a6541e3f-0a7f-4268-823c-668c515b5edc/osd-block-efae9323-b934-408e-a4f9-1e1f62d88f2d: read failed after 0 of 4096 at 146775465984: Input/output error
 stderr: /dev/ceph-a6541e3f-0a7f-4268-823c-668c515b5edc/osd-block-efae9323-b934-408e-a4f9-1e1f62d88f2d: read failed after 0 of 4096 at 4096: Input/output error
 stderr: WARNING: Device for PV CdiFOZ-n89Z-G5EF-JBBV-GFfU-bDRV-VJQHho not found or rejected by a filter.
 stdout: Physical volume "/dev/sdh" successfully created.
 stdout: Volume group "ceph-a2ebf47b-fa4a-43ce-b087-12dbafb5796e" successfully created
Running command: /usr/sbin/lvcreate --yes -l 100%FREE -n osd-block-5352d594-aa19-4147-a884-ca2c5775aa1b ceph-a2ebf47b-fa4a-43ce-b087-12dbafb5796e
 stdout: Logical volume "osd-block-5352d594-aa19-4147-a884-ca2c5775aa1b" created.
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-19
Running command: /bin/chown -R ceph:ceph /dev/dm-9
Running command: /bin/ln -s /dev/ceph-a2ebf47b-fa4a-43ce-b087-12dbafb5796e/osd-block-5352d594-aa19-4147-a884-ca2c5775aa1b /var/lib/ceph/osd/ceph-19/block
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-19/activate.monmap
 stderr: got monmap epoch 1
Running command: /bin/ceph-authtool /var/lib/ceph/osd/ceph-19/keyring --create-keyring --name osd.19 --add-key AQBY1TBbN8I+HxAAMHGWKLgJugmtzdqllQh5sA==
 stdout: creating /var/lib/ceph/osd/ceph-19/keyring
 stdout: added entity osd.19 auth auth(auid = 18446744073709551615 key=AQBY1TBbN8I+HxAAMHGWKLgJugmtzdqllQh5sA== with 0 caps)
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-19/keyring
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-19/
Running command: /bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 19 --monmap /var/lib/ceph/osd/ceph-19/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-19/ --osd-uuid 5352d594-aa19-4147-a884-ca2c5775aa1b --setuser ceph --setgroup ceph
--> ceph-volume lvm prepare successful for: /dev/sdh
Running command: /bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-a2ebf47b-fa4a-43ce-b087-12dbafb5796e/osd-block-5352d594-aa19-4147-a884-ca2c5775aa1b --path /var/lib/ceph/osd/ceph-19
Running command: /bin/ln -snf /dev/ceph-a2ebf47b-fa4a-43ce-b087-12dbafb5796e/osd-block-5352d594-aa19-4147-a884-ca2c5775aa1b /var/lib/ceph/osd/ceph-19/block
Running command: /bin/chown -R ceph:ceph /dev/dm-9
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-19
Running command: /bin/systemctl enable ceph-volume@lvm-19-5352d594-aa19-4147-a884-ca2c5775aa1b
 stderr: Created symlink from /etc/systemd/system/multi-user.target.wants/[email protected] to /usr/lib/systemd/system/[email protected].
Running command: /bin/systemctl start ceph-osd@19
--> ceph-volume lvm activate successful for osd ID: 19
--> ceph-volume lvm create successful for: /dev/sdh
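(The PV warnings above presumably come from the dead drive's stale LVM
metadata; I wonder whether zapping with the destroy flag first, something like
# ceph-volume lvm zap --destroy /dev/sdh
would have cleaned those up, but that is just a guess on my part.)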
I verified that osd.19 was back in the map:
# ceph osd tree
ID CLASS WEIGHT  TYPE NAME      STATUS REWEIGHT PRI-AFF
-1       3.20398 root default
-9       0.80099     host n1
18   hdd 0.13350         osd.18     up  1.00000 1.00000
19   hdd 0.13350         osd.19   down        0 1.00000
20   hdd 0.13350         osd.20     up  1.00000 1.00000
21   hdd 0.13350         osd.21     up  1.00000 1.00000
22   hdd 0.13350         osd.22     up  1.00000 1.00000
23   hdd 0.13350         osd.23     up  1.00000 1.00000
But the OSD fails to launch:
# systemctl start ceph-osd@19
# systemctl status ceph-osd@19
● [email protected] - Ceph object storage daemon osd.19
   Loaded: loaded (/usr/lib/systemd/system/[email protected]; disabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: signal) since Mon 2018-06-25 13:44:35 CEST; 3s ago
  Process: 2046453 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=killed, signal=ABRT)
  Process: 2046447 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
 Main PID: 2046453 (code=killed, signal=ABRT)
Jun 25 13:44:35 n1.sprawl.dk ceph-osd[2046453]: 8: (OSD::handle_osd_map(MOSDMap*)+0x1020) [0x56353eac71f0]
Jun 25 13:44:35 n1.sprawl.dk ceph-osd[2046453]: 9: (OSD::_dispatch(Message*)+0xa1) [0x56353eac9d21]
Jun 25 13:44:35 n1.sprawl.dk ceph-osd[2046453]: 10: (OSD::ms_dispatch(Message*)+0x56) [0x56353eaca066]
Jun 25 13:44:35 n1.sprawl.dk ceph-osd[2046453]: 11: (DispatchQueue::entry()+0xb5a) [0x7f302acce74a]
Jun 25 13:44:35 n1.sprawl.dk ceph-osd[2046453]: 12: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f302ad6ef2d]
Jun 25 13:44:35 n1.sprawl.dk ceph-osd[2046453]: 13: (()+0x7e25) [0x7f30277b0e25]
Jun 25 13:44:35 n1.sprawl.dk ceph-osd[2046453]: 14: (clone()+0x6d) [0x7f30268a1bad]
Jun 25 13:44:35 n1.sprawl.dk ceph-osd[2046453]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Jun 25 13:44:35 n1.sprawl.dk systemd[1]: Unit [email protected] entered failed state.
Jun 25 13:44:35 n1.sprawl.dk systemd[1]: [email protected] failed.
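I haven't yet tried running the daemon in the foreground with more verbose
logging; I suppose that would be roughly the unit's ExecStart line plus debug
switches (the levels below are my guess):
# /usr/bin/ceph-osd -f --cluster ceph --id 19 --setuser ceph --setgroup ceph --debug-osd 20 --debug-ms 1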
The osd.19 log shows:
--- begin dump of recent events ---
0> 2018-06-25 13:48:47.139 7fc6b91c5700 -1 *** Caught signal (Aborted) **
in thread 7fc6b91c5700 thread_name:ms_dispatch
ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable)
1: (()+0x8e1870) [0x55da2ff6e870]
2: (()+0xf6d0) [0x7fc6c97ba6d0]
3: (gsignal()+0x37) [0x7fc6c87db277]
4: (abort()+0x148) [0x7fc6c87dc968]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x25d) [0x7fc6ccc5a69d]
6: (()+0x286727) [0x7fc6ccc5a727]
7: (OSDService::get_map(unsigned int)+0x4a) [0x55da2faa3dda]
8: (OSD::handle_osd_map(MOSDMap*)+0x1020) [0x55da2fa511f0]
9: (OSD::_dispatch(Message*)+0xa1) [0x55da2fa53d21]
10: (OSD::ms_dispatch(Message*)+0x56) [0x55da2fa54066]
11: (DispatchQueue::entry()+0xb5a) [0x7fc6cccd074a]
12: (DispatchQueue::DispatchThread::entry()+0xd) [0x7fc6ccd70f2d]
13: (()+0x7e25) [0x7fc6c97b2e25]
14: (clone()+0x6d) [0x7fc6c88a3bad]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
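Since the assert fires in OSDService::get_map(), I wonder whether the
recreated osd.19 is requesting osdmap epochs the mons have already trimmed; I
suppose that could be checked with something like (not sure this is the right
place to look):
# ceph report 2>/dev/null | grep -E '"osdmap_(first|last)_committed"'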
Any hints would be appreciated, TIA!
/Steffen

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com