Looks like a known issue tracked by

http://tracker.ceph.com/issues/24423

http://tracker.ceph.com/issues/24599


Regards,

Igor


On 6/27/2018 9:40 AM, Steffen Winther Sørensen wrote:
List,

Had a failed disk behind an OSD in a Mimic 13.2.0 cluster, so I tried following the docs on removing an OSD.

I did:

# ceph osd crush reweight osd.19 0
waited for rebalancing to finish and continued:
# ceph osd out 19
# systemctl stop ceph-osd@19
# ceph osd purge 19 --yes-i-really-mean-it

Verified that osd.19 was out of the map with ceph osd tree.

Still found this tmpfs mounted though to my surprise:
tmpfs                    7.8G   48K  7.8G   1% /var/lib/ceph/osd/ceph-19
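For reference, the removal sequence above, plus an explicit unmount of that leftover tmpfs, can be sketched as a small dry-run script. The OSD_ID variable, the DRY_RUN wrapper, and the umount step are illustrative additions and not part of the documented procedure:

```shell
# Dry-run sketch of the OSD removal sequence, with an explicit
# unmount of the per-OSD tmpfs that purge appears to leave behind.
# Set DRY_RUN= (empty) to actually execute against a live cluster.
OSD_ID=19
DRY_RUN=echo   # illustrative: prints each command instead of running it

$DRY_RUN ceph osd crush reweight "osd.$OSD_ID" 0
# ...wait for rebalancing to finish before continuing...
$DRY_RUN ceph osd out "$OSD_ID"
$DRY_RUN systemctl stop "ceph-osd@$OSD_ID"
$DRY_RUN ceph osd purge "$OSD_ID" --yes-i-really-mean-it
# purge does not unmount the tmpfs at /var/lib/ceph/osd/ceph-$OSD_ID,
# so unmount it explicitly before reusing the slot:
$DRY_RUN umount "/var/lib/ceph/osd/ceph-$OSD_ID"
```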

Replaced the failed drive and then attempted:

# ceph-volume lvm zap /dev/sdh
# ceph-volume lvm create --osd-id 19 --data /dev/sdh
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 5352d594-aa19-4147-a884-ca2c5775aa1b
Running command: /usr/sbin/vgcreate --force --yes ceph-a2ebf47b-fa4a-43ce-b087-12dbafb5796e /dev/sdh
 stderr: WARNING: Device for PV CdiFOZ-n89Z-G5EF-JBBV-GFfU-bDRV-VJQHho not found or rejected by a filter.
 stderr: WARNING: Device for PV CdiFOZ-n89Z-G5EF-JBBV-GFfU-bDRV-VJQHho not found or rejected by a filter.
 stderr: /dev/ceph-a6541e3f-0a7f-4268-823c-668c515b5edc/osd-block-efae9323-b934-408e-a4f9-1e1f62d88f2d: read failed after 0 of 4096 at 0: Input/output error
/dev/ceph-a6541e3f-0a7f-4268-823c-668c515b5edc/osd-block-efae9323-b934-408e-a4f9-1e1f62d88f2d: read failed after 0 of 4096 at 146775408640: Input/output error
/dev/ceph-a6541e3f-0a7f-4268-823c-668c515b5edc/osd-block-efae9323-b934-408e-a4f9-1e1f62d88f2d: read failed after 0 of 4096 at 146775465984: Input/output error
 stderr: /dev/ceph-a6541e3f-0a7f-4268-823c-668c515b5edc/osd-block-efae9323-b934-408e-a4f9-1e1f62d88f2d: read failed after 0 of 4096 at 4096: Input/output error
 stderr: WARNING: Device for PV CdiFOZ-n89Z-G5EF-JBBV-GFfU-bDRV-VJQHho not found or rejected by a filter.
 stdout: Physical volume "/dev/sdh" successfully created.
 stdout: Volume group "ceph-a2ebf47b-fa4a-43ce-b087-12dbafb5796e" successfully created
Running command: /usr/sbin/lvcreate --yes -l 100%FREE -n osd-block-5352d594-aa19-4147-a884-ca2c5775aa1b ceph-a2ebf47b-fa4a-43ce-b087-12dbafb5796e
 stdout: Logical volume "osd-block-5352d594-aa19-4147-a884-ca2c5775aa1b" created.
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-19
Running command: /bin/chown -R ceph:ceph /dev/dm-9
Running command: /bin/ln -s /dev/ceph-a2ebf47b-fa4a-43ce-b087-12dbafb5796e/osd-block-5352d594-aa19-4147-a884-ca2c5775aa1b /var/lib/ceph/osd/ceph-19/block
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-19/activate.monmap
 stderr: got monmap epoch 1
Running command: /bin/ceph-authtool /var/lib/ceph/osd/ceph-19/keyring --create-keyring --name osd.19 --add-key AQBY1TBbN8I+HxAAMHGWKLgJugmtzdqllQh5sA==
 stdout: creating /var/lib/ceph/osd/ceph-19/keyring
 stdout: added entity osd.19 auth auth(auid = 18446744073709551615 key=AQBY1TBbN8I+HxAAMHGWKLgJugmtzdqllQh5sA== with 0 caps)
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-19/keyring
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-19/
Running command: /bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 19 --monmap /var/lib/ceph/osd/ceph-19/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-19/ --osd-uuid 5352d594-aa19-4147-a884-ca2c5775aa1b --setuser ceph --setgroup ceph
--> ceph-volume lvm prepare successful for: /dev/sdh
Running command: /bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-a2ebf47b-fa4a-43ce-b087-12dbafb5796e/osd-block-5352d594-aa19-4147-a884-ca2c5775aa1b --path /var/lib/ceph/osd/ceph-19
Running command: /bin/ln -snf /dev/ceph-a2ebf47b-fa4a-43ce-b087-12dbafb5796e/osd-block-5352d594-aa19-4147-a884-ca2c5775aa1b /var/lib/ceph/osd/ceph-19/block
Running command: /bin/chown -R ceph:ceph /dev/dm-9
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-19
Running command: /bin/systemctl enable ceph-volume@lvm-19-5352d594-aa19-4147-a884-ca2c5775aa1b
 stderr: Created symlink from /etc/systemd/system/multi-user.target.wants/[email protected] to /usr/lib/systemd/system/[email protected].
Running command: /bin/systemctl start ceph-osd@19
--> ceph-volume lvm activate successful for osd ID: 19
--> ceph-volume lvm create successful for: /dev/sdh

verified that osd.19 was in the map with:
# ceph osd tree
ID CLASS WEIGHT  TYPE NAME      STATUS REWEIGHT PRI-AFF
-1       3.20398 root default
-9       0.80099     host n1
18   hdd 0.13350         osd.18     up  1.00000 1.00000
19   hdd 0.13350         osd.19   down        0 1.00000
20   hdd 0.13350         osd.20     up  1.00000 1.00000
21   hdd 0.13350         osd.21     up  1.00000 1.00000
22   hdd 0.13350         osd.22     up  1.00000 1.00000
23   hdd 0.13350         osd.23     up  1.00000 1.00000

However, the OSD fails to launch:
# systemctl start ceph-osd@19
# systemctl status ceph-osd@19
● [email protected] - Ceph object storage daemon osd.19
   Loaded: loaded (/usr/lib/systemd/system/[email protected]; disabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: signal) since Mon 2018-06-25 13:44:35 CEST; 3s ago
  Process: 2046453 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=killed, signal=ABRT)
  Process: 2046447 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
 Main PID: 2046453 (code=killed, signal=ABRT)

Jun 25 13:44:35 n1.sprawl.dk ceph-osd[2046453]: 8: (OSD::handle_osd_map(MOSDMap*)+0x1020) [0x56353eac71f0]
Jun 25 13:44:35 n1.sprawl.dk ceph-osd[2046453]: 9: (OSD::_dispatch(Message*)+0xa1) [0x56353eac9d21]
Jun 25 13:44:35 n1.sprawl.dk ceph-osd[2046453]: 10: (OSD::ms_dispatch(Message*)+0x56) [0x56353eaca066]
Jun 25 13:44:35 n1.sprawl.dk ceph-osd[2046453]: 11: (DispatchQueue::entry()+0xb5a) [0x7f302acce74a]
Jun 25 13:44:35 n1.sprawl.dk ceph-osd[2046453]: 12: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f302ad6ef2d]
Jun 25 13:44:35 n1.sprawl.dk ceph-osd[2046453]: 13: (()+0x7e25) [0x7f30277b0e25]
Jun 25 13:44:35 n1.sprawl.dk ceph-osd[2046453]: 14: (clone()+0x6d) [0x7f30268a1bad]
Jun 25 13:44:35 n1.sprawl.dk ceph-osd[2046453]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Jun 25 13:44:35 n1.sprawl.dk systemd[1]: Unit [email protected] entered failed state.
Jun 25 13:44:35 n1.sprawl.dk systemd[1]: [email protected] failed.

The osd.19 log shows:

--- begin dump of recent events ---
     0> 2018-06-25 13:48:47.139 7fc6b91c5700 -1 *** Caught signal (Aborted) **
 in thread 7fc6b91c5700 thread_name:ms_dispatch

 ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable)
 1: (()+0x8e1870) [0x55da2ff6e870]
 2: (()+0xf6d0) [0x7fc6c97ba6d0]
 3: (gsignal()+0x37) [0x7fc6c87db277]
 4: (abort()+0x148) [0x7fc6c87dc968]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x25d) [0x7fc6ccc5a69d]
 6: (()+0x286727) [0x7fc6ccc5a727]
 7: (OSDService::get_map(unsigned int)+0x4a) [0x55da2faa3dda]
 8: (OSD::handle_osd_map(MOSDMap*)+0x1020) [0x55da2fa511f0]
 9: (OSD::_dispatch(Message*)+0xa1) [0x55da2fa53d21]
 10: (OSD::ms_dispatch(Message*)+0x56) [0x55da2fa54066]
 11: (DispatchQueue::entry()+0xb5a) [0x7fc6cccd074a]
 12: (DispatchQueue::DispatchThread::entry()+0xd) [0x7fc6ccd70f2d]
 13: (()+0x7e25) [0x7fc6c97b2e25]
 14: (clone()+0x6d) [0x7fc6c88a3bad]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Any hints would be appreciated, TIA!

/Steffen


_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
