I like this; there are some similar ideas we could probably borrow from Cassandra's handling of disk failures:

# policy for data disk failures:
# die: shut down gossip and Thrift and kill the JVM for any fs errors or
#      single-sstable errors, so the node can be replaced.
# stop_paranoid: shut down gossip and Thrift even for single-sstable errors.
# stop: shut down gossip and Thrift, leaving the node effectively dead, but
#       can still be inspected via JMX.
# best_effort: stop using the failed disk and respond to requests based on
#              remaining available sstables.  This means you WILL see obsolete
#              data at CL.ONE!
# ignore: ignore fatal errors and let requests fail, as in pre-1.2 Cassandra
disk_failure_policy: stop_paranoid
Regards

Stanley


On 19/09/17 9:16 PM, Manuel Lausch wrote:
On Tue, 19 Sep 2017 08:24:48 +0000, Adrian Saul <[email protected]> wrote:

I understand what you mean and it's indeed dangerous, but see:
https://github.com/ceph/ceph/blob/master/systemd/ceph-osd%40.service

Looking at the systemd docs it's difficult though:
https://www.freedesktop.org/software/systemd/man/systemd.service.html

If the OSD crashes due to another bug you do want it to restart.

But systemd has no way to tell whether the crash was caused by a disk
I/O error, a bug in the OSD itself, or something like the OOM killer.
Perhaps we could use something like RestartPreventExitStatus and define
a specific exit code for the OSD to use when it exits due to an I/O
error.
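That suggestion might look roughly like the following drop-in override for ceph-osd@.service. This is only a sketch: the exit code 42 is purely illustrative, and it assumes the OSD were changed to actually exit with that code on unrecoverable disk I/O errors, which is not current Ceph behaviour.

```ini
# Hypothetical drop-in: /etc/systemd/system/[email protected]/disk-error.conf
# Assumes the OSD exits with status 42 (an invented value) on fatal
# disk I/O errors; any other crash is still restarted as usual.
[Service]
Restart=on-failure
RestartPreventExitStatus=42
```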
Another idea: the OSD daemon keeps running in a defined error state
and only shuts down its connections to the other OSDs and the clients.
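A minimal sketch (not Ceph code) of that "stay alive in an error state" idea: on a genuine disk I/O error the daemon closes its listening socket so peers and clients see it as down, but the process itself stays up for inspection. The `Daemon` class and error classification here are invented for illustration.

```python
import errno
import socket

class Daemon:
    """Toy daemon illustrating a defined error state on disk failure."""

    def __init__(self):
        # Listener standing in for the OSD's peer/client endpoints.
        self.listener = socket.socket()
        self.listener.bind(("127.0.0.1", 0))
        self.listener.listen()
        self.failed = False

    def on_error(self, exc):
        # Only genuine I/O errors put us into the error state; other
        # bugs should still crash so systemd can restart the daemon.
        if isinstance(exc, OSError) and exc.errno in (errno.EIO, errno.EROFS):
            self.listener.close()   # peers now see the node as dead
            self.failed = True      # keep running, but refuse new work
```

The point of the design is that an operator can still attach to the process and inspect its state, much like Cassandra's `stop` policy keeps JMX reachable.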



--

*Stanley Zhang* | Senior Operations Engineer
*Telephone:* +64 9 302 0515 *Fax:* +64 9 302 0518
*Mobile:* +64 22 318 3664 *Freephone:* 0800 SMX SMX (769 769)
*SMX Limited:* Level 15, 19 Victoria Street West, Auckland, New Zealand
*Web:* http://smxemail.com
SMX | Cloud Email Hosting & Security

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
