Re: osd shutdown notification

Greg Farnum Mon, 26 Mar 2012 12:46:09 -0700

On Monday, March 26, 2012 at 12:36 PM, Sage Weil wrote:
> Currently when you shut down or kill a ceph-osd it is no different from it 
> crashing: you have to wait N seconds for its peers to conclude the process 
> is down before the OSD is deemed 'failed' and the osd map is updated.
> 
> This would be pretty easy to improve on:
> 
> - on a clean shutdown (e.g., due to SIGTERM), we could execv a call to 
> the ceph tool to tell the monitors the osd stopped (maybe with a 
> 'reason' and nice log message).
> 
> - on an unclean shutdown (e.g., failed assert, segfault) we can
> do the same, with an appropriate message in the system log
> 
> Basically it means that if ceph-osd crashes or shuts down then it 
> will normally get instantly marked down without waiting for the normal osd 
> timeout to expire.
> 
> execv() is kind of ugly, but seems safer in the failure cases, where you 
> can't trust the existing MonClient to be operational. 
> 
> Alternatively, some external wrapper could watch for the process to 
> terminate and notify the cluster, but this would be a bit more difficult 
> to implement, because that notification needs to uniquely identify the 
> process instance (e.g., via the cluster addr), and we'd need some way for 
> it to wait for the osd to join and then extract that id, etc.
> 
> Thoughts?
execve to an external binary seems like the wrong tool for this job. On clean 
shutdown the OSD can send off a notification itself; handling failures, on the 
other hand, seems like a job for the monitoring service, not Ceph itself.
Doing it this way would also complicate cephx key management, since you either 
need an extra "osd-notifier" key added to each OSD node, or need to grant each 
OSD's key modification privileges on the monitor.
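
For reference, the exec-on-SIGTERM idea under discussion could be sketched 
roughly as below. This is only an illustration in Python, not how ceph-osd 
does or would do it: the notifier path, its arguments, and the helper names 
are all hypothetical, since the thread does not specify the actual command. 
Note that the exec'd tool would still need its own credentials to talk to 
the monitors, which is the key management concern raised above.

```python
import os
import signal

# Hypothetical notifier binary and argument layout; the thread does not
# say what the real invocation would look like.
NOTIFIER = "/usr/local/bin/osd-down-notify"


def build_notify_argv(osd_id, reason):
    """Build the argv for the hypothetical notifier: [path, osd id, reason]."""
    return [NOTIFIER, str(osd_id), reason]


def make_sigterm_handler(osd_id, exec_fn=os.execv):
    """Return a signal handler that replaces this process with the notifier.

    exec_fn defaults to os.execv, which does not return on success; it is
    a parameter only so the handler can be exercised without exec'ing.
    """
    def on_sigterm(signum, frame):
        argv = build_notify_argv(osd_id, "clean shutdown (SIGTERM)")
        exec_fn(argv[0], argv)
    return on_sigterm


# Installing it in a daemon's main thread would look like:
#   signal.signal(signal.SIGTERM, make_sigterm_handler(3))
```

Because execv replaces the process image, nothing in the daemon runs after 
the handler fires, so any flushing or cleanup would have to happen before 
the exec call.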



