Re: [DRBD-user] disk-timeout actually in deciseconds?

Felix Zachlod Wed, 26 Nov 2014 07:15:43 -0800

On 26.11.2014 14:27, Lars Ellenberg wrote:


> DRBD "logging" is simply a printk.
> Whether or not that makes it to stable storage via some syslog channel
> or not is no longer in control of DRBD.
> Especially if the storage in fact *did* have problems, I think it is
> very unlikely that any logging would have made it to disk on that box...

I don't think the storage ACTUALLY had a problem, besides possibly being under high load. At least I cannot tell from the RAID controller or kernel logs that anything was wrong. Besides that, as I said, the syslog is on a separate disk subsystem, presented by a different controller, using a different driver, so I assume that even if some RAID controller or disk subsystem is having a problem, it should still always be possible to log to syslog as long as the system has not crashed.

> Also: the disk-timeout option is *dangerous* and *may lead to kernel
> panic*.  So don't use it (unless you are *very* certain that you know
> what you are doing, and have a very good reason to do it).


I read that before and my intent is the following:

If a disk subsystem on the master is neither responding nor throwing I/O errors, the master role should be transferred to the peer no matter what. So I would rather accept a kernel panic in such a situation than wait forever for an unresponsive disk subsystem, which would be less acceptable in my opinion.

The problem in this situation was that I prepared the DRBD config for a cluster manager, installed and properly configured to do all that, but in fact I did not have enough time in the last maintenance window to apply my cluster configuration, because of other problems that occurred.

So in this situation the disk-timeout does not make sense, as I risk the system crashing with no one taking over. So I have removed the disk-timeout setting for now, but still intend to use it later when the cluster manager is in place.

But I still have to monitor this behaviour again in my test setup to make sure I never reach a disk-timeout situation under normal working conditions. As far as I can tell from my munin logs and from watching iostat under high load, a volume should never be unresponsive for more than 30s, at least as long as it does not ACTUALLY have a serious problem.

> Of course there may be bugs in our code, so if you should be able to
> reproduce "misbehaviour", let us know.

I will test this again in my lab to see under which conditions the disk-timeout might be reached. Thank you for commenting.


regards, Felix
