On 10/11/14 09:16 AM, Lars Ellenberg wrote:
On Mon, Nov 10, 2014 at 09:00:45AM -0500, Digimer wrote:
On 10/11/14 04:11 AM, Lars Ellenberg wrote:
On Sun, Nov 09, 2014 at 04:05:52PM -0500, Digimer wrote:
CentOS 6.6, DRBD 8.3.16.
So this sucked:
After rebooting and restoring, I retried and got the same result a
second time. After moving my VMs to the other node, I tested
crashing the other node and again saw the "out of mem, failed to
invoke fence-peer helper" message. After that, I rebooted both
nodes. I've not yet tested if that resolved the issue.
Anyone seen this before?
*** Nov 9 15:18:40 fea-c01n01 kernel: block drbd0: out of mem, failed to invoke fence-peer helper
Sure.
Your kernel is too new for this DRBD.
Your DRBD is too old for this kernel.
As you know, we sometimes start some "handlers".
We spawn new kernel threads for this.
One of the relevant functions is kthread_run
(and everything it calls).
That used to fail only for hard out of memory conditions.
(Thus the "nonsense" error message)
At some point, upstream kernel changed the internals
of that code path to no longer do a wait_for_completion(),
but to do a wait_for_completion_killable().
Nov 9 15:21:16 fea-c01n01 kernel: Not tainted 2.6.32-504.el6.x86_64 #1
And apparently RHEL 6.6 has backported that change.
Which means that now this can also fail because of pending signals.
DRBD may routinely have a signal pending in the calling thread there.
Upstream fix:
http://git.linbit.com/gitweb.cgi?p=drbd-8.4.git;a=commitdiff;h=e998365475194a8faf31a86081e88034d7bd1a41
Another list user emailed me off list pointing to that fix as well.
Problem is, it doesn't match the 8.3.16 source I have...
So?
There is *exactly* one occurrence of kthread_run in the drbd source,
and the patch consists of *exactly* one non-comment line,
which is "+ flush_signals(current);"
;-)
Besides: time to finally get rid of 8.3, then.
===
void drbd_try_outdate_peer_async(struct drbd_conf *mdev)
{
	struct task_struct *opa;

	opa = kthread_run(_try_outdate_peer_async, mdev,
			  "drbd%d_a_helper", mdev_to_minor(mdev));
	if (IS_ERR(opa))
		dev_err(DEV, "out of mem, failed to invoke fence-peer helper\n");
}
===
Can I simply add the two missing lines?
===
void drbd_try_outdate_peer_async(struct drbd_conf *mdev)
{
	struct task_struct *opa;
	kref_get(&connection->kref);
Don't add a kref get; that kref does not exist in 8.3 code.
(It is also only a _context_ line in the patch.)
	flush_signals(current);
	opa = kthread_run(_try_outdate_peer_async, mdev,
			  "drbd%d_a_helper", mdev_to_minor(mdev));
	if (IS_ERR(opa))
		dev_err(DEV, "out of mem, failed to invoke fence-peer helper\n");
}
I knew it was context only, but it didn't match what was there so I
wanted to clarify.
So to summarize, I only add:
====
flush_signals(current);
====
I'll brush off my old RPM notes and see if I can sort out a patch. Thanks!
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?