Hi,

This regression is fairly complicated and it's high impact, as mptsas is being used to drive fairly popular controllers, including the entry-level ones in several generations of Dell PowerEdge servers.

We've been debugging this for a while now over at Ubuntu's Launchpad[1] and the issue has subsequently been raised on both the linux-scsi[2] and systemd[3] mailing lists.

In essence, there are four different behaviors/bugs here:

1) The kthread_create() semantics have changed in 3.13 with 786235ee by making kthreads killable. Not a bug on its own, but it's a "breaks previously working userspace configuration" kind of bug. Ubuntu has reverted this patch for trusty as a workaround.

2) mptsas, to probe the SAS bus, spawns a kthread that takes more than 30s to complete. The consensus on the list AIUI is that it's a bug and it should not take that long.

3) systemd-udev by default sends SIGKILL to kthreads that have been running for more than 30s. The systemd developers consider this intended behavior rather than a bug and refuse to change it. Adding OPTIONS+="event_timeout=120" to the udev config would probably work around this.

4) Unrelated to the bug at hand, mptsas is buggy in the error handling codepath that runs when spawning the kthread fails. It tries to clean up by dereferencing a NULL pointer and hence the kernel oopses, whereas otherwise it would just continue running without any mptsas devices present. I've written up an analysis of the buggy codepath in comment #27 on the LP bug above. This has always been a bug; it's just that the codepath was untested until now.
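
To illustrate the general shape of (4), here is a minimal userspace sketch, not the actual mptsas code. All names (struct adapter, spawn_worker, adapter_probe, adapter_cleanup) are hypothetical and made up for this example. It assumes the pattern is "worker creation fails, the error path calls a cleanup routine, and the cleanup routine touches the worker pointer"; the NULL check in the cleanup routine is what keeps a failed probe from turning into a crash.

```c
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names come from mptsas. */
struct worker {
    int running;
};

struct adapter {
    struct worker *probe_worker; /* NULL until the worker is spawned */
    int ndevices;
};

/* Stand-in for thread creation: returns NULL on failure. */
static struct worker *spawn_worker(int simulate_failure)
{
    struct worker *w;

    if (simulate_failure)
        return NULL;
    w = calloc(1, sizeof(*w));
    if (w)
        w->running = 1;
    return w;
}

/* Cleanup must tolerate a worker that was never created. */
static void adapter_cleanup(struct adapter *a)
{
    if (a->probe_worker == NULL)
        return; /* without this guard, the next line dereferences NULL */
    a->probe_worker->running = 0;
    free(a->probe_worker);
    a->probe_worker = NULL;
}

/* Returns 0 on success, -1 if the worker could not be spawned. */
static int adapter_probe(struct adapter *a, int simulate_failure)
{
    a->ndevices = 0;
    a->probe_worker = spawn_worker(simulate_failure);
    if (a->probe_worker == NULL) {
        adapter_cleanup(a); /* error path: safe only because of the guard */
        return -1;
    }
    a->ndevices = 1;
    return 0;
}
```

With the guard in place, a failed adapter_probe() just returns an error and leaves the adapter with no devices, which corresponds to the "no mptsas devices but an otherwise working kernel" outcome described above.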

The end result is that this regression is somewhere in the limbo land between kernel/systemd for the two features (1)/(2) that are valid on their own but reveal a regression in combination with (3) and each other.

Issue (2) seems like a real bug and the root cause here, but one that probably can't be easily fixed in a point release -- I don't think it has even been fixed in master yet.

Issue (4) is easily fixable but it's orthogonal and not going to solve the real problem here. It will just downgrade this from an oops to "just" a system with no disk drives but an otherwise working kernel.

Regards,
Faidon

1: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1276705
2: https://lkml.org/lkml/2014/3/23/42
3: http://lists.freedesktop.org/archives/systemd-devel/2014-March/018007.html

