This bug is missing log files that will aid in diagnosing the problem.
While running an Ubuntu kernel (not a mainline or third-party kernel)
please enter the following command in a terminal window:
apport-collect 1744300
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable
to run this command, please add a comment stating that fact and change
the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the
Ubuntu Kernel Team.
** Changed in: linux (Ubuntu)
Status: New => Incomplete
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1744300
Title:
bt_iter() crash due to NULL pointer
Status in linux package in Ubuntu:
Incomplete
Bug description:
SRU Justification:
[Impact]
The following crash was observed on Ubuntu 16.04 running the linux-gcp
kernel, version 4.13 (specifically 4.13.0-1006.9):
[ 10.972644] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
[ 10.980708] IP: bt_iter+0x31/0x50
[ 10.984310] PGD 0
[ 10.984310] P4D 0
[ 10.986439]
[ 10.990190] Oops: 0000 [#1] SMP PTI
[ 11.016282] Workqueue: kblockd blk_mq_timeout_work
[ 11.021196] task: ffff8e7c2e700000 task.stack: ffffb8d4c67a8000
[ 11.027234] RIP: 0010:bt_iter+0x31/0x50
[ 11.031187] RSP: 0018:ffffb8d4c67abda0 EFLAGS: 00010206
[ 11.037730] RAX: ffffb8d4c67abdd0 RBX: 0000000000000180 RCX: 0000000000000000
[ 11.045172] RDX: ffff8e7c34c8d280 RSI: 0000000000000000 RDI: ffff8e7c32dd8000
[ 11.053321] RBP: ffffb8d4c67abe20 R08: 0000000000000000 R09: 0000200000000100
[ 11.060582] R10: 0000000000000130 R11: 00000000fffee5bf R12: ffff8e7c3572c790
[ 11.068094] R13: ffff8e7c3572c780 R14: 0000000000000008 R15: ffff8e7c35e7c180
[ 11.075522] FS: 0000000000000000(0000) GS:ffff8e7c3a4c0000(0000) knlGS:0000000000000000
[ 11.083721] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 11.089593] CR2: 0000000000000030 CR3: 000000009e20a003 CR4: 00000000001606e0
[ 11.096871] Call Trace:
[ 11.099468] ? blk_mq_queue_tag_busy_iter+0xe2/0x1f0
[ 11.104558] ? blk_mq_rq_timed_out+0x70/0x70
[ 11.109130] ? blk_mq_rq_timed_out+0x70/0x70
[ 11.114933] blk_mq_timeout_work+0xbb/0x170
[ 11.119408] process_one_work+0x156/0x410
[ 11.123641] worker_thread+0x4b/0x460
[ 11.127827] kthread+0x109/0x140
[ 11.131186] ? process_one_work+0x410/0x410
[ 11.135499] ? kthread_create_on_node+0x70/0x70
[ 11.140408] ret_from_fork+0x1f/0x30
[ 11.144110] Code: 89 d0 48 8b 3a 0f b6 48 18 48 8b 97 30 01 00 00 84 c9 75 03 03 72 04 48 8b 92 80 00 00 00 89 f6 48 8b 34 f2 48 8b 97 c0 00 00 00 <48> 39 56 30 74 06 b8 01 00 00 00 c3 55 48 8b 50 10 48 89 e5 ff
[ 11.167573] RIP: bt_iter+0x31/0x50 RSP: ffffb8d4c67abda0
[ 11.173028] CR2: 0000000000000030
[ 11.176515] ---[ end trace 2f8e5b1cf4139fec ]---
[ 11.182589] Kernel panic - not syncing: Fatal exception
In short, this is a NULL pointer dereference in the bt_iter() function.
Since blk-mq scheduler support was merged into the Linux kernel, the
tags->rqs[] array has been dynamically assigned, and there is a small
window of time in which a tag's bit is already set but the corresponding
tags->rqs[] entry has not been assigned yet. This was reported to
happen in about 5% of test runs (more details in the [Testcase] section).
[Fix]
The fix is small and simple, and it is already upstream. It adds a NULL
pointer check to the bt_iter() and bt_tags_iter() functions.
The fix is: 7f5562d5ecc4 ("blk-mq-tag: check for NULL rq when iterating
tags"), by Jens Axboe.
(https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7f5562d5ecc4)
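The idea behind the check can be illustrated with a small user-space
sketch. This is not the kernel code: struct tag_set, iterate_busy_tags()
and count_request() are hypothetical names invented for illustration, and
the structures are heavily simplified. It only shows the pattern of
skipping a slot whose bit is set but whose request pointer is still NULL,
instead of dereferencing it. In the oops above, RSI is 0 and the faulting
address is 0x30, which is consistent with reading a field at offset 0x30
through a NULL request pointer.

```c
#include <limits.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the kernel structures. */
struct request_queue {
    int id;
};

struct request {
    struct request_queue *q;
};

struct tag_set {
    size_t nr_tags;
    unsigned long busy_bits;  /* bit i set => tag i is marked in use */
    struct request **rqs;     /* rqs[i] may be assigned after bit i is set */
};

typedef void (*busy_fn)(struct request *rq, void *data);

/*
 * Call fn for every busy tag whose request belongs to queue q.
 * Returns the number of requests visited.
 */
static size_t iterate_busy_tags(const struct tag_set *tags,
                                const struct request_queue *q,
                                busy_fn fn, void *data)
{
    const size_t max_bits = sizeof(unsigned long) * CHAR_BIT;
    size_t visited = 0;

    for (size_t i = 0; i < tags->nr_tags && i < max_bits; i++) {
        if (!(tags->busy_bits & (1UL << i)))
            continue;

        struct request *rq = tags->rqs[i];

        /*
         * The bit can be set before the slot is assigned, so a NULL
         * entry is skipped rather than dereferenced via rq->q.
         */
        if (rq && rq->q == q) {
            fn(rq, data);
            visited++;
        }
    }
    return visited;
}

/* Example callback: counts the requests it is handed. */
static void count_request(struct request *rq, void *data)
{
    (void)rq;
    ++*(size_t *)data;
}
```

With two bits set but only one slot assigned, the iteration visits a
single request and ignores the unassigned slot. Without the rq check, the
same input would dereference a NULL pointer.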
[Testcase]
Since the problem manifests in a small non-deterministic time window, there's
no easy test to reproduce this. In our case, it was observed while testing
with a large number of CPUs and attached disks (>200 disks, >150 cores),
trying to exercise all CPUs and disks (the disks with quick dd commands). In
this test scenario, as already mentioned, the issue occurred in about 5% of
the runs.