Public bug reported:
SRU Justification
-----------------
[Impact]
Certain sequences of file system operations on a cephfs volume backed by
fscache with an ext4 store can cause a kernel BUG:
[ 5818.932770] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[ 5818.934354] IP: jbd2__journal_start+0x33/0x1e0
...
[ 5818.962490] Call Trace:
[ 5818.963055] ? ext4_writepages+0x5d5/0xf40
[ 5818.963884] __ext4_journal_start_sb+0x6d/0x120
[ 5818.964994] ext4_writepages+0x5d5/0xf40
[ 5818.965991] ? __enqueue_entity+0x5c/0x60
[ 5818.966791] ? check_preempt_wakeup+0x130/0x240
[ 5818.967679] do_writepages+0x4b/0xe0
[ 5818.968625] ? ext4_mark_inode_dirty+0x1d0/0x1d0
[ 5818.969526] ? do_writepages+0x4b/0xe0
[ 5818.970493] ? ext4_statfs+0x114/0x260
[ 5818.971267] __filemap_fdatawrite_range+0xc1/0x100
[ 5818.972425] ? __filemap_fdatawrite_range+0xc1/0x100
[ 5818.973385] filemap_write_and_wait+0x31/0x90
[ 5818.974461] ext4_bmap+0x8c/0xe0
[ 5818.975150] cachefiles_read_or_alloc_pages+0x1bf/0xd90 [cachefiles]
[ 5818.976718] ? _cond_resched+0x19/0x40
[ 5818.977482] ? wake_up_bit+0x42/0x50
[ 5818.978227] ? fscache_run_op.isra.8+0x4c/0x80 [fscache]
[ 5818.979249] __fscache_read_or_alloc_pages+0x1d3/0x2e0 [fscache]
[ 5818.980397] ceph_readpages_from_fscache+0x6c/0xe0 [ceph]
[ 5818.981630] ceph_readpages+0x49/0x100 [ceph]
[ 5818.982691] __do_page_cache_readahead+0x1c9/0x2c0
[ 5818.983628] ? __cap_is_valid+0x21/0xb0 [ceph]
[ 5818.984526] ondemand_readahead+0x11a/0x2a0
[ 5818.985374] ? ondemand_readahead+0x11a/0x2a0
[ 5818.986825] page_cache_async_readahead+0x71/0x80
[ 5818.987751] generic_file_read_iter+0x784/0xbf0
[ 5818.988663] ? ceph_put_cap_refs+0x1c4/0x330 [ceph]
[ 5818.989620] ? page_cache_tree_insert+0xe0/0xe0
[ 5818.990519] ceph_read_iter+0x106/0x820 [ceph]
[ 5818.991818] new_sync_read+0xe4/0x130
[ 5818.992588] __vfs_read+0x29/0x40
[ 5818.993504] vfs_read+0x8e/0x130
[ 5818.994192] SyS_read+0x55/0xc0
[ 5818.994870] do_syscall_64+0x73/0x130
[ 5818.995632] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[Fix]
Cherry-pick 5d988308283ecf062fa88f20ae05c52cce0bcdca from upstream.
This patch stops cephfs from reusing current->journal_info for its own
internal use, so the field holds a valid value when ext4 reads it via fscache.
[Testcase]
A user has been using the following test case:
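# Run as root (writing to drop_caches requires it) in a directory on the cephfs mount.
( cat /proc/fs/fscache/stats > ~/test.log; i=0; while true; do
    touch small; echo 3 > /proc/sys/vm/drop_caches & md5sum small; let "i++";
    # Every 1000 iterations, log progress and a snapshot of the fscache stats.
    if ! (( $i % 1000 )); then
        echo "Test iteration $i done" >> ~/test.log; cat /proc/fs/fscache/stats >> ~/test.log;
    fi;
done ) > ~/nohup.out 2>&1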
(It boils down to "touch file; drop caches; read file")
Without the patch, this fails very quickly - usually the first time, always
within a few iterations. With the patch, the user ran this loop for over 60
hours without incident.
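For SRU verification it may be handy to have a bounded form of the same loop. The following is only a sketch, not the script the user ran: the function name run_repro and the DROP_CACHES override are hypothetical, introduced here so the loop can be dry-run without root. It also waits for the background cache drop after each read, which the original one-liner does not do.

```shell
# Bounded sketch of the reproducer: touch file; drop caches; read file.
# DROP_CACHES is a hypothetical override (not in the original test case);
# it defaults to the real sysctl file, which needs root to write.
DROP_CACHES="${DROP_CACHES:-/proc/sys/vm/drop_caches}"

# run_repro FILE ITERATIONS
# Prints the number of completed iterations; exits non-zero if touch or
# the read fails. The body runs in a subshell so its variables stay local.
run_repro() (
    file="$1"
    iterations="$2"
    i=0
    while [ "$i" -lt "$iterations" ]; do
        touch "$file" || exit 1
        # Drop caches in the background so it races with the read,
        # as in the original one-liner.
        echo 3 > "$DROP_CACHES" &
        md5sum "$file" > /dev/null || exit 1
        # Unlike the original, reap the background job each iteration.
        wait
        i=$((i + 1))
    done
    echo "$i"
)
```

On an affected kernel one would expect "run_repro small 1000", run as root in a directory on the cephfs mount, to trigger the oops before the count is printed. On a patched kernel it should print 1000.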
[Regression potential]
The change is not trivial, but is limited to cephfs, and has been in mainline
since v4.16. So the risk of regression is well contained.
** Affects: linux (Ubuntu)
Importance: Undecided
Assignee: Daniel Axtens (daxtens)
Status: Confirmed
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1783246
Title:
Cephfs + fscache: unable to handle kernel NULL pointer dereference at
0000000000000000 IP: jbd2__journal_start+0x22/0x1f0
Status in linux package in Ubuntu:
Confirmed