On 1/31/19 5:18 PM, Andreas Gruenbacher wrote:
Hi Ross,
On Thu, 31 Jan 2019 at 11:56, Ross Lagerwall <ross.lagerw...@citrix.com> wrote:
Each gfs2_bufdata stores a reference to a glock but the reference count
isn't incremented. This causes an occasional use-after-free of the
glock. Fix by taking a reference on the glock during allocation and
dropping it when freeing.
Found by KASAN:
BUG: KASAN: use-after-free in revoke_lo_after_commit+0x8e/0xe0 [gfs2]
Write of size 4 at addr ffff88801aff6134 by task kworker/0:2H/20371
CPU: 0 PID: 20371 Comm: kworker/0:2H Tainted: G O 4.19.0+0 #1
Hardware name: Dell Inc. PowerEdge R805/0D456H, BIOS 4.2.1 04/14/2010
Workqueue: glock_workqueue glock_work_func [gfs2]
Call Trace:
dump_stack+0x71/0xab
print_address_description+0x6a/0x270
kasan_report+0x258/0x380
? revoke_lo_after_commit+0x8e/0xe0 [gfs2]
revoke_lo_after_commit+0x8e/0xe0 [gfs2]
gfs2_log_flush+0x511/0xa70 [gfs2]
? gfs2_log_shutdown+0x1f0/0x1f0 [gfs2]
? __brelse+0x48/0x50
? gfs2_log_commit+0x4de/0x6e0 [gfs2]
? gfs2_trans_end+0x18d/0x340 [gfs2]
gfs2_ail_empty_gl+0x1ab/0x1c0 [gfs2]
? inode_go_dump+0xe0/0xe0 [gfs2]
? inode_go_sync+0xe4/0x220 [gfs2]
inode_go_sync+0xe4/0x220 [gfs2]
do_xmote+0x12b/0x290 [gfs2]
glock_work_func+0x6f/0x160 [gfs2]
process_one_work+0x461/0x790
worker_thread+0x69/0x6b0
? process_one_work+0x790/0x790
kthread+0x1ae/0x1d0
? kthread_create_worker_on_cpu+0xc0/0xc0
ret_from_fork+0x22/0x40
thanks for tracking this down, very interesting.
The consistency model here is that every buffer head that a struct
gfs2_bufdata object is attached to is protected by a glock. Before a
glock can be released, all the buffers under that glock have to be
flushed out and released; this is what allows another node to access
the same on-disk location without causing inconsistencies. When there
is a bufdata object that points to a glock that has already been
freed, this consistency model is broken. Taking an additional refcount
as this patch does may make the use-after-free go away, but it doesn't
fix the underlying problem. So I think we'll need a different fix
here.
Did you observe this problem in a real-world scenario, or with KASAN
only? It might be that we're looking at a small race that is unlikely
to trigger in the field. In any case, I think we need to understand
better what't actually going on.
I finally got time to look into this further. The following is what
happens as far as I can tell though my understanding is a bit limited at
this point:
1. A file is opened, truncated and closed which results in a metadata
write so a gfs2_bufdata object is created and associated with the inode
glock.
2. The gfs2_bufdata is written out and placed on an AIL list.
3. The VFS layer calls gfs2_evict_inode() and the inode is evicted which
drops a reference on the glock. It takes case 3 (i_nlink > 0) so no log
flush or ail_flush happens.
4. The gfs2_bufdata object is however still on one of the AIL lists and
the next gfs2_log_flush() begins to write a revoke entry.
5. At about the same time the glock is freed. At the point of freeing,
gl_revokes is 1 (should probably have a GLOCK_BUG_ON for this).
6. gfs2_log_flush() continues and calls revoke_lo_after_commit() and
uses the freed glock (stack trace above).
Should evict_inode call gfs2_ail_flush() or something so that the revoke
is written before it drops its reference to the glock?
Or is there something else that is meant to keep the glock around if the
inode is evicted while there is a linked gfs2_bufdata sitting on some
AIL list?
Let me know if any more specific information is needed since I now have
a test setup that can usually reproduce this within 10 minutes.
Thanks,
--
Ross Lagerwall