On 1/31/19 5:18 PM, Andreas Gruenbacher wrote:
Hi Ross,

On Thu, 31 Jan 2019 at 11:56, Ross Lagerwall <ross.lagerw...@citrix.com> wrote:
Each gfs2_bufdata stores a reference to a glock but the reference count
isn't incremented. This causes an occasional use-after-free of the
glock. Fix by taking a reference on the glock during allocation and
dropping it when freeing.

Found by KASAN:

BUG: KASAN: use-after-free in revoke_lo_after_commit+0x8e/0xe0 [gfs2]
Write of size 4 at addr ffff88801aff6134 by task kworker/0:2H/20371

CPU: 0 PID: 20371 Comm: kworker/0:2H Tainted: G O 4.19.0+0 #1
Hardware name: Dell Inc. PowerEdge R805/0D456H, BIOS 4.2.1 04/14/2010
Workqueue: glock_workqueue glock_work_func [gfs2]
Call Trace:
  dump_stack+0x71/0xab
  print_address_description+0x6a/0x270
  kasan_report+0x258/0x380
  ? revoke_lo_after_commit+0x8e/0xe0 [gfs2]
  revoke_lo_after_commit+0x8e/0xe0 [gfs2]
  gfs2_log_flush+0x511/0xa70 [gfs2]
  ? gfs2_log_shutdown+0x1f0/0x1f0 [gfs2]
  ? __brelse+0x48/0x50
  ? gfs2_log_commit+0x4de/0x6e0 [gfs2]
  ? gfs2_trans_end+0x18d/0x340 [gfs2]
  gfs2_ail_empty_gl+0x1ab/0x1c0 [gfs2]
  ? inode_go_dump+0xe0/0xe0 [gfs2]
  ? inode_go_sync+0xe4/0x220 [gfs2]
  inode_go_sync+0xe4/0x220 [gfs2]
  do_xmote+0x12b/0x290 [gfs2]
  glock_work_func+0x6f/0x160 [gfs2]
  process_one_work+0x461/0x790
  worker_thread+0x69/0x6b0
  ? process_one_work+0x790/0x790
  kthread+0x1ae/0x1d0
  ? kthread_create_worker_on_cpu+0xc0/0xc0
  ret_from_fork+0x22/0x40

thanks for tracking this down, very interesting.

The consistency model here is that every buffer head that a struct
gfs2_bufdata object is attached to is protected by a glock. Before a
glock can be released, all the buffers under that glock have to be
flushed out and released; this is what allows another node to access
the same on-disk location without causing inconsistencies. When there
is a bufdata object that points to a glock that has already been
freed, this consistency model is broken. Taking an additional refcount
as this patch does may make the use-after-free go away, but it doesn't
fix the underlying problem. So I think we'll need a different fix
here.

Did you observe this problem in a real-world scenario, or with KASAN
only? It might be that we're looking at a small race that is unlikely
to trigger in the field. In any case, I think we need to understand
better what't actually going on.


I finally got time to look into this further. The following is what happens as far as I can tell though my understanding is a bit limited at this point:

1. A file is opened, truncated and closed which results in a metadata write so a gfs2_bufdata object is created and associated with the inode glock.

2. The gfs2_bufdata is written out and placed on an AIL list.

3. The VFS layer calls gfs2_evict_inode() and the inode is evicted which drops a reference on the glock. It takes case 3 (i_nlink > 0) so no log flush or ail_flush happens.

4. The gfs2_bufdata object is however still on one of the AIL lists and the next gfs2_log_flush() begins to write a revoke entry.

5. At about the same time the glock is freed. At the point of freeing, gl_revokes is 1 (should probably have a GLOCK_BUG_ON for this).

6. gfs2_log_flush() continues and calls revoke_lo_after_commit() and uses the freed glock (stack trace above).

Should evict_inode call gfs2_ail_flush() or something so that the revoke is written before it drops its reference to the glock?

Or is there something else that is meant to keep the glock around if the inode is evicted while there is a linked gfs2_bufdata sitting on some AIL list?

Let me know if any more specific information is needed since I now have a test setup that can usually reproduce this within 10 minutes.

Thanks,
--
Ross Lagerwall

Reply via email to