----- Original Message -----
> Before this patch set, several gfs2 functions failed to reserve enough
> revoke entries in the journal. Some examples:
> 
> 1. gfs2_dinode_dealloc failed to reserve a revoke for the dinode
>    being deleted.
> 2. Any function that allocates dinodes with gfs2_alloc_blocks
>    should reserve a revoke because alloc_blocks will premptively
>    call trans_remove_revoke to make sure there isn't a pending revoke
>    for the new dinode.
> 3. Any function that potentially will unstuff a stuffed directory
>    needs to reserve a revoke because gfs2_unstuff_dinode calls
>    gfs2_trans_remove_revoke for the new journaled leaf block.
> 
> In addition, function gfs2_trans_remove_revoke unconditionally
> decrements tr->tr_num_revoke, and if not enough revokes are reserved, the
> value goes from 0 to  4294967295 (-1, but it's an unsigned int). This is
> later
> re-added to the system-wide revoke numbers, thereby decrementing the value
> (sd_log_commited_revoke) "properly," but by accident. This worked properly
> most of the time because one transaction would reserve space for revokes,
> then it would be merged with the system transaction (sdp->sd_log_tr) and it
> usually did not run out, because you can hold a lot of revoke entries
> per log descriptor block. Some of the code, such as gfs2_write_revokes, would
> work around this and somehow got it right most of the time. However, some
> jdata tests with xfstests generic/269 encountered problems when it actually
> ran out.
> 
> This series adds needed revoke entries to the transactions that
> need them. So now we try to do proper accounting of revokes.
> 
> Bob Peterson (6):
>   gfs2: revoke cleanup: leaf_dealloc
>   gfs2: revoke cleanup: alloc_dinode and gfs2_create_inode
>   gfs2: revoke cleanup: gfs2_dinode_dealloc
>   gfs2: revoke cleanup: gfs2_iomap_begin_write
>   gfs2: revoke cleanup: truncate functions
>   gfs2: revoke cleanup: gfs2_trans_remove_revoke
> 
>  fs/gfs2/bmap.c  | 25 +++++++++++++------------
>  fs/gfs2/dir.c   |  3 ++-
>  fs/gfs2/inode.c |  5 +++--
>  fs/gfs2/super.c |  2 +-
>  fs/gfs2/trans.c | 16 ++++++++++++++--
>  5 files changed, 33 insertions(+), 18 deletions(-)
Hi,

Self-NACK on this patch series. It's not ready for prime time yet.

While it's still true that many transactions don't reserve enough blocks for
revokes, the last patch to gfs2_trans_remove_revoke won't work properly.

It checks for revokes being removed and the counters "improperly" going 
negative,
but there's a disconnect: Multiple processes can add the same metadata block
over and over to the journal, and they get merged with the superblock 
transaction.
For example, when multiple processes allocate and/or delete blocks and the
bitmap block gets rewritten over and over again for multiple transactions.

This means after some churn, a process might (properly) add a metadata block
but in so doing, remove multiple revokes for that same block, which causes
the numbers to temporarily go negative. And this is not due to a lack of
revokes being reserved. This is because the revokes go on a central list
from the superblock, but may originate from multiple transactions.
For example, a transaction may revise a bitmap, but in so doing, it may
need to remove several revokes that were added by other processes before it,
and thus, its count of transaction revokes goes negative.

By the way, the value can go negative because patch e955537e326 changed the
behavior from always using positive numbers. (Which is not necessarily a bad
thing).

One solution to this would be to search the global revokes list for all
revoke instances of when new revokes are added, and remove them so that
there's only a single instance on the queue at any given point in time.

If we did this, I'd be worried about improper behavior when journal replays
are needed when transactions are interrupted by a node getting blasted or
fenced, etc. I'd also be worried about a performance impact if every revoke
needs to search a global linked list, while holding the list's ail list
spin_lock.

As I see it, we have several choices:

1. We can revert patch e955537e326 and only allow positive numbers.
   There isn't much gain in doing this.
2. We can embrace the negative values and just fix the problem revoke
   transactions. This might be as simple as eliminating the last patch,
   "gfs2: revoke cleanup: gfs2_trans_remove_revoke".
   If it makes sense, we can also change the counter from unsigned int
   to a signed int. Or maybe we don't even care.
3. We can work out a new solution to keep track of revokes and revoke counts.

For several years we've talked about ways to improve our journal flushing
code and to possibly allow transactions to be queued during log flushes.
So whatever solution we come up with needs to keep that in mind.

Regards,

Bob Peterson
Red Hat File Systems

Reply via email to