On 2/26/21 1:30 PM, David Sterba wrote:
On Wed, Dec 16, 2020 at 11:26:19AM -0500, Josef Bacik wrote:
Currently select_reloc_root() doesn't return an error, but followup
patches will make it possible for it to return an error.  We do have
proper error recovery in do_relocation however, so handle the
possibility of select_reloc_root() having an error properly instead of
BUG_ON(!root).  I've also adjusted select_reloc_root() to return
ERR_PTR(-ENOENT) if we don't find a root, instead of NULL, to make the
error case easier to deal with.  I've replaced the BUG_ON(!root) with an
ASSERT(0) for this case as it indicates we messed up the backref walking
code, but it could also indicate corruption.

Signed-off-by: Josef Bacik <jo...@toxicpanda.com>
---
  fs/btrfs/relocation.c | 15 ++++++++++++---
  1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 08ffaa93b78f..741068580455 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2024,8 +2024,14 @@ struct btrfs_root *select_reloc_root(struct 
btrfs_trans_handle *trans,
                if (!next || next->level <= node->level)
                        break;
        }
-       if (!root)
-               return NULL;
+       if (!root) {
+               /*
+                * This can happen if there's fs corruption or if there's a bug
+                * in the backref lookup code.
+                */
+               ASSERT(0);

You've added these assert(0) to several patches and I think it's wrong.
The asserts are supposed to verify invariants, you can hardly say that
we're expecting 0 to be true, so the construct serves as "please crash
here", which is no better than BUG().  It's been spreading, there are
like 25 now.

They are much better than a BUG_ON(), as they won't trip in production, they'll only trip for developers. And we need them in many of these cases, and this example you've called out is a clear example of where we absolutely want to differentiate, because we have 3 different failure modes that will return -EUCLEAN. If I add a ASSERT(ret != -EUCLEAN) to all of the callers then when the developer hits a logic bug they'll have to go and manually add in their own assert to figure out which error condition failed. Instead I've added explanations in the comments for each assert and then have an assert for every error condition so that it's clear where things went wrong.

So these ASSERT()'s are staying. I could be convinced to change them to match their error condition, so in the above case instead I would be fine changing it to

if (!root) {
        ASSERT(root);
...

If that's more acceptable let me know, I'll change them to match their error condition. Thanks,

Josef

Reply via email to