On Thu, Nov 22, 2018 at 02:44:35PM +0100, Johannes Thumshirn wrote:
> On 22/11/2018 14:35, David Sterba wrote:
> > On Thu, Nov 22, 2018 at 02:15:28PM +0100, Johannes Thumshirn wrote:
> >> err holds the return value of either btrfs_del_root_ref() or
> >> btrfs_del_inode_ref() but it hasn't been checked since it's
> >> introduction with commit fe66a05a0679 (Btrfs: improve error handling
> >> for btrfs_insert_dir_item callers) in 2012.
> >>
> >> The first attempt in removing the variable was rejected, so instead of
> >> removing 'err', start handling the errors of
> >> btrfs_del_root_ref()/btrfs_del_inode_ref() by aborting the transaction in
> >> btrfs_add_link() directly instead of the call-sites.
> > 
> > ... which is an anti-pattern.
> > 
> > https://btrfs.wiki.kernel.org/index.php/Development_notes#Error_handling_and_transaction_abort
> > 
> > In case there are several conditons that can fail we want to see the
> > first one for later analysis and debugging.
> 
> OK, but then handling the error of either btrfs_del_root_ref() or
> btrfs_del_inode_ref() will essentially only be printing an error message
> as anything else would hide the error condition which has led us into
> this path (i.e. btrfs_insert_dir_item() returning EEXISTS or EOVERFLOW),
> wouldn't it?

So this seems to point out there are more things to resolve here:

* how to recover from errors of btrfs_del_root_ref or btrfs_del_inode_ref
* what to report back to the user

When insert_dir_item fails, there are the inode references to be cleaned
up, and error reported as EEXIST or EOVERFLOW.

When btrfs_del_root_ref or btrfs_del_inode_ref fail, the cleanup is not
possible and there's no other option than the transaction abort. As the
original reason was the overflow, this should be reported to back to the
user.

The reason of the del_*_ref failures can be misleading, in case it's eg.
ENOMEM. This would mean that the add-link operation is possible but
there was not enough memory and restarting later would let it possibly
succeed.

So the missing part of error hanling is to add two "if (err) abort" and
then still return 'ret'. As this is not all obvious why, a comment would
be good there.

Reply via email to