On Fri, Jul 20, 2007 at 01:53:16AM +0100, Al Viro wrote:
> On Fri, Jul 20, 2007 at 10:45:34AM +1000, David Chinner wrote:
> > On Thu, Jul 19, 2007 at 06:16:00PM -0400, Jan Harkes wrote:
> > > On Thu, Jul 19, 2007 at 11:45:08PM +0200, Christoph Hellwig wrote:
> > > > ->release is the proper way to detect the last close of a file,
> > > > file_count should never be used in filesystems.
> > > 
> > > Has been tried, the problem with that once ->release is called it is too
> > > late to pass the the error back to close(2).
> > 
> > I think you'll find the problem is that fput() throws away the error
> > from ->release, not that it's too late....
>
> Just where would that return value go?

Right, there are actually quite a few places where different drivers and
file systems are returning non-zero errors from ->release. I just built
a kernel with the prototype changed to void to see how many places are
affected,  118 files changed, 211 insertions(+), 391 deletions(-)

That's with my default config on i386, so there are probably quite a few
more. The change also break the ABI quite badly (several EXPORT_SYMBOL
functions are affected) There were a few places where the error handling
path seems to lead to a possible struct file or private_data memory leak.

> BTW, the reason why checks for struct file refcount blow is not far
> from that:
> 
> task A: write()
> task B (sharing descriptor table with A): close()
> task C (with another reference to struct file in question): close()
> task A: return from write()
> 
> Now, the final fput() here happens in write().  In particular, no
> call of close(2) sees refcount equal to 1.

I see.

So if I want last-writer semantics, to let the last close trigger
writeback (upcall to userspace) combined with the ability to return an
error from the writeback back to close(2). To return an error I have to
use fops->flush, but as you've shown, I cannot reliably test if this
close dropped the last reference. In fact in your example there was
write activity even after the last close.

I guess I should find a way to delay the close (or at least the
userspace notification/return from syscall) until there are no active
writes.

Jan


=====
Some of the possibly suspect error handling cases I found while building
a kernel with void release() in fops, I haven't really looked at these
too deeply (i.e. only looked at the deallocation, not the allocations so
these may very well be harmless),

drivers/char/ipmi/ipmi_devintf.c: ipmi_release
    returns error when ipmi_destroy_user fails, does not call ipmi_fasync
    and seems to leak memory referenced by file->private_data

drivers/scsi/sg.c: sg_release
    may return ENXIO. leaks file->private_data when sfp->parentdp is NULL.

fs/autofs4/root.c: autofs4_dir_close
    returns EBUSY or ENOENT. In the case it returns busy it isn't
    releasing a struct file.

fs/bad_inode.c: bad_file_release
    returns EIO. Since the return code is ignored either way, there is
    no need to even implement this function.

fs/dlm/user.c: device_close
    may return ENOENT, doesn't free memory referenced by
    file->private_data if it fails this way.

fs/ecryptfs/file.c: ecryptfs_release
    returns error when it fails to close the lower file, it also seems
    to leak memory in this case.


-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to