On Fri, Jul 20, 2007 at 01:53:16AM +0100, Al Viro wrote:
> On Fri, Jul 20, 2007 at 10:45:34AM +1000, David Chinner wrote:
> > On Thu, Jul 19, 2007 at 06:16:00PM -0400, Jan Harkes wrote:
> > > On Thu, Jul 19, 2007 at 11:45:08PM +0200, Christoph Hellwig wrote:
> > > > ->release is the proper way to detect the last close of a file,
> > > > file_count should never be used in filesystems.
> > >
> > > Has been tried, the problem with that once ->release is called it is too
> > > late to pass the the error back to close(2).
> >
> > I think you'll find the problem is that fput() throws away the error
> > from ->release, not that it's too late....
>
> Just where would that return value go?
Right, there are actually quite a few places where different drivers and
file systems are returning non-zero errors from ->release. I just built
a kernel with the prototype changed to void to see how many places are
affected, 118 files changed, 211 insertions(+), 391 deletions(-)
That's with my default config on i386, so there are probably quite a few
more. The change also break the ABI quite badly (several EXPORT_SYMBOL
functions are affected) There were a few places where the error handling
path seems to lead to a possible struct file or private_data memory leak.
> BTW, the reason why checks for struct file refcount blow is not far
> from that:
>
> task A: write()
> task B (sharing descriptor table with A): close()
> task C (with another reference to struct file in question): close()
> task A: return from write()
>
> Now, the final fput() here happens in write(). In particular, no
> call of close(2) sees refcount equal to 1.
I see.
So if I want last-writer semantics, to let the last close trigger
writeback (upcall to userspace) combined with the ability to return an
error from the writeback back to close(2). To return an error I have to
use fops->flush, but as you've shown, I cannot reliably test if this
close dropped the last reference. In fact in your example there was
write activity even after the last close.
I guess I should find a way to delay the close (or at least the
userspace notification/return from syscall) until there are no active
writes.
Jan
=====
Some of the possibly suspect error handling cases I found while building
a kernel with void release() in fops, I haven't really looked at these
too deeply (i.e. only looked at the deallocation, not the allocations so
these may very well be harmless),
drivers/char/ipmi/ipmi_devintf.c: ipmi_release
returns error when ipmi_destroy_user fails, does not call ipmi_fasync
and seems to leak memory referenced by file->private_data
drivers/scsi/sg.c: sg_release
may return ENXIO. leaks file->private_data when sfp->parentdp is NULL.
fs/autofs4/root.c: autofs4_dir_close
returns EBUSY or ENOENT. In the case it returns busy it isn't
releasing a struct file.
fs/bad_inode.c: bad_file_release
returns EIO. Since the return code is ignored either way, there is
no need to even implement this function.
fs/dlm/user.c: device_close
may return ENOENT, doesn't free memory referenced by
file->private_data if it fails this way.
fs/ecryptfs/file.c: ecryptfs_release
returns error when it fails to close the lower file, it also seems
to leak memory in this case.
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html