Re: i_size: why get complicated?

Daniel Phillips Thu, 15 Mar 2001 06:17:26 -0800
On Tue, 13 Mar 2001, Alexander Viro wrote:
> On Mon, 12 Mar 2001, Daniel Phillips wrote:
> 
> > I'm not at all comfortable with the way we are handling i_size right
> > now - it gets set in a funny way:
> > 
> >   - vmtruncate sets i_size lower
> >   - _commit_write sets it higher
> > 
> > Coding for this feels pretty strange.  I just ran into a bug
> > resulting from the fact that I hadn't called _commit_write so i_size
> > didn't have the value I wanted.  Wouldn't it be a lot simpler to let
> > high level code set i_size itself, and leave the page cache to worry
> > about whether pages are present or not.  Similarly, vmtruncate should
> > concern itself with freeing pages above the i_size mark and not set
> > i_size itself.  In other words, its behaviour should be driven by
> > i_size, it shouldn't be driving i_size, that's not its business.
> 
> Order of purging the cache and setting i_size is pretty sensitive -
> there is a nasty bunch of races prevented by it.
> 
> The thing being: for file expansion we actually want i_size updated
> after we grow relevant metadata. For shrinking we want it to happen
> before the metadata changes. The former is due to mmap() (ultimately
> pagein) behaviour. We don't want async IO on pages that are past the
> place metadata had grown to (doesn't matter on normal filesystems,
> but does on no-holes ones).

It's not clear to me why the ->read/writepage should not simply fail
in this case.

> Moreover, on no-holes filesystems we
> can't be sure that we will be able to grow the file at the full
> requested size.

This is just an error reporting problem, right?

> So setting i_size before is not going to work. Setting it after is
> also race-prone - think of async IO in the area we are killing in
> ->truncate().

That's exactly what I'm thinking about.  The idea is that all
permutations of readpage, writepage, file readpage and file writepage
are handled by the vfs/mm and some of them fail depending on the value
of i_size.  None of them should set i_size.  Only file_write, truncate
and high level filesystem code code should set i_size.

> As for generic_file_write()... i_size doesn't really belong there.
> Reason: as soon as block devices go into pagecache we will have to
> reproduce the whole logics.

We're not talking much logic here - file_write just increments i_size
by the size of the write.

> What for?

To make things nice and simple for kernel programs and clean up what
seems to be an inverted dependency.

> Better just make ->prepare_write()
> and ->commit_write() handle the size limits and leave the upper-layer
> logics common for both cases (we'll obviously need different beginning
> of the function, but the main loop can be shared).
> 
> The point being: i_size doesn't really belong to VM.

I didn't say that.

> We might win
> from having i_maxpage (and letting expanding commit_write() update
> it if it wants). But i_size itself is of concern only to fs code.
> Moreover, rules for updating i_maxpage upon write() do not belong to
> VM - they depend on specific address_space.

Dunno.  I'm narrowly focussed on i_size at the moment.  I found I had
to set it myself in my page cache directory index code, otherwise I'd
need to keep a shadow copy and that qualifies as complexity breeding
complexity.  Maybe I'm also paying for that by having created some sort
of mysterious instability and if that's the case I'll go hunting in the
VFS and try to fix it before giving up and making my code comply by
adding more suckage.

One of the things that bothers me a lot is that _commit_write doesn't
even know what value to set i_size to, to a resolution better than a
page, or at best, a block.  I think we can do better.

You made points up above and they are good points but I didn't see any
'killer point'.

-- 
Daniel

P.S., I'll be on the road for a couple of weeks, please reply to my
yahoo address.
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
Re: i_size: why get complicated?

Reply via email to