Alexander Viro wrote:
> On Thu, 10 Aug 2000, Daniel Phillips wrote:
>
> > Correct me if I'm wrong, but I think that if writepage ever needs to unmerge a
> > tail then it's an error. The tail should have already been unmerged by
> > ext2_file_write, which must necessarily precede the writepage. Similarly for
> > file_mmap: I should unmerge the tail as soon as a file is ever mmapped, and then
> > an unmerge can't come from that source either.
>
> As for the latter - you shouldn't. THE thing about mmap() is that it is
> not allowed to expand the file. Ever.
Good to see that stated in no uncertain terms. I hadn't read that part of the
code, so I wasn't sure about it. I suppose POSIX must have something to say
about it, but I've been unsuccessful so far in getting hold of any useful POSIX
docs. (Can somebody give me a pointer?)
It now sounds like file_mmap may not have to be changed at all.
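
The "mmap never expands the file" rule can be checked from user space. The following is only a minimal sketch, assuming a POSIX system with a writable /tmp; the function name size_after_mmap_store and the 100-byte file size are made up for illustration. It stores a byte past EOF through a shared mapping and reports the file size afterwards.

```c
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns the file size after a store past EOF
 * through a shared mapping, or -1 on any error. */
static long size_after_mmap_store(void)
{
    char path[] = "/tmp/mmap_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                        /* file goes away when fd is closed */

    char buf[100] = {0};                 /* the file is exactly 100 bytes long */
    if (write(fd, buf, sizeof buf) != (ssize_t)sizeof buf) {
        close(fd);
        return -1;
    }

    long page = sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    p[200] = 'x';                        /* past EOF, but inside the mapped page */
    msync(p, (size_t)page, MS_SYNC);
    munmap(p, (size_t)page);

    struct stat st;
    long size = (fstat(fd, &st) == 0) ? (long)st.st_size : -1;
    close(fd);
    return size;
}
```

The store lands in the page cache but the file length should stay at 100, which is why a mapping by itself never forces a tail to grow.
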
> So all you have is to make sure that your file has no holes _past_ the end of tail.
I don't get it. The tail should have been unmerged by file_write or
ext2_truncate, and nothing else should be able to create such a hole. Did I
miss something?
> ...Then every writepage() may
> * hit the existing full block (no problem)
> * need to allocate a full block (also no problem)
> * write to the tail without changing the length (no unmerging)
Yes, that was the plan.
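
The three cases listed above could be written down roughly as follows. This is an illustrative sketch only: struct block_state, enum wp_case and classify_writepage are hypothetical names, not anything in the ext2 code, and it assumes the tail has already been unmerged wherever the file length could change.

```c
#include <stdbool.h>

/* Hypothetical labels for the three writepage() cases. */
enum wp_case {
    WP_FULL_BLOCK_EXISTS,    /* hit an existing full block */
    WP_ALLOCATE_FULL_BLOCK,  /* need to allocate a full block */
    WP_WRITE_TAIL_IN_PLACE   /* write to the tail, length unchanged */
};

/* Hypothetical summary of the block a page write lands on. */
struct block_state {
    bool mapped;   /* a full block is already allocated on disk */
    bool is_tail;  /* the data lives in the file's merged tail */
};

/* None of the three cases changes the file length, so none needs an unmerge. */
static enum wp_case classify_writepage(const struct block_state *b)
{
    if (b->is_tail)
        return WP_WRITE_TAIL_IN_PLACE;
    return b->mapped ? WP_FULL_BLOCK_EXISTS : WP_ALLOCATE_FULL_BLOCK;
}
```
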
> It doesn't come for free - to make sure that writepage() will never have
> to unmerge you must unmerge() on expanding ->truncate().
Yes, did I forget to say that? Truncate has to unmerge in every case except
where the truncate happens to hit inside the tail. You could relax that a bit,
and say 'except when truncate hits between the beginning of the tail and the
beginning of the next merged tail' but then you'd either have to keep a map of
all the other tails or you'd have to walk the merge ring. It's much easier to
use the simpler condition; the tiny optimization isn't worth the extra work and
complexity. Probably.
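
A rough sketch of the simpler condition, using hypothetical names (struct tail_info and truncate_needs_unmerge are not ext2 identifiers). Treating both ends of the tail as "inside" is an assumption made here, not something settled above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a file's merged tail. */
struct tail_info {
    bool     merged;  /* last fragment is stored in a shared block */
    uint64_t start;   /* file offset where the tail begins */
    uint64_t len;     /* number of bytes in the tail */
};

/* Simple rule: unmerge on every truncate unless the new size lands
 * inside the tail itself. No map of other tails, no ring walk. */
static bool truncate_needs_unmerge(const struct tail_info *t, uint64_t new_size)
{
    if (!t->merged)
        return false;  /* nothing to unmerge */

    /* Assumption: both boundaries of the tail count as "inside". */
    bool inside = new_size >= t->start && new_size <= t->start + t->len;
    return !inside;
}
```

The relaxed variant would extend the "inside" range up to the start of the next merged tail, which is exactly the information this predicate does not have without extra bookkeeping.
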
> So there... I had to start doing something like that for UFS pre-patches.
> Hopefully it will be portable to ext2 - I'ld like to try 32Kb blocks/4Kb
> fragments, for one thing. Problems are mostly the same, except that I'm
> trying to deal with allocation units larger than PAGE_CACHE_SIZE - it adds
> some fun. OTOH, I can almost freely use buffer_cache - I'm not interested
> in fragments below the 512 bytes...
It's good to know that mm handling for large filesystem blocks is on the way...
--
Daniel