Post your document on the reiserfs mailing list when you finish it; the
ReiserFS team will enjoy reading it.

Hans

Daniel Phillips wrote:
> 
> Alexander Viro wrote:
> > On Wed, 26 Jul 2000, Stephen C. Tweedie wrote:
> > > On Wed, Jul 26, 2000 at 03:19:46PM -0400, Alexander Viro wrote:
> > >
> > > > Erm? Consider that: huge lseek() + write past the end of file. Woops - got
> > > > to unmerge the tail (it's an internal block now) and we've got no
> > > > knowledge of IO going on the page. Again, IO may be asynchronous - no
> > > > protection from i_sem for us. After that page becomes a regular one,
> > > > right? Looks like a change of state to me...
> > >
> > > Naturally, and that change of state must be made atomically by the
> > > filesystem.
> >
> > Yep. Which is the point - there _are_ dragons. I believe that it's doable,
> > but I really want to repeat: Daniel, watch out for races at the moments
> > when page state changes; it needs a more accurate approach than the usual
> > pagecache-using fs. It can be done, but it will take some reading (and
> > yes, Stephen, I know that _you_ know it ;-)
> 
> That's apparent, and I feel that Stephen could probably implement the entire
> tail merge as described so far in a few days.  But that wouldn't be as useful as
> having me and perhaps some other interested observers go all the way through
> the exercise of figuring out the so-far unwritten rules of the
> buffercache/pagecache duo.
> 
> The exact same accurate work is required for Tux2, which makes massive use of
> copy-on-write.  Right now, buffer issues are the main thing standing in the way
> of making a development code release for Tux2.  So there is no question in my
> mind about whether such issues have to be dealt with: they do.
> 
> I dove into the 2.4.0 cache code for the first time last night (using lxr - try
> it, you'll like it) and I'm almost at the point where I have some relevant
> questions to ask.  I notice that buffer.c has increased in size by almost 50%
> and is far and away the largest module in the VFS.  Worse, buffer.c is massively
> cross-coupled to the mm subsystem and the page cache, as we know too well.
> Buffer.c is right at the core of the issues we're talking about.
> 
> Bearing that in mind, instead of just jumping in and starting to code I'll try
> the methodical approach :-)  My immediate objective is to try to clarify a few
> things that aren't immediately obvious from the source, in the following areas:
> 
>   - States and transitions for the main objects:
>     - Buffer heads
>     - Buffer data
>     - Page heads
>     - Page data
>     - Other?
> 
>   - Existing concurrency controls:
>     - Semaphores/Spinlocks
>     - Big kernel lock
>     - Filesystem locks
>     - POSIX locks?
>     - Other?
> 
>   - Planned additions/deletions of concurrency controls
> 
> I will also try to make a list of the main internal functions in the VFS (and
> some related ones from the mm and drivers modules) and examine
> function-by-function what the intended usage is, what the issues/caveats are,
> and maybe even how we can expect them to evolve in the future.
> 
> I think we need even more than this in terms of documentation in order to work
> effectively, but this at least will be a good start.  It will be more than what
> we have now.  If it gets to the point where we can actually answer questions
> about race conditions by consulting the docs then we really will have
> accomplished something.  Yes, I know that the code is going to keep evolving and
> sometimes will break the docs, but I also have confidence that the docs can keep
> up with such evolution given some interested volunteer doc maintainers willing
> to hang out on the devel list and keep asking questions.
> 
> Even in 2.2.x I felt that there is a lot of understated elegance in Linux's
> buffer cache design.  In 2.4.0 it seems to be getting more elegant, although
> it's hard to say exactly, because of the sparse (read: nonexistent)
> documentation.  This is a problem that can be easily fixed.
> 
> To get through this I will have to ask a lot of naive-sounding questions.
> Hopefully I'll have the first batch ready this afternoon (morning, your time).
> 
> --
> Daniel
