> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of Hans Reiser
>
> "Stephen C. Tweedie" wrote:
>
> > Hi,
> >
> > On Tue, 12 Oct 1999 03:14:03 +0400, Hans Reiser <[EMAIL PROTECTED]> said:
> >
> > >> Hans, you didn't mention a journal call that happens on sync, or
> > >> sync_old_buffers...
> >
> > > I see two issues: how to respond to memory pressure, and how to sync.
> > > I'll let you articulate our sync needs.
> >
> > There are actually two separate memory pressure concerns. The first is
> > how to clear out some dirty, pinned buffers when we need to free up some
> > memory, and try_to_free_buffers/bdflush are the main mechanisms involved
> > right now.
> >
> > With journaling, however, we have a new problem. We can have large
> > amounts of dirty data pinned in memory, but we cannot actually write
> > that data to disk without first allocating more memory.
>
> Trivia: I don't think this is a feature of journaling, but rather a
> feature of a particular implementation of journaling. Chris will correct
> me if I err, but Chris's journaling doesn't have this property.
>
It didn't before async commits, but it does now. I might be able to write a
low-memory function to end the transaction synchronously that doesn't have
this problem (allocate X buffer_heads at transaction start; on transaction
end, write to the log in chunks of size X, reusing buffers as you go).
Right now, the code needs to find buffer_heads for all the log blocks, and
flushing all the log blocks will allocate some memory along the way (but
that one is easy to fix).
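
A minimal user-space sketch of the chunked synchronous commit described
above, not the actual reiserfs code. Everything named here (struct log_buf,
struct txn, txn_begin(), txn_end_sync(), POOL_SIZE, BLOCK_SIZE) is
hypothetical, and memcpy() stands in for the real disk write and wait. The
point it tries to show is only that the commit path touches the buffers
reserved at transaction start and allocates nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 16   /* tiny block size, for illustration only */
#define POOL_SIZE   4   /* "X": buffers set aside at transaction start */

/* Hypothetical stand-in for a buffer_head: one block of staging space. */
struct log_buf {
    unsigned char data[BLOCK_SIZE];
};

/* Hypothetical transaction that owns its buffer pool from the start. */
struct txn {
    struct log_buf pool[POOL_SIZE];  /* reserved once, reused per chunk */
    size_t chunks_written;           /* number of synchronous chunk writes */
};

static void txn_begin(struct txn *t)
{
    memset(t, 0, sizeof(*t));
}

/*
 * Copy nblocks blocks from src into journal, at most POOL_SIZE blocks
 * per pass. No memory is allocated on this path.
 */
static size_t txn_end_sync(struct txn *t, const unsigned char *src,
                           size_t nblocks, unsigned char *journal)
{
    size_t done = 0;

    assert(t != NULL && src != NULL && journal != NULL);
    while (done < nblocks) {
        size_t n = nblocks - done;

        if (n > POOL_SIZE)
            n = POOL_SIZE;
        /* Stage one chunk in the preallocated buffers. */
        for (size_t i = 0; i < n; i++)
            memcpy(t->pool[i].data, src + (done + i) * BLOCK_SIZE,
                   BLOCK_SIZE);
        /* "Write" the chunk and wait, so the buffers can be reused. */
        for (size_t i = 0; i < n; i++)
            memcpy(journal + (done + i) * BLOCK_SIZE, t->pool[i].data,
                   BLOCK_SIZE);
        t->chunks_written++;
        done += n;
    }
    return done;
}
```

With POOL_SIZE of 4, a 10-block transaction goes out in three passes (4, 4,
and 2 blocks). The cost is that each pass has to complete before the next
one can reuse the buffers.
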
> Let us define a buffer's state as FLUSHTIME_NON_EXPANDING if flushing it
> requires no additional memory, and FLUSHTIME_EXPANDING otherwise.
>
> I see the following separate issues:
>
> How to drive a kernel subsystem to flush some memory. I advocate that
> the vm system push, and the subsystems give it calls for doing the
> pushing.
>
> How to ensure that there are at least largest_reservation buffers of
> FLUSHTIME_NON_EXPANDING memory at all times, where largest_reservation
> is the sum of the maximum amounts that each kernel subsystem says it
> might need. There would be a reserve() and unreserve() for the kernel
> subsystems to call. I hypothesize that if largest_reservation is
> unnecessarily large, then so long as it is not completely obscene,
> performance will not suffer (and might gain), and code simplicity and
> performance will improve as a result of reserving the maximum that could
> possibly be needed rather than tracking the amount actually needed.
>
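
One possible shape for the reserve()/unreserve() accounting proposed above,
as a user-space sketch. Only the names reserve(), unreserve() and
largest_reservation come from the proposal. The struct, its non_expanding
counter, and the may_consume() check are assumptions added for
illustration, and a real version would need locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical accounting for the non-expanding pool, in buffers. */
struct flush_reserve {
    size_t largest_reservation;  /* sum of every subsystem's declared max */
    size_t non_expanding;        /* buffers flushable without allocating */
};

/* A subsystem declares the most buffers it might ever need. */
static void reserve(struct flush_reserve *r, size_t max_needed)
{
    r->largest_reservation += max_needed;
}

/* The subsystem withdraws a declaration it made earlier. */
static void unreserve(struct flush_reserve *r, size_t max_needed)
{
    assert(max_needed <= r->largest_reservation);
    r->largest_reservation -= max_needed;
}

/* True if taking n buffers out of the pool keeps the invariant intact. */
static bool may_consume(const struct flush_reserve *r, size_t n)
{
    return n <= r->non_expanding &&
           r->non_expanding - n >= r->largest_reservation;
}
```

For example, with 100 non-expanding buffers and reservations of 30 and 50,
may_consume() allows taking 20 more buffers but not 21. Because each
subsystem reserves its worst case once, nothing has to track actual usage
per operation.
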
> The interface for syncing commits.
>
>
>
> > I'm not sure
> > about the reiserfs case, but in ext3 I certainly need to allocate
> > buffers to describe control blocks in the journal, for example.
> >
Exactly.
> > This introduces a second memory pressure requirement: we must always
> > restrict the amount of unrecoverable dirty pinned memory so that when we
> > want to reclaim that memory, we have enough unpinned pages left to
> > complete the commit operation.
> >
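
A worked sketch of the pinning limit described above. The cost model is an
assumption made up for illustration, not ext3's real journal layout: one
control block per TAGS_PER_CONTROL_BLOCK pinned pages, plus one commit
record. commit_cost() and may_pin() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed: one control block describes up to this many data blocks. */
#define TAGS_PER_CONTROL_BLOCK 8

/* Pages a commit must allocate in order to write out `pinned` pages. */
static size_t commit_cost(size_t pinned)
{
    if (pinned == 0)
        return 0;
    /* Control blocks, rounded up, plus one commit record. */
    return (pinned + TAGS_PER_CONTROL_BLOCK - 1) / TAGS_PER_CONTROL_BLOCK
           + 1;
}

/* May one more page be pinned without making the commit impossible? */
static bool may_pin(size_t total_pages, size_t pinned)
{
    size_t after = pinned + 1;

    assert(pinned <= total_pages);
    return after + commit_cost(after) <= total_pages;
}
```

With 20 pages in total, pinning a 16th page is allowed (16 pinned plus 3
for the commit), but a 17th is not (17 plus 4 exceeds 20).
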
> > This came up in discussions with the XFS people [hence the linux-fsdevel
> > cross post]: it matters to many filesystems. In XFS it is a
> > substantially more significant problem, because they are performing
> > delayed allocation of written data and so they potentially need a lot
> > more space in core for metadata updates before the data can be flushed
> > to disk.
> >
> > This is much less of a problem for ext3 and will also probably not
> > matter too much for reiserfs until you decide to move to lazy block
> > allocation.
>
> We will indeed move to flushtime block allocation.
>
> > However, a common mechanism for dealing with this would
> > definitely let all three filesystems survive just that bit better under
> > really serious memory pressure.
> >
> > --Stephen
>
> For reiserfs, it would simplify our balancing code (fix_nodes() in
> particular) and improve our performance if we could efficiently reserve.
> Roma, think about this.
>
> Hans
>
-chris