"Stephen C. Tweedie" wrote:
> Hi,
>
> On Tue, 12 Oct 1999 03:14:03 +0400, Hans Reiser <[EMAIL PROTECTED]> said:
>
> >> Hans, you didn't mention a journal call that happens on sync, or
> >> sync_old_buffers...
>
> > I see two issues: how to respond to memory pressure, and how to sync.
> > I'll let you articulate our sync needs.
>
> There are actually two separate memory pressure concerns. The first is
> how to clear out some dirty, pinned buffers when we need to free up some
> memory, and try_to_free_buffers/bdflush are the main mechanisms involved
> right now.
>
> With journaling, however, we have a new problem. We can have large
> amounts of dirty data pinned in memory, but we cannot actually write
> that data to disk without first allocating more memory.
Trivia: I don't think this is a feature of journaling, but rather a feature of a
particular implementation of journaling. Chris will correct me if I err, but
Chris's journaling doesn't have this property.
Let us define a buffer's state as FLUSHTIME_NON_EXPANDING if flushing it
requires no additional memory, and FLUSHTIME_EXPANDING otherwise.
I see the following separate issues:

1. How to drive a kernel subsystem to flush some memory. I advocate that the vm
   system do the pushing, and that the subsystems give it the calls for doing
   so.

2. How to ensure that there are at least largest_reservation buffers of
   FLUSHTIME_NON_EXPANDING memory at all times, where largest_reservation is
   the sum of the maximum amounts that each kernel subsystem says it might
   need. There would be a reserve() and an unreserve() for the kernel
   subsystems to call. I hypothesize that if largest_reservation is
   unnecessarily large, performance will not suffer (and might gain) so long as
   it is not completely obscene, and that code simplicity and performance will
   improve as a result of tracking the maximum that might be needed rather than
   the amount actually needed.

3. The interface for syncing commits.
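
The reservation idea above can be sketched in a few lines of C. This is only an
illustration of the accounting, not kernel code: struct reserve_pool,
non_expanding(), mark_expanding() and mark_flushed() are hypothetical names, and
the signatures given to reserve() and unreserve() are assumptions. The invariant
it tries to keep is that the number of FLUSHTIME_NON_EXPANDING buffers never
drops below largest_reservation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical accounting structure; not an existing kernel type. */
struct reserve_pool {
    size_t total;               /* buffers covered by this accounting */
    size_t expanding;           /* buffers currently FLUSHTIME_EXPANDING */
    size_t largest_reservation; /* sum of every subsystem's declared maximum */
};

/* Buffers that can be flushed without allocating any more memory. */
static size_t non_expanding(const struct reserve_pool *p)
{
    return p->total - p->expanding;
}

/* Non-expanding buffers not yet promised to any subsystem. */
static size_t headroom(const struct reserve_pool *p)
{
    /* Safe while the invariant non_expanding >= largest_reservation holds. */
    return non_expanding(p) - p->largest_reservation;
}

/*
 * A subsystem declares the most it might ever need at flush time.
 * Fails if the promise could not be honoured right now.
 */
static bool reserve(struct reserve_pool *p, size_t max_needed)
{
    if (max_needed > headroom(p))
        return false;
    p->largest_reservation += max_needed;
    return true;
}

/* The subsystem gives its declared maximum back. */
static void unreserve(struct reserve_pool *p, size_t max_needed)
{
    assert(max_needed <= p->largest_reservation);
    p->largest_reservation -= max_needed;
}

/*
 * Dirty n buffers in a way that needs memory to flush them.
 * A false return means the caller has to get something flushed first.
 */
static bool mark_expanding(struct reserve_pool *p, size_t n)
{
    if (n > headroom(p))
        return false;
    p->expanding += n;
    return true;
}

/* n expanding buffers reached disk and no longer need extra memory. */
static void mark_flushed(struct reserve_pool *p, size_t n)
{
    assert(n <= p->expanding);
    p->expanding -= n;
}
```

Note that the sketch tracks only the declared maximum per subsystem, never the
amount actually in use, which is the simplification hypothesized above. A false
return from mark_expanding() is the point at which the vm system would push a
subsystem to flush.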
> I'm not sure
> about the reiserfs case, but in ext3 I certainly need to allocate
> buffers to describe control blocks in the journal, for example.
>
> This introduces a second memory pressure requirement: we must always
> restrict the amount of unrecoverable dirty pinned memory so that when we
> want to reclaim that memory, we have enough unpinned pages left to
> complete the commit operation.
>
> This came up in discussions with the XFS people [hence the linux-fsdevel
> cross post]: it matters to many filesystems. In XFS it is a
> substantially more significant problem, because they are performing
> delayed allocation of written data and so they potentially need a lot
> more space in core for metadata updates before the data can be flushed
> to disk.
>
> This is much less of a problem for ext3 and will also probably not
> matter too much for reiserfs until you decide to move to lazy block
> allocation.
We will indeed move to flushtime block allocation.
> However, a common mechanism for dealing with this would
> definitely let all three filesystems survive just that bit better under
> really serious memory pressure.
>
> --Stephen
For reiserfs, it would simplify our balancing code (fix_nodes() in particular)
and improve our performance if we could efficiently reserve. Roma, think about
this.
Hans
--
Get Linux (http://www.kernel.org) plus ReiserFS
(http://devlinux.org/namesys). If you sell an OS or
internet appliance, buy a port of ReiserFS! If you
need customizations and industrial grade support, we sell them.