Oops, some of you saw Part I with the Part II header; no matter. Here's Part II:
We need to make it so that dd does not eat up all the memory when it fills the hosage object faster than the filesystem can page it out. Two things are happening here to produce suboptimal behavior, both on systems with enough memory that no paging should happen in this case at all, and on systems without enough.

*** First problem:

As memory gets tight, the kernel eventually forces dd to stop allocating pages, but it does so way too late. First off, we can afford to increase the free memory pool a lot more than we do now. The 15-page threshold at which dd gets clamped was set in 1985 (or maybe earlier), and our machines have rather more memory now than they did then.

But even then, the real problem is that dd and the filesystem both get clamped *at the same time*. What we need is for dd to get clamped *before* the filesystem, so that dd's page allocations block while the filesystem's continue. In fact, that's what happens with the default pager: below 15 pages, the users of the default pager can't allocate pages, but the default pager itself still can. That is set up through the special thread_wire kernel call.

Now, we cannot thread_wire the filesystem's threads, attractive though that might be. But there is a solution that I think is reasonably good. When an external (==non-default) pager requires memory in order to execute a pageout, I propose we assume that it is either allocating anonymous (==default-paged) memory in reasonably small amounts in order to write the pages, or it is storing the pages themselves in some *other* externally paged object. (The case that I am assuming is sufficiently rare as to be ignored is the one where the pager allocates memory in external objects in order to page the pages to somewhere outside the paging system.)

Given that assumption, I propose that the following solutions will help alleviate the first problem. They are not mutually exclusive; I would like to see each get implemented.

Solution 1.1: Add a new paging threshold (above the current 15 pages) below which page allocations for external objects block, but allocations for internal objects succeed. This will keep dd or ld from continuing to allocate pages, but allow the filesystem to make forward progress. (A rough sketch of the check ordering appears after this list.)

Solution 1.2: We should limit the amount of memory the filesystem consumes in processing pageout requests. The kernel already contains a mechanism for limiting how much paging goes to the filesystem at once, a dynamically tuned bursting control, and it helps a lot. But it could be improved by gathering adjacent pages and issuing multi-page pageout requests. (Right now this is never done, and libpager knows it is never done.) This is a fair amount of work to implement; a sketch of the clustering idea also follows below.

Solution 1.3: We could clamp the resources the filesystem spends on servicing pageouts, in particular the threads it spawns to handle the requests. I would like to avoid imposing hard limits here; the existing burstiness controls manage the load OK. But those threads do consume resources, and when they sit idle those resources are simply wasted. The fix here is twofold: make the thread timeout code in hurd/libports/manage-multithread.c actually work (see the sketch below), and make cthreads really delete thread stacks and such when threads terminate.

Solution 1.4: The existing paging thresholds and timers were set in 1985, and they need to be increased to match modern machine characteristics more closely.
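To make 1.1 concrete, here is a minimal sketch of the check ordering I have in mind. This is plain user-space C, compilable on its own; vm_page_free_external and the other identifiers are made-up stand-ins, not the kernel's actual names, and the numbers are only placeholders:

/* Sketch of Solution 1.1: two free-page thresholds instead of one.
 * Illustrative user-space C, not actual Mach code; the identifiers
 * are hypothetical stand-ins for counters kept in vm/vm_pageout.c.
 * The point is only the ordering of the checks: allocations backing
 * external (filesystem-paged) objects block first, while internal
 * (default-paged) allocations still succeed down to the lower,
 * pre-existing threshold. */
#include <stdbool.h>
#include <stdio.h>

static unsigned vm_page_free_count = 40;           /* pages currently free */
static const unsigned vm_page_free_min = 15;       /* the 1985 value */
static const unsigned vm_page_free_external = 30;  /* proposed, higher */

/* Returns true if the allocation may proceed, false if the caller
 * should block and wait for the pageout daemon to free pages. */
static bool
vm_page_grab_ok(bool for_external_object)
{
    if (for_external_object)
        /* dd writing into a filesystem-backed object is clamped
         * early, leaving headroom for the pager itself. */
        return vm_page_free_count > vm_page_free_external;

    /* Internal (anonymous) allocations, which the filesystem needs
     * to service pageouts, keep working down to the old threshold. */
    return vm_page_free_count > vm_page_free_min;
}

int
main(void)
{
    for (vm_page_free_count = 35; vm_page_free_count >= 10;
         vm_page_free_count -= 5)
        printf("free=%2u  external:%s  internal:%s\n",
               vm_page_free_count,
               vm_page_grab_ok(true)  ? "ok" : "BLOCK",
               vm_page_grab_ok(false) ? "ok" : "BLOCK");
    return 0;
}

The only point is that the two allocation classes hit their limits at different free-page counts, so the pager keeps running after dd has been stopped.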
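And here is the clustering idea from 1.2 in miniature: coalesce runs of adjacent dirty pages and hand the pager one request per run. Again this is self-contained illustrative C with invented names (dirty_offsets, issue_pageout); the real work is in teaching the pageout path and libpager to do this:

/* Sketch of Solution 1.2: gather adjacent dirty pages and issue one
 * multi-page pageout per run instead of one request per page. */
#include <stdio.h>
#include <stddef.h>

#define PAGE_SIZE 4096u

/* Pretend these page offsets (in pages, sorted) are dirty in one object. */
static const unsigned dirty_offsets[] = { 3, 4, 5, 9, 10, 17 };

/* Stand-in for handing a contiguous run to the external pager. */
static void
issue_pageout(unsigned first_page, unsigned npages)
{
    printf("pageout: offset %u, length %u bytes (%u pages)\n",
           first_page * PAGE_SIZE, npages * PAGE_SIZE, npages);
}

int
main(void)
{
    size_t n = sizeof dirty_offsets / sizeof dirty_offsets[0];
    size_t i = 0;

    while (i < n) {
        unsigned start = dirty_offsets[i];
        unsigned len = 1;

        /* Extend the run while the next dirty page is adjacent. */
        while (i + len < n && dirty_offsets[i + len] == start + len)
            len++;

        issue_pageout(start, len);   /* one request for the whole run */
        i += len;
    }
    return 0;
}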
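For the thread-timeout half of 1.3, the logic I want in manage-multithread.c looks roughly like this. The Hurd code uses cthreads; I have written the sketch with POSIX threads only because it is easy to compile, and the names (work_pending, IDLE_TIMEOUT_SECS) are invented. The shape is what matters: wait for work with a timeout, and terminate if the timeout expires with nothing queued:

/* Sketch of the thread-timeout idea in Solution 1.3: a service
 * thread that waits for work and exits (so its stack can be
 * reclaimed) after sitting idle for a timeout. */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define IDLE_TIMEOUT_SECS 2     /* deliberately tiny for the demo */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wakeup = PTHREAD_COND_INITIALIZER;
static int work_pending;        /* count of queued requests */

static void *
service_thread(void *arg)
{
    (void) arg;
    pthread_mutex_lock(&lock);
    for (;;) {
        while (work_pending == 0) {
            struct timespec deadline;
            clock_gettime(CLOCK_REALTIME, &deadline);
            deadline.tv_sec += IDLE_TIMEOUT_SECS;

            /* Idle too long with no work: terminate instead of
             * hoarding a stack forever. */
            if (pthread_cond_timedwait(&wakeup, &lock, &deadline) != 0
                && work_pending == 0) {
                pthread_mutex_unlock(&lock);
                return NULL;
            }
        }
        work_pending--;
        pthread_mutex_unlock(&lock);
        /* ... service one pageout request here ... */
        pthread_mutex_lock(&lock);
    }
}

int
main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, service_thread, NULL);
    pthread_join(t, NULL);      /* returns after the idle timeout */
    puts("idle service thread exited; resources reclaimed");
    return 0;
}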
*** Second problem:

There is observed filesystem burstiness, and the reason is as follows. The filesystem writes to disk when one of two things happens: either memory is tight and the pageout thread is working, or something has synced the file. Sync happens every thirty seconds (by default), and so we get slightly pessimal behavior by waiting for all the writes to happen at sync time. (Sync also happens when you close a file, or when various other things trigger it, but the moral of the story is the same.) This produces burstiness. We fix it thus:

Solution 2: The kernel already notices sequential access, and when it sees it, it takes the "previous" pages and marks them inactive right away. That goes a long way; it means that when memory gets tight, these will be the first pages cleaned (==written to disk). But why wait that long? The pages have to get written eventually *anyway*. So my solution here is to assume that external objects hold "important", long-lived data, and that once it is inactive we should spend a little effort continually trying to clean it. My proposal is to create a new paging threshold, set as a fraction of the number of inactive externally-managed pages, and have the pageout thread try to keep at least that many pages clean all the time, even if memory is plentiful. A sketch follows.
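In sketch form, with a made-up 1/4 fraction and invented counter names (again illustrative user-space C, not the real vm_pageout.c code):

/* Sketch of Solution 2: keep a fraction of the inactive,
 * externally-managed pages clean even when memory is plentiful.
 * The counters and the fraction are hypothetical tunables. */
#include <stdio.h>

/* Pretend per-queue counters maintained by the VM system. */
static unsigned inactive_external_pages = 2000;
static unsigned inactive_external_clean = 150;

/* Proposed tunable: keep at least 1/4 of inactive external pages clean. */
#define EXTERNAL_CLEAN_FRACTION_NUM 1
#define EXTERNAL_CLEAN_FRACTION_DEN 4

/* Stand-in for writing one dirty inactive page back to its pager. */
static void
clean_one_page(void)
{
    inactive_external_clean++;
}

/* One pass of the background duty the pageout thread would perform
 * even when free memory is above every shortage threshold. */
static void
pageout_background_pass(void)
{
    unsigned target = inactive_external_pages
        * EXTERNAL_CLEAN_FRACTION_NUM / EXTERNAL_CLEAN_FRACTION_DEN;

    while (inactive_external_clean < target)
        clean_one_page();   /* trickle writes, no sync-time burst */

    printf("inactive external: %u total, %u clean (target %u)\n",
           inactive_external_pages, inactive_external_clean, target);
}

int
main(void)
{
    pageout_background_pass();
    return 0;
}

The effect is that dirty inactive pages trickle out continuously instead of piling up and going out in a burst at the next sync.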
I can implement numbers 1.1, 1.4, and 2 right away. What I would like is concrete advice, backed up by reasoning, for how to set or adjust these paging parameters. The existing ones are specified at the front of mach/vm/vm_pageout.c. I have added 1.2 and 1.3 to the task list.

Thomas
