John Baldwin wrote:
> The swap pager getpages/putpages routines depend on
> swap_pager_swap_init() being called before they are
> called. However, swap_pager_swap_init() isn't called
> until the pagedaemon starts up. Granted, it should
> always be run before init has a chance to exec swapon
> via /etc/rc, however, would it be more correct to
> instead let swap_pager_swap_init() be run by a SI_SUB_VM
> SYSINIT (SI_ORDER_ANY, with the other VM sysinit's bumped
> up to be less than ANY). The race is incredibly small,
> but I'd feel better if it was more correct. Comments?
This heavy a change probably belongs on -arch; I will
tell you what I think, and why, and let consensus sort
it out.
I don't think there is a race; the stuff is not really
started until the scheduler is run, and that is the
last thing; before that, it is merely on a run queue.
A serious monkey-wrench in your plan is that the proc
structure for the thing is allocated out of a zone, and
the VM system is not really up at that point, so doing it
that early is not really an option.
My gut tells me that it should actually _at least_ be
after the SI_SUB_MBUF has started.
You can _probably_ get away with as early as after the
SI_SUB_VM_CONF has occurred (notice how this happens well
after the SI_SUB_VM).
But earlier is worse.
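To make the ordering argument concrete, here is a rough userland
sketch (not kernel code) of how startup entries end up sorted: first
by subsystem, then by order within the subsystem. The demo_* names and
all numeric values are made up for illustration; the real SI_SUB_* and
SI_ORDER_* constants live in <sys/kernel.h> and differ from these. The
only relationship taken from this thread is that SI_SUB_VM comes
before SI_SUB_VM_CONF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative values only; NOT the real <sys/kernel.h> constants. */
enum demo_sub {
    DEMO_SUB_VM = 10,        /* stands in for SI_SUB_VM */
    DEMO_SUB_VM_CONF = 30,   /* stands in for SI_SUB_VM_CONF */
    DEMO_SUB_LATE = 90       /* hypothetical late subsystem */
};

enum demo_order {
    DEMO_ORDER_FIRST = 0,
    DEMO_ORDER_MIDDLE = 50,
    DEMO_ORDER_ANY = 100     /* runs last within its subsystem */
};

struct demo_sysinit {
    int subsystem;           /* major sort key */
    int order;               /* minor sort key within the subsystem */
    const char *name;        /* label for the entry */
};

/* Compare by subsystem first, then by order within the subsystem. */
int
demo_sysinit_cmp(const void *a, const void *b)
{
    const struct demo_sysinit *x = a;
    const struct demo_sysinit *y = b;

    if (x->subsystem != y->subsystem)
        return (x->subsystem < y->subsystem) ? -1 : 1;
    if (x->order != y->order)
        return (x->order < y->order) ? -1 : 1;
    return 0;
}

/* Sort the table in place into the order the entries would run. */
void
demo_sysinit_sort(struct demo_sysinit *tab, size_t n)
{
    assert(tab != NULL || n == 0);
    qsort(tab, n, sizeof(tab[0]), demo_sysinit_cmp);
}
```

With a table like this, an entry registered at (DEMO_SUB_VM,
DEMO_ORDER_ANY) sorts after every other DEMO_SUB_VM entry but still
ahead of anything in DEMO_SUB_VM_CONF, which is the placement the
original proposal asks about. Entries with identical keys have no
guaranteed relative order in this sketch, since qsort() is not stable.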
Basically, the VM system comes up in stages:
    o   load the loader
    o   make real mem look like 16M
    o   load the kernel
    o   statically allocate some pages and page tables
    o   build a VM that looks just like real memory
    o   use a stack trampoline to enter into the
        virtualized version of the kernel address space
        at the relocation address
--- everything above this line needs a serious cleanup pass ---
    o   Start running init_main.c (the stuff you are
        poking into right now)
    o   Get the tunables from the loader
    o   Start the console
    o   Print the first thing we ever print to say we
        are on our way to being alive
    o   Allocate some more stuff semi-statically by
        grabbing physical memory via VALLOC()
    o   Create some page tables that can be used by
        the VM system to refer to the allocations
        that have already taken place
    o   Remap the kernel into a single 4G page, if the
        CPU supports it, with the global bit, if the
        CPU supports it, to avoid CR3 reload shootdowns
    o   Start the VM system, which starts up malloc()
--- Now you have a VM system that can't swap, but can
fault to get pages from unallocated physical memory ---
The problem is that the machdep.c code needs to have
executed, as well as the pmap.c code, before you start
trying to do zone allocations, and the VM system needs to
be there and capable of fault handling for grow_kernel()
(page table allocations to grow the KVA that wasn't
preallocated) before you can do malloc's.
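The prerequisites in that paragraph can be written down as a small
table of "what is allowed at which stage". This is only a sketch under
stated assumptions: the demo_* stage names are invented here to
summarize the list above, nothing like them exists in the kernel, and
it treats the stated prerequisites as sufficient, which is a
simplification.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stage names summarizing the bring-up list above. */
enum demo_stage {
    DEMO_STAGE_STATIC,        /* loader, static pages and page tables */
    DEMO_STAGE_MACHDEP_PMAP,  /* machdep.c and pmap.c code has run */
    DEMO_STAGE_VM_FAULTS,     /* VM up: can fault in pages, cannot swap */
    DEMO_STAGE_SWAP           /* swap pager initialized */
};

/* Zone allocations need the machdep.c and pmap.c code to have run. */
bool
demo_can_zone_alloc(enum demo_stage s)
{
    return s >= DEMO_STAGE_MACHDEP_PMAP;
}

/* malloc needs a VM system that can handle faults to grow the KVA. */
bool
demo_can_malloc(enum demo_stage s)
{
    return s >= DEMO_STAGE_VM_FAULTS;
}

/* Swapping needs the swap pager to be initialized. */
bool
demo_can_swap(enum demo_stage s)
{
    return s >= DEMO_STAGE_SWAP;
}

/* Guard in the spirit of a kernel assertion: abort if called too early. */
void
demo_require_zone_alloc(enum demo_stage s)
{
    assert(demo_can_zone_alloc(s));
}
```

Moving an initializer earlier amounts to lowering the stage it runs
at, so every allocation it performs has to pass the corresponding
check at the new, earlier stage.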
This should probably be documented somewhere _AFTER_ it
is cleaned up to get rid of magic incantations, like NKPDE,
KPTDI, MPPTDI, APTPTDI, PTDPTDI, UMAXPTDI, and UMAXPTEOFF;
I could document it all now (I spent two weeks running a
backup tape through my dental fillings over it, and I'm
the original author of the SYSINIT() code), but it would
show everyone the skid marks on our underwear.
Maybe Kirk's new book will cover it, or maybe it wants
another book on top of that one.
This is heavy /* You are not expected to understand this.*/
I haven't really looked at whether it tries to do an
allocation immediately, or if it just sits there like
a lump until first reference. If it's not using zinit(),
with the interrupt flag, it's probably not safe that early;
if it's not, please don't change it, since it will eat KVA
space and potentially not use it after the change, when it
didn't before (only if it's not).
The reason I say "don't change this", even if it's not a
problem (e.g. it sits like a lump), is that you will still
end up limiting what's permissible later. Generally, the
order of operations is the way that it is not because it
was "arbitrarily decreed thus", but to permit the most
flexibility for implementation for the people who follow,
and are unlucky enough to not have metal dental fillings.
I suspect that for the IA64 and Alpha support at e.g. 512G of
physical RAM, we are going to want to dynamically allocate
swap_pager_object_list, instead of using a static allocation,
and moving it up too far from where it is would break that.
My personal preference is, to quote Buckaroo Banzai, "Hey,
hey... don't pull on that: you never know what those things
are attached to, that far inside the brain" when it comes to
startup ordering, since some of it works because I built an
initial dependency graph before SYSINIT went in, and some of
it works because nothing anyone has done since then and
committed has intermittently broken the dependency graph (if
they broke it, they were lucky and it wasn't intermittent,
so their system became a doorstop until they undid what they
had done, and put it in a safer place).
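For what it's worth, the kind of check implied here (does a proposed
startup order still respect the dependency graph?) is easy to sketch
outside the kernel. Everything below is hypothetical: the demo_* names,
the integer ids, and the idea of writing the dependencies down as
explicit edges are assumptions for illustration, not something the
SYSINIT code does.

```c
#include <assert.h>
#include <stddef.h>

/* One edge of the dependency graph: 'before' must run ahead of 'after'. */
struct demo_dep {
    int before;
    int after;
};

/* Return the position of 'id' in the order, or -1 if it is absent. */
static long
demo_position(const int *order, size_t n, int id)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (order[i] == id)
            return (long)i;
    return -1;
}

/*
 * Return the index of the first dependency that the given order
 * violates, or -1 if every dependency is satisfied.  A dependency
 * naming an id that is missing from the order counts as violated.
 */
long
demo_first_violation(const int *order, size_t n,
    const struct demo_dep *deps, size_t ndeps)
{
    size_t d;

    assert(order != NULL && deps != NULL);
    for (d = 0; d < ndeps; d++) {
        long pb = demo_position(order, n, deps[d].before);
        long pa = demo_position(order, n, deps[d].after);

        if (pb < 0 || pa < 0 || pb >= pa)
            return (long)d;
    }
    return -1;
}
```

Running a check like this on a proposed reordering only catches the
dependencies someone remembered to write down, which is the real
difficulty with moving things around in the startup sequence.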