:
:In message <[EMAIL PROTECTED]>, Matthew Dillon writes:
:
:> Well, let me tell you what the fuzzy goal is first and then maybe we
:> can work backwards.
:>
:> Eventually all physical I/O needs a physical address. The quickest
:> way to get to a physical address is to be given an array of vm_page_t's
:> (which can be trivially translated to physical addresses).
:
:Not all:
:PIO access to ATA needs virtual access.
:RAID5 needs virtual access to calculate parity.

... which means that the initial implementation for PIO and RAID5
utilizes the mapped-buffer bioops interface rather than the b_pages[]
bioops interface.

But here's the point: We need to require that all entries *INTO* the
bio system start with at least b_pages[] and then generate b_data only
when necessary. If a particular device needs a b_data mapping, it
can get one, but I think it would be a huge mistake to allow entry into
the device subsystem to utilize *either* a b_data mapping *or* a
b_pages[] mapping. Big mistake. There has to be a lowest common
denominator that the entire system can count on and it pretty much has
to be an array of vm_page_t's.
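
A minimal userland sketch of that rule, for illustration only: every
buffer enters with its page array filled in, and a virtual mapping is
generated only when a consumer asks for one. All names here (sk_page,
sk_buf, sk_buf_map and so on) are hypothetical stand-ins, not the real
vm_page_t / struct buf definitions, and the "mapping" is a static
array rather than real KVA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PAGE_SIZE 4096u
#define SK_MAXPAGES  16

/* Hypothetical stand-in for vm_page_t: only the physical address matters. */
struct sk_page {
    uint64_t phys_addr;
};

/* Hypothetical stand-in for struct buf: pages always valid, data lazy. */
struct sk_buf {
    struct sk_page *pages[SK_MAXPAGES]; /* populated on entry, always   */
    int             npages;
    void           *data;               /* NULL until somebody maps it  */
    int             map_count;          /* times the mapping cost was paid */
};

/* Placeholder for KVA; a real kernel would enter page table entries. */
static unsigned char sk_fake_kva[SK_MAXPAGES * SK_PAGE_SIZE];

/* Generate the virtual mapping on first use only. */
void *sk_buf_map(struct sk_buf *bp)
{
    assert(bp->npages > 0 && bp->npages <= SK_MAXPAGES);
    if (bp->data == NULL) {
        bp->data = sk_fake_kva;
        bp->map_count++;
    }
    return bp->data;
}

/* DMA-style consumer: works from the page array alone, never maps. */
uint64_t sk_dma_first_addr(const struct sk_buf *bp)
{
    assert(bp->npages > 0);
    return bp->pages[0]->phys_addr;
}

/* PIO/RAID5-style consumer: needs virtual access, so it pays for a map.
 * XORing every byte is a simplification, not a real parity calculation. */
unsigned char sk_xor_bytes(struct sk_buf *bp)
{
    const unsigned char *p = sk_buf_map(bp);
    size_t len = (size_t)bp->npages * SK_PAGE_SIZE;
    unsigned char acc = 0;

    for (size_t i = 0; i < len; i++)
        acc ^= p[i];
    return acc;
}
```

In this sketch a buffer that only ever goes through sk_dma_first_addr()
keeps map_count at zero, while one that goes through sk_xor_bytes()
pays for the mapping once.
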
If a particular subsystem needs b_data, then that subsystem is obviously
willing to take the virtual mapping / unmapping hit. If you look at
Greg's current code this is, in fact, what is occurring.... the critical
path through the buffer cache in a heavily loaded system tends to require
a KVA mapping *AND* a KVA unmapping on every buffer access (just that the
unmappings tend to be for unrelated buffers). The reason this occurs
is because even with the larger amount of KVA we made available to the
buffer cache in 4.x, there still isn't enough to leave mappings intact
for long periods of time. A 'systat -vm 1' will show you precisely
what I mean (also sysctl -a | fgrep bufspace).

So we will at least not be any worse off than we are now, and probably
better off since many of the buffers in the new system will not have
to be mapped. For example, when vinum's RAID5 breaks up a request
and issues a driveio() it passes a buffer which is assigned to b_data
which must be translated (through page table lookups) to physical
addresses anyway, so the fact that vinum does not populate
b_pages[] does *NOT* help it in the least. It actually makes the job
harder.
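
To make the vinum point concrete, here is a rough sketch comparing
the two ways of arriving at physical segment addresses: straight from
a page array, or from a virtual address through per-page lookups. The
flat sk_page_table[] and every function name are assumptions made up
for this example; a real pmap lookup is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PAGE_SIZE  4096u
#define SK_PAGE_SHIFT 12
#define SK_PT_ENTRIES 64

/* Hypothetical flat page table: index is the virtual page number,
 * value is the physical base address of that page. */
static uint64_t sk_page_table[SK_PT_ENTRIES];
static unsigned sk_pt_lookups;      /* counts translations performed */

/* Translate one virtual address, counting the lookup. */
static uint64_t sk_pt_lookup(uintptr_t va)
{
    size_t vpn = (size_t)(va >> SK_PAGE_SHIFT);

    assert(vpn < SK_PT_ENTRIES);
    sk_pt_lookups++;
    return sk_page_table[vpn] + (va & (SK_PAGE_SIZE - 1));
}

/* Path A: the caller hands over per-page physical addresses (the
 * b_pages[] style). Building the segment list is a plain copy. */
size_t sk_segs_from_pages(const uint64_t *page_phys, size_t npages,
                          uint64_t *segs)
{
    for (size_t i = 0; i < npages; i++)
        segs[i] = page_phys[i];
    return npages;
}

/* Path B: the caller hands over only a virtual address (the b_data
 * style). Every page touched costs one table lookup. */
size_t sk_segs_from_vaddr(uintptr_t va, size_t len, uint64_t *segs)
{
    size_t n = 0;

    while (len > 0) {
        size_t off = (size_t)(va & (SK_PAGE_SIZE - 1));
        size_t chunk = SK_PAGE_SIZE - off;

        if (chunk > len)
            chunk = len;
        segs[n++] = sk_pt_lookup(va);
        va += chunk;
        len -= chunk;
    }
    return n;
}
```

Both paths end up with the same segment list; the b_data path just
has to do a lookup per page to get there.
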
-Matt
Matthew Dillon
<[EMAIL PROTECTED]>
:--
:Poul-Henning Kamp FreeBSD coreteam member
:[EMAIL PROTECTED] "Real hackers run -current on their laptop."
:FreeBSD -- It will take a long time before progress goes too far!
: