Re: mmaping on plan9? (was Re: [9fans] venti /plan9port mmapped

Alyssa M via 9fans Sun, 15 Feb 2026 18:45:33 -0800

I think the difficulty here is thinking about this as memory mapping. What I'm 
really doing is deferred I/O. By the time a read completes, the read has 
logically happened; it's just that not all of the data has been transferred yet.
That happens later as the buffer is examined, and if pages of the buffer are 
not examined, it doesn't happen in those pages at all.

My implementation (on my hobby OS) only does this in a custom segment type. A 
segment of this type can be of any size, but has no pages pre-allocated in 
memory or the swap file - I do this to allow it to be very large, and because a 
read has to happen within the boundaries of a segment. I back it with a file 
system temporary file, so when pages migrate to the swap area the disk 
allocation can be sparse. You can load or store bytes anywhere in this segment. 
Touching pages allocates them, first in memory and eventually in the swap file 
as they get paged out.

On Saturday, February 14, 2026, at 2:27 PM, Dan Cross wrote:
> but read/write work in terms of byte buffers that have no obligation to be
> byte aligned. Put another way, read and write relate the contents of a
> "file" with an arbitrarily sized and aligned byte-buffer in memory,
> but there is no obligation that those byte buffers have the properties
> required to be a "page" in the virtual memory sense.

Understood. My current implementation does conventional I/O with any fragments 
of pages at the beginning and end of the read/write buffers. So small reads and 
writes happen traditionally. At the moment that's done before the read 
completes, so your example of doing lots of adjacent reads of small areas would 
work very badly (few pages would get the deferred loading), but I think I can 
do better by deferring the fragment I/O, so adjacent reads can coalesce the 
snapshots. My main scenario of interest though is for very large reads and 
writes, because that's where the sparse access has value.
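
A minimal sketch of the split described above, assuming 4096-byte pages. The 
function name and the shape of its result are made up for illustration and are 
not taken from the implementation discussed here. It divides a buffer into a 
leading fragment, a run of whole pages (the candidates for deferred loading), 
and a trailing fragment.

```python
PAGE = 4096  # assumed page size


def split_buffer(addr, length, page=PAGE):
    """Split [addr, addr+length) into (head, whole_pages, tail).

    Each part is a (start, length) tuple. Only whole_pages could be
    deferred; head and tail would use conventional I/O.
    """
    end = addr + length
    lo = -(-addr // page) * page   # addr rounded up to a page boundary
    hi = end // page * page        # end rounded down to a page boundary
    if lo >= hi:
        # No whole page lies inside the buffer: everything is a fragment.
        return (addr, length), (end, 0), (end, 0)
    return (addr, lo - addr), (lo, hi - lo), (hi, end - hi)


if __name__ == "__main__":
    print(split_buffer(100, 3 * PAGE))
    print(split_buffer(10, 100))
```

A three-page read that starts 100 bytes into a page leaves only two whole 
pages to defer, and a 100-byte read leaves none, which is why many small 
adjacent reads get little benefit unless the fragments are deferred too.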

Because reads are copies and not memory mapping, it doesn't matter if the reads 
are not page-aligned. The process's memory pages are not being shared with the 
cache of the file (snapshot), so if the data is not aligned then page faults 
will copy bytes from two cached file blocks (assuming they're the same size). 
In practice I'm expecting that large reads will be into large allocations, 
which will be aligned, so there's an opportunity to steal blocks from the file 
cache. But I'm not expecting to implement this. There's no coherence problem 
here because the snapshot is private to the process. And readonly.
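
To make the unaligned case concrete, here is a small sketch of which cached 
file blocks a single page fault would have to copy from. The names, the 
4096-byte page and block sizes, and the tuple layout are all assumptions for 
illustration only.

```python
def blocks_for_page(page_addr, buf_addr, file_off, page=4096, block=4096):
    """Return the file blocks needed to fill one faulting page.

    buf_addr is where the read buffer starts in memory and file_off is the
    file offset the read started at. Each result is
    (block_index, from_byte, to_byte) within that block.
    """
    start = file_off + (page_addr - buf_addr)  # file offset of the page's first byte
    end = start + page
    pieces = []
    for b in range(start // block, (end - 1) // block + 1):
        lo = max(start, b * block) - b * block
        hi = min(end, (b + 1) * block) - b * block
        pieces.append((b, lo, hi))
    return pieces


if __name__ == "__main__":
    # Read started at file offset 100: the page straddles blocks 0 and 1.
    print(blocks_for_page(0x10000, 0x10000, 100))
    # Read started at an aligned offset: exactly one block per page.
    print(blocks_for_page(0x11000, 0x10000, 8192))
```

In the aligned case each page corresponds to exactly one block, which is the 
situation where stealing the block from the file cache would be possible.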

When I do a read call into the segment, firstly a snapshot is made of the data 
to be read. This is functionally equivalent to making a temporary file and 
copying the data into it. Making this copy-on-write, so the snapshot costs 
nothing, is a key part of this; without it there would be no point.
The pages of the read buffer in the segment are then associated with parts of 
the snapshot - rather than the swap file. So rather than zero filling (or 
reloading paged-out data) when a load instruction is executed, the memory pages 
are filled from the snapshot.
When a store instruction happens, the page becomes dirty, and loses its 
association with the snapshot. It's then backed by the swap file. If you alter 
all pages of the buffer, then all pages are disconnected from the snapshot, and 
the snapshot is deleted. At that point you can't tell that anything 
unconventional happened.
If I 'read over' a buffer with something else, the pages get associated with 
the new snapshot, and disassociated from the old one.
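
The page life cycle above can be modelled in a few lines. This is a toy 
user-level simulation, not kernel code: the class and method names are 
hypothetical, reads are restricted to page-aligned whole pages, and a plain 
bytes object stands in for the copy-on-write snapshot.

```python
PAGE = 4096  # assumed page size


class Snapshot:
    def __init__(self, data):
        self.data = bytes(data)  # stands in for a copy-on-write snapshot
        self.refs = 0            # pages still associated; 0 means it can be deleted


class Segment:
    def __init__(self):
        self.pages = {}    # page number -> bytearray, for resident pages
        self.backing = {}  # page number -> (snapshot, offset), for clean pages
        self.faults = 0

    def _detach(self, pn):
        old = self.backing.pop(pn, None)
        if old is not None:
            old[0].refs -= 1

    def read(self, addr, snap):
        """Deferred read: associate pages with the snapshot, transfer nothing."""
        assert addr % PAGE == 0 and len(snap.data) % PAGE == 0
        for i in range(len(snap.data) // PAGE):
            pn = addr // PAGE + i
            self._detach(pn)           # 'reading over' drops the old association
            self.pages.pop(pn, None)   # and any old contents
            self.backing[pn] = (snap, i * PAGE)
            snap.refs += 1

    def _fault(self, pn):
        if pn in self.pages:
            return
        self.faults += 1
        if pn in self.backing:         # fill from the snapshot
            snap, off = self.backing[pn]
            self.pages[pn] = bytearray(snap.data[off:off + PAGE])
        else:                          # ordinary zero fill
            self.pages[pn] = bytearray(PAGE)

    def load(self, addr):
        self._fault(addr // PAGE)
        return self.pages[addr // PAGE][addr % PAGE]

    def store(self, addr, value):
        pn = addr // PAGE
        self._fault(pn)
        self.pages[pn][addr % PAGE] = value
        self._detach(pn)               # dirty page: now backed by swap only
```

With a four-page snapshot, read() causes no faults, a load touches exactly one 
page, and a store to each page brings the snapshot's refs down to zero, at 
which point it could be deleted.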

When I do a write call, the write call looks at each page, and decides whether 
it is part of a snapshot. If it is, and we're writing back to the same part of 
the same file (an update) and the corresponding block has not been changed in 
the file, then the write call can skip that page. In other cases it actually 
writes to the file. Any other writing to the file that we made a snapshot from 
invokes the copy-on-write mechanism, so the file changes, but the snapshot 
doesn't.
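
The per-page decision in the write call can be sketched as a small predicate. 
The names here are hypothetical, and the set of file offsets changed since the 
snapshot is an assumed stand-in for whatever the copy-on-write mechanism 
actually records.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass
class PageState:
    # (file name, file offset) the page was snapshotted from,
    # or None once the page has been dirtied and detached.
    source: Optional[Tuple[str, int]]


def must_write(page: PageState, dest_file: str, dest_off: int,
               changed_since_snapshot: Set[int]) -> bool:
    """Decide whether write has to transfer this page to the file."""
    if page.source is None:
        return True                    # dirty, or never came from a snapshot
    if page.source != (dest_file, dest_off):
        return True                    # different file or different place in it
    if dest_off in changed_since_snapshot:
        return True                    # the file moved on; the snapshot is stale
    return False                       # an update of unchanged data: skip it


if __name__ == "__main__":
    clean = PageState(("fs.img", 8192))
    print(must_write(clean, "fs.img", 8192, set()))           # skip
    print(must_write(clean, "fs.img", 8192, {8192}))          # file changed
    print(must_write(PageState(None), "fs.img", 8192, set()))  # dirty page
```

Only the first case is skipped; every other combination falls back to an 
ordinary write, so the result in the file is the same as with traditional I/O.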

If you freed the read buffer memory, then parts of it might get demand loaded 
in the act of writing malloc's book-keeping information into it - depending on 
how the malloc works. If you later use calloc (or memset), it will zero the 
memory, which will detach it all from the snapshot, albeit loading every page 
from the snapshot as it goes...
One could change calloc to read from /dev/zero for allocations over a certain 
size, and special-case that to set up pages for zero-fill when it happens in 
this type of segment, which would disassociate the pages from the old snapshot 
without loading them, just as any other subsequent read does. A memset syscall 
might be better. 
Practically, though, I think malloc and free are not likely to be used in this 
type of segment. You'd probably just detach the segment rather than free parts 
of it, but I've illustrated how you could drop the deferred snapshot if you 
needed to.

So this is not mmap by another name. It's an optimization of the standard 
read/write approach that has some of the desirable characteristics of mmap. In 
particular: it lets you do an arbitrarily large read call instantly, and fault 
in just the pages you actually need as you need them. So like demand-paging, 
but from a snapshot of a file. Similarly, if you're writing back to the same 
file region, write will only write the pages that have altered - either in 
memory or in the file. This is effectively an update, somewhat like msync. 

It's different from mmap in some ways: the data read is always a copy of the 
file contents, so there's never any spooky changing of memory under your feet. 
To the program, the behaviour is not detectably different from the traditional 
implementation - except for where, and whether, the time is spent.

There's still more I could add, but if I'm still not making sense, perhaps I'd 
better stop there. I think I've ended up making it sound more complicated than 
it is. 

On Sunday, February 15, 2026, at 10:19 AM, hiro wrote:
> since you give no reasons yourself, let me try to hallucinate a reason
> why you might be doing what you're doing here:

Here was my example for you:

On Thursday, February 12, 2026, at 1:34 PM, Alyssa M wrote:
> I've built a couple of simple disk file systems. I'm thinking of taking the 
> cache code out of one of them and mapping the whole file system image into 
> the address space - to see how much it simplifies the code. I'm not expecting 
> it will be faster.

This is interesting because it's a large data structure that's very sparsely 
read or written. I'd read the entire file system image into the segment in one 
gulp, respond to some file protocol requests (e.g. over 9P) by treating the 
segment as a single data structure, and write the entire image out periodically 
to implement what we used to call 'sync'.
With traditional I/O that would be ridiculous. With the above mechanism it 
should work about as well as mmap would. And without all that cache code and 
block fetching. Which is the point of this.
------------------------------------------
9fans: 9fans
Permalink: 
https://9fans.topicbox.com/groups/9fans/Te8d7c6e48b5c075b-M2de7c0bb4f1ac35c8f5e12e2
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription
