On Thu, Feb 12, 2026 at 6:44 AM Alyssa M via 9fans <[email protected]> wrote:
> > On Thursday, February 12, 2026, at 4:53 AM, Dan Cross wrote:
> > > On Wed, Feb 11, 2026 at 11:08 PM Alyssa M via 9fans <[email protected]>
> > > wrote:
> > >
> > > > On Wednesday, February 11, 2026, at 10:01 AM, hiro wrote:
> > > > what concrete problem are you trying to solve?
> > >
> > > Making software simpler to write, I think.
> >
> > I don't understand that. If the interface doesn't change, how is it simpler?
>
> Think of a program that reads a file completely into memory, pokes at it a
> bit sparsely then writes the whole file out again. This is simple if the file
> is small.
> If the file gets big, you might start looking around for ways to not do all
> that I/O,
> and pretty soon you have a buffer cache implementation. So the program is now
> more complex. Not only is there a buffer cache implementation, but you have to
> use it everywhere, rather than just operating on memory.
I don't see how this really works; in particular, the semantics of
read/write are simply different from those of `mmap`. In the former
case, to read a file into memory, I have to know how big it is (I can
just stat it) and then I have to allocate memory to hold its contents,
and then I expect `read` to copy the contents of the file into the
memory region I just allocated. Note that there is a "happens before"
relationship between allocating memory and then reading the contents
of the file into that memory. With mapping a file into virtual
memory, I'm simultaneously allocating address space _and_ arranging
things so that accesses to that region of address space correspond to
parts of the mapped file.
You seem to be proposing a model that somehow pushes enough smarts
into `read` to combine the two, as in the `mmap` case; but how does
that work from a day-to-day programming perspective? Suppose I go and
allocate a bunch of memory, and then immediately stream a bunch of
data from /dev/random and write it into that memory; the contents of
each page are thus random, and now there's no good way for the VM
system to do anything clever like acknowledge success but not _really_
allocate until I demand fault it in by accessing it (I already did by
scribbling all over it), nor can it do something like say, "oh, these
bits are all zeros; I'll just map this to a single global zero page
and trap stores and CoW", since the contents are random, not uniform.
Now, with these preconditions set, I go to `read` a big file into that
memory: what should the system do?
_An_ argument is that it should just discard the prior contents, since
they are logically overwritten by the contents of the file, anyway.
But that's not general: you aren't guaranteed that the buffer you're
reading into is properly aligned to do a bunch of page-mapping
shenanigans. Read doesn't care: it just copies bytes, but pages of
memory are both sized and aligned: `mmap` returns a pointer aligned to
a page boundary, and requests for fixed mappings enforce this and will
fail if given a non-aligned offset.
But also, suppose that instead of one big read, I do something like:
`loop ... { seek(fd, 1, 1); read(fd, p + 1, 4093); p += 4093; }` to
copy into this region of memory I've mangled. Now you've got to deal
with access patterns that mix pre-existing data with data newly copied
from the file. "Well, copy part of the file contents into a newly
allocated page..." might be an answer there, but that's not
substantially different than what `read` does today, so what's the
differentiator?
> This is when mmap starts to look appealing.
The key to making a good argument here is first acknowledging that the
whole model for working with data is just fundamentally different with
`mmap` than with `read`. You really can't treat them as the same.
Let me be blunt: the `mmap` interface, as specified in 4.2BSD and
implemented in a bunch of Unix and Unix-like systems, is atrocious.
Its roots come from a system that was radically different in design
from Unix, and its baroque design, with a bunch of operations
multiplexed onto a single call with 6 (!!) arguments, two of which are
bit masks that interact in abstruse ways and one of which can radically
alter the semantics of the call, really shows. I believe that it _is_
possible to do better. But shoehorning the model of memory-mapped IO
into an overloaded `read` is not it.
- Dan C.
> On Thursday, February 12, 2026, at 4:53 AM, Dan Cross wrote:
>
> The other [use] is to map the contents of a file into an address space, so
> that you can treat them like memory, without first reading them from an
> actual file. This is useful for large but sparse read-only data files: I
> don't need to read the entire thing into physical memory; for that matter if
> may not even fit into physical memory. But if I mmap it, and just copy the
> bits I need, then those can be faulted into the address space on demand.
>
>
> So what I'm suggesting is that instead of the programmer making an mmap call,
> they should make a single read call to read the entire file into the address
> space - as they did before. The new read implementation would do this, but as
> a memory mapped snapshot. This looks no different to the programmer from how
> reads have always worked, it just happens very quickly, because no I/O
> actually happens.
> The snapshot data is brought in by demand paging as it is touched, and pages
> may get dirtied.
>
> When the programmer would otherwise call msync, they instead write out the
> entire file back where it came from - as they did before. The write
> implementation will recognise when it's overwriting the file where the
> snapshot came from and will only write the dirty pages - which is effectively
> what msync does.
>
> So from the programmer's point of view this is exactly what they've always
> done. The implementation uses c-o-w snapshots and demand paging which have
> the performance of mmap, but provide the conventional semantics of read and
> write.
>
> Programs can handle larger files faster without having to change.
> It's just an optimisation in the read/write implementation.
>
> So that's the idea. Is it practical? I don't know... It's certainly harder to
> do.
>
> One difference with mmap is that dirty pages don't get back to the file by
> themselves. You have to do the writes. But I think there may be ways to
> address this.
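[A rough rendering of the proposed read fast path, as C-flavored pseudocode; every name and structure here is invented for illustration, and the alignment preconditions are my guess at what the scheme would require:]

```
/* Hypothetical: install a CoW snapshot instead of copying bytes,
   when the read covers the whole file and alignment permits. */
long
snapshotread(File *f, void *buf, long n)
{
	if (n == f->length && pagealigned(buf) && f->offset == 0) {
		mapsnapshot(f, buf, n);   /* pages fault in on demand;
		                             stores dirty them via CoW */
		return n;                 /* looks like an ordinary read */
	}
	return copyread(f, buf, n);       /* fall back to byte copying */
}
```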
>
> On Thursday, February 12, 2026, at 4:53 AM, Dan Cross wrote:
>
> The problem is, those aren't the right analogues for the file metaphor.
> `mmap` is closer to `open` than to `read`
>
> In the sense that mmap creates an association between pages and the file and
> munmap undoes that, yes. With the idea above the page association is with
> snapshots and is a bit more ephemeral, and I don't know yet how much it
> matters if it persists after it's no longer needed. Pages are disassociated
> from snapshots naturally by being dirtied, by being associated with something
> else or perhaps by memory being deallocated. It may be somewhat like file
> deletion. Sometimes when it's 'gone' it's not really gone until the last user
> lets go. I don't think it's a problem for the process, but it may be for the
> file system in some situations.
>
>
------------------------------------------
9fans: 9fans
Permalink:
https://9fans.topicbox.com/groups/9fans/Te8d7c6e48b5c075b-M67e7be4c741cd85745124418
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription