Re: mmaping on plan9? (was Re: [9fans] venti /plan9port mmapped

Dan Cross Sat, 14 Feb 2026 20:30:26 -0800

On Sat, Feb 14, 2026 at 2:15 AM Alyssa M via 9fans wrote:
> On Friday, February 13, 2026, at 1:36 AM, Dan Cross wrote:
> > Let me be blunt: the `mmap` interface, as specified in 4.2BSD
> > and implemented in a bunch of Unix and Unix-like systems, is
> > atrocious. Its roots come from a system that was radically
> > different in design than Unix, and its baroque design, with a
> > bunch of operations multiplexed onto a single call with 6 (!!)
> > arguments, two of which are bitmaps that interact in abstruse
> > ways and one of which can radically alter the semantics of
> > the call, really shows. I believe that it _is_ possible to do better.
>
> I'm definitely with you there.
>
> What I'm trying to do is explore whether the standard read/write
> calls with a different implementation could make mmap redundant
> by using the combination of demand paging and copy-on-write to
> defer or elide I/O. This is not mmap, and not really memory
> mapping in the conventional sense either. But I think it can do the
> things an mmap user is looking for (so long as they're not looking
> for frame buffers or shared memory.) A successful implementation
> will not be detectably different from the traditional read/write calls
> except with respect to deferred or elided I/O.


But as before, I don't see how you intend to make that work in a way
that is transparent to actual programmers. The semantics of read/write
and memory mapping files are just too different.

> The key ideas are that the read (and write!) system calls associate
> buffer pages with a snapshot of the file data involved (which is then
> demand loaded - deferred), and altering memory pages breaks that
> association, and associates the altered pages with the swap file.

Except they don't.

It sounds like you're thinking in terms of pages, and I get that, but
read/write work in terms of byte buffers that have no obligation to be
page aligned. Put another way, read and write relate the contents of a
"file" with an arbitrarily sized and aligned byte-buffer in memory,
but there is no obligation that those byte buffers have the properties
required to be a "page" in the virtual memory sense.

As in my earlier email, consider the case of a `read` into some long
buffer that is offset relative to a page boundary. For example,
assuming a 4KiB page:

    char *p = (char *)0x100000000; // Starts at 4GiB.
    read(fd, p + 2, 0x1000*1024*128);

Here, a program is reading 512MiB of data into a buffer, but that
buffer doesn't start on a page boundary, so the contents of each
logical "page" read from the file are offset by two bytes from the
start of a page as the virtual memory system sees it. Because of that,
the system can't play clever games with page mapping the read contents
anymore: the virtual memory hardware enforces the property that pages
are _aligned_ to the page size, and the destination of this read does
not meet that alignment requirement.
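
To put numbers on that, here is a rough sketch of the arithmetic for a
buffer that starts two bytes past a page boundary at 4GiB. The names
(`PageSz`, `page_offset`, `file_skew`, `whole_pages`) are made up for
illustration, and it assumes 4KiB pages and a read that begins at a
page-aligned file offset:

```c
#include <assert.h>
#include <stdint.h>

enum { PageSz = 4096 };	/* assumed 4KiB page, as in the example above */

/* Byte offset of addr within the page that contains it. */
static uint64_t
page_offset(uint64_t addr)
{
	return addr & (PageSz - 1);
}

/*
 * Offset, relative to the start of the read, of the file data that
 * lands on the first page boundary inside a buffer starting at addr.
 * Zero means file pages and memory pages line up (assuming the read
 * itself starts at a page-aligned file offset); anything else means
 * each memory page needs bytes from two adjacent file pages.
 */
static uint64_t
file_skew(uint64_t addr)
{
	return (PageSz - page_offset(addr)) % PageSz;
}

/* Number of pages that lie wholly inside the buffer [addr, addr+len). */
static uint64_t
whole_pages(uint64_t addr, uint64_t len)
{
	uint64_t first, end;

	assert((PageSz & (PageSz - 1)) == 0);	/* must be a power of two */
	first = (addr + PageSz - 1) & ~(uint64_t)(PageSz - 1);
	end = addr + len;
	return end > first ? (end - first) / PageSz : 0;
}
```

For `p + 2` above, `page_offset` gives 2 and `file_skew` gives 4094, so
none of the whole pages inside the buffer corresponds to a single page
of the file, and there is nothing for the system to remap.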

> Pages that are still associated with the file region they're being written
> to are not written (this is the eliding part). Fragments of pages are
> treated traditionally. The segment this happens in may be logically
much larger even than the swap area - and is not pre-allocated, so
> pages are either allocated swap space on demand, or are associated
> with snapshots.

This still doesn't make any sense to me.  The _program_ has already
allocated the memory it reads into; how the system gets the data into
that memory in response to a read is sort of irrelevant.  In any
event, you still have to deal with the problem of destination buffers
that aren't page aligned, which is already going to make this
unworkable in the general case.  You could special-case requests for
page aligned reads, I guess, but there are a lot of corner cases: what
if I `malloc` a buffer, align a pointer to a page boundary at some
offset into the malloc'ed region, `read`, and then immediately `free`
the buffer?  Now `free` needs to know to tell the system that this
region was dealloc'ed; if it doesn't, then a subsequent `malloc`
covering part of it could cause a lot of IO for no good reason, as I
go to write bits of that memory and they're CoW faulted into the
process's address space.  `free` usually doesn't care; it's a
userspace library, and sure, some implementations do interact with the
system more than others, but now I'm essentially requiring that.
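
A sketch of the allocation half of that corner case, in case it helps.
The helper names (`page_align_up`, `alloc_with_aligned_window`) and the
4KiB `PageSz` are assumptions of mine, not anything from an existing
library:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum { PageSz = 4096 };	/* assumed 4KiB page */

/* Round p up to the next page boundary (PageSz is a power of two). */
static char *
page_align_up(char *p)
{
	uintptr_t a = (uintptr_t)p;

	return (char *)((a + PageSz - 1) & ~(uintptr_t)(PageSz - 1));
}

/*
 * Allocate room for npages page-aligned pages somewhere inside an
 * ordinary malloc'ed block.  Returns the pointer to hand to free();
 * *aligned is where a page-aligned read() could be directed.
 */
static char *
alloc_with_aligned_window(size_t npages, char **aligned)
{
	char *base;

	assert(aligned != NULL);
	base = malloc((npages + 1) * PageSz);	/* one spare page for slack */
	if (base == NULL)
		return NULL;
	*aligned = page_align_up(base);
	return base;
}
```

A program would `read` into `*aligned` and later call `free` on the
returned base pointer. Nothing in that sequence tells the system that
the aligned window is dead; `free` typically just puts the block back
on a userspace free list, which is the gap described above.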

In short, I don't see how this decreases complexity; it seems to
objectively increase it.

> There's much more I could say, but I don't have it all working yet, and
> making it work at scale is probably another story.

Well, I wish you luck, and if you come up with something amazing, I'll
be the first to admit that I'm wrong.  But it sure seems like the
practicality of the idea is predicated on assumptions that don't hold
in the vast majority of cases.

Again, I think there's space for something like memory-mapped files in
the wider universe of systems design, but I'm afraid you'll find that
trying to make that thing `read` is not workable.

        - Dan C.

------------------------------------------
9fans: 9fans
Permalink: 
https://9fans.topicbox.com/groups/9fans/Te8d7c6e48b5c075b-M399decded15386d15d15b6c7
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription
