On Thursday, February 12, 2026, at 4:53 AM, Dan Cross wrote:
> On Wed, Feb 11, 2026 at 11:08 PM Alyssa M via 9fans <[email protected]> wrote: 
>> On Wednesday, February 11, 2026, at 10:01 AM, hiro wrote: 
>>> what concrete problem are you trying to solve?
>> Making software simpler to write, I think. 
> I don't understand that.  If the interface doesn't change, how is it simpler?

Think of a program that reads a file completely into memory, pokes at it a bit 
sparsely then writes the whole file out again. This is simple if the file is 
small.
If the file gets big, you might start looking around for ways to not do all 
that I/O, and pretty soon you have a buffer cache implementation. So the 
program is now more complex. Not only is there a buffer cache implementation, 
but you have to use it everywhere, rather than just operating on memory. 
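
To be concrete about the "simple" version, here is a minimal sketch in POSIX C. The names slurp and spew are made up for illustration, and a real program would want better error reporting.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Read the whole file into a malloc'd buffer; returns NULL on error. */
char *slurp(const char *path, size_t *len)
{
    assert(path != NULL && len != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return NULL;
    }
    size_t n = (size_t)st.st_size;
    char *buf = malloc(n ? n : 1);
    size_t got = 0;
    while (buf != NULL && got < n) {
        ssize_t r = read(fd, buf + got, n - got);
        if (r <= 0) {           /* error, or the file shrank under us */
            free(buf);
            buf = NULL;
        } else {
            got += (size_t)r;
        }
    }
    close(fd);
    if (buf != NULL)
        *len = n;
    return buf;
}

/* Write the whole buffer back over the file; returns 0 on success. */
int spew(const char *path, const char *buf, size_t len)
{
    assert(path != NULL && buf != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    size_t put = 0;
    while (put < len) {
        ssize_t w = write(fd, buf + put, len - put);
        if (w <= 0) {
            close(fd);
            return -1;
        }
        put += (size_t)w;
    }
    return close(fd);
}
```

Everything between slurp and spew is ordinary memory access, which is the property I'd like to keep. The cost is that both calls move every byte of the file, however little of it is touched.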

This is when mmap starts to look appealing.

On Thursday, February 12, 2026, at 4:53 AM, Dan Cross wrote:
> The other
> [use] is to map the contents of a file into an address space, so that you
> can treat them like memory, without first reading them from an actual
> file. This is useful for large but sparse read-only data files: I
> don't need to read the entire thing into physical memory; for that
> matter it may not even fit into physical memory. But if I mmap it, and
> just copy the bits I need, then those can be faulted into the address
> space on demand.
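
For reference, a minimal sketch of that use with POSIX mmap (Plan 9 itself has no such call). The function name peek is hypothetical. It maps a file read-only and fetches a single byte, so only the page holding that byte should need to be faulted in.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return the byte at offset off in the file, or -1 on error. */
int peek(const char *path, off_t off)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0 || off < 0 || off >= st.st_size) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    /* No file data is read here; the mapping is only set up. */
    unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping outlives the descriptor */
    if (p == MAP_FAILED)
        return -1;
    int c = p[off];             /* this access triggers the page fault */
    munmap(p, len);
    return c;
}
```

Mapping and unmapping per call is wasteful, of course; a real program would keep the mapping around. The point is only that the cost follows what is touched rather than the size of the file.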

So what I'm suggesting is that instead of the programmer making an mmap call, 
they should make a single read call to read the entire file into the address 
space - as they did before. The new read implementation would do this, but as a 
memory mapped snapshot. To the programmer this looks no different from how 
reads have always worked; it just returns very quickly, because no I/O happens 
at the time of the call.
The snapshot data is brought in by demand paging as it is touched, and pages 
may get dirtied.

When the programmer would otherwise call msync, they instead write out the 
entire file back where it came from - as they did before. The write 
implementation will recognise when it's overwriting the file where the snapshot 
came from and will only write the dirty pages - which is effectively what msync 
does. 
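
A rough user-space approximation of this, in POSIX C, to show the shape of it. All the names (Snap, snapread, snapwrite, snapclose) are hypothetical, and two things are only stand-ins for what a kernel would do: MAP_PRIVATE is not a true snapshot (changes made to the file by others may show through on untouched pages), and dirty pages are found here by comparing against the file, which rereads everything and so defeats the purpose. A real implementation would use the page tables' dirty information instead.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

typedef struct {
    int fd;         /* file the snapshot came from */
    char *mem;      /* private copy-on-write mapping of the whole file */
    size_t len;
} Snap;

/* "read": map the whole file privately. Pages arrive on demand, and
 * stores go to private copies rather than to the file. */
int snapread(Snap *s, const char *path)
{
    struct stat st;
    assert(s != NULL && path != NULL);
    s->fd = open(path, O_RDWR);
    if (s->fd < 0)
        return -1;
    if (fstat(s->fd, &st) < 0 || st.st_size == 0) {
        close(s->fd);
        return -1;
    }
    s->len = (size_t)st.st_size;
    s->mem = mmap(NULL, s->len, PROT_READ | PROT_WRITE, MAP_PRIVATE, s->fd, 0);
    if (s->mem == MAP_FAILED) {
        close(s->fd);
        return -1;
    }
    return 0;
}

/* "write": put back only the pages that differ from the file.
 * Returns the number of pages written, or -1 on error. */
long snapwrite(Snap *s)
{
    size_t pg = (size_t)sysconf(_SC_PAGESIZE);
    char *old = malloc(pg);
    long written = 0;
    if (old == NULL)
        return -1;
    for (size_t off = 0; off < s->len; off += pg) {
        size_t n = s->len - off < pg ? s->len - off : pg;
        /* Stand-in for a dirty bit: compare with what the file holds. */
        if (pread(s->fd, old, n, (off_t)off) != (ssize_t)n) {
            written = -1;
            break;
        }
        if (memcmp(old, s->mem + off, n) != 0) {
            if (pwrite(s->fd, s->mem + off, n, (off_t)off) != (ssize_t)n) {
                written = -1;
                break;
            }
            written++;
        }
    }
    free(old);
    return written;
}

void snapclose(Snap *s)
{
    munmap(s->mem, s->len);
    close(s->fd);
}
```

In the actual proposal none of these names would exist: the program would call plain read and write, and the implementation would do the equivalent underneath.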

So from the programmer's point of view this is exactly what they've always 
done. The implementation uses c-o-w snapshots and demand paging which have the 
performance of mmap, but provide the conventional semantics of read and write.

Programs can handle larger files faster without having to change.
It's just an optimisation in the read/write implementation.

So that's the idea. Is it practical? I don't know... It's certainly harder to 
do.

One difference with mmap is that dirty pages don't get back to the file by 
themselves. You have to do the writes. But I think there may be ways to address 
this.

On Thursday, February 12, 2026, at 4:53 AM, Dan Cross wrote:
> The problem is, those aren't the right analogues for the file
> metaphor.  `mmap` is closer to `open` than to `read`

In the sense that mmap creates an association between pages and the file and 
munmap undoes that, yes. With the idea above the page association is with 
snapshots and is a bit more ephemeral, and I don't know yet how much it matters 
if it persists after it's no longer needed. Pages are disassociated from 
snapshots naturally by being dirtied, by being associated with something else 
or perhaps by memory being deallocated. It may be somewhat like file deletion. 
Sometimes when it's 'gone' it's not really gone until the last user lets go. I 
don't think it's a problem for the process, but it may be for the file system 
in some situations.


------------------------------------------
9fans: 9fans
Permalink: 
https://9fans.topicbox.com/groups/9fans/Te8d7c6e48b5c075b-M8b80dba1c12ac630dda63f5c