On 7/26/16 3:30 PM, Charles Hixson via Digitalmars-d-learn wrote:
> On 07/26/2016 11:31 AM, Steven Schveighoffer via Digitalmars-d-learn wrote:
>>
>> Now, C i/o's buffering may not suit your exact needs. So I don't know
>> how it will perform. You may want to consider mmap which tells the
>> kernel to link pages of memory directly to disk access. Then the
>> kernel is doing all the buffering for you. Phobos has support for it,
>> but it's pretty minimal from what I can see:
>> http://dlang.org/phobos/std_mmfile.html
>
> I've considered mmapfile often, but when I read the documentation I end
> up realizing that I don't understand it.  So I look up memory mapped
> files in other places, and I still don't understand it.  It looks as if
> the entire file is stored in memory, which is not at all what I want,
> but I also can't really believe that's what's going on.

Of course that isn't what is happening :)

What happens is that the kernel records that a range of your address space (say, the page at 0x12345) is mapped to the file. When you first access a mapped page, the memory management unit raises a page fault, because that page isn't loaded yet, and the fault hands control to the kernel. The kernel sees that the page is backed by the file and loads its contents from there. When you write to the memory location, the page is marked dirty, and at some point the kernel flushes that page back to disk.

Everything is done behind the scenes and is in tune with the filesystem itself, so you get a little extra benefit from that.
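
As a rough illustration of that sequence, here is a minimal sketch in Python, whose standard `mmap` module wraps the same OS facility. It is not D and not the std.mmfile API; the function name `demo_file_mapping` and the temporary file are made up for the example.

```python
import mmap
import os
import tempfile


def demo_file_mapping(path):
    """Map an existing file, read and modify it through memory, then flush."""
    with open(path, "r+b") as f:
        # Length 0 maps the whole file. Creating the mapping typically does
        # not read the file's contents; pages are brought in on first access.
        with mmap.mmap(f.fileno(), 0) as m:
            first = m[0:5]       # first touch of the page: the kernel loads it
            m[0:5] = b"HELLO"    # writing through the mapping dirties the page
            m.flush()            # ask the kernel to write dirty pages back now
    return first


if __name__ == "__main__":
    # Hypothetical scratch file, only so the sketch can be run as-is.
    fd, path = tempfile.mkstemp()
    os.write(fd, b"hello, mapped world\n")
    os.close(fd)
    print(demo_file_mapping(path))
    with open(path, "rb") as f:
        print(f.read())
    os.remove(path)
```

The explicit `flush()` is optional here; without it the kernel still writes the dirty page back eventually, which is the "at some point" above.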

> I know that
> there was an early form of this in a version of BASIC (the version that
> RISS was written in, but I don't remember which version that was) and in
> *that* version array elements were read in as needed.  (It wasn't
> spectacularly efficient.)  But memory mapped files don't seem to work
> that way, because people keep talking about how efficient they are.  Do
> you know a good introductory tutorial?  I'm guessing that "window size"
> might refer to the number of bytes available, but what if you need to
> append to the file?  Etc.

To be honest, I'm not super familiar with actually using them; I just have a rough idea of how they work. You will have to look up the actual usage.

> A part of the problem is that I don't want this to be a process with an
> arbitrarily high memory use.

You should know that you can allocate as much memory as you want, as long as you have address space for it, and none of it is backed by physical memory until you use it. So the memory is managed lazily, all supported by the MMU hardware. This is true for ordinary (non-file-backed) memory too!
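
A small sketch of that lazy behavior, again in Python rather than D: it reserves a large anonymous (non-file-backed) mapping and touches only a handful of pages. The function name `touch_sparse` and the sizes are arbitrary choices for illustration, and how much the OS lets you reserve depends on its overcommit policy.

```python
import mmap


def touch_sparse(size=256 << 20, step=64 << 20):
    """Reserve `size` bytes of anonymous memory, then touch one byte every `step`."""
    # fileno -1 requests an anonymous mapping: address space that reads as
    # zeros and is normally given physical pages only when first touched.
    m = mmap.mmap(-1, size)
    touched = 0
    for off in range(0, size, step):
        m[off] = 1               # first write to a page makes the kernel back it
        touched += 1
    # Bytes that were written read back as 1; untouched bytes read back as 0.
    ok = m[0] == 1 and m[1] == 0
    m.close()
    return touched, ok


if __name__ == "__main__":
    print(touch_sparse())
```

Watching the process's resident memory while this runs (with `top`, for example) should show far less than 256 MiB in use, since only four pages are ever written.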

Note that the only "memory" you are using for the mmapped file is the kernel's page buffers, which are likely already being used to buffer the disk reads. It doesn't load the entire file into memory, and probably doesn't even load all sequential pages. It only loads the ones you use.
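
To make "only the ones you use" concrete, and as a tentative take on the window and append questions, here is a sketch under stated assumptions: it uses Python's `mmap` module rather than std.mmfile, and the helper names `read_window` and `append_via_mapping` are invented for the example. A mapping cannot reach past the end of the file, so the append helper grows the file first and then maps just the new tail.

```python
import mmap
import os


def _align_down(offset):
    """Mapping offsets must be a multiple of the allocation granularity."""
    gran = mmap.ALLOCATIONGRANULARITY
    return (offset // gran) * gran


def read_window(path, offset, length):
    """Map only the pages covering [offset, offset + length), not the whole file.

    The requested range must lie inside the file.
    """
    base = _align_down(offset)
    delta = offset - base
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), delta + length,
                       access=mmap.ACCESS_READ, offset=base) as m:
            return m[delta:delta + length]


def append_via_mapping(path, data):
    """Grow the file, then write `data` into the new tail through a mapping."""
    with open(path, "r+b") as f:
        old_size = f.seek(0, os.SEEK_END)
        if not data:
            return old_size
        f.truncate(old_size + len(data))   # extend the file before mapping it
        base = _align_down(old_size)
        delta = old_size - base
        with mmap.mmap(f.fileno(), delta + len(data), offset=base) as m:
            m[delta:delta + len(data)] = data
            m.flush()
    return old_size + len(data)
```

For frequent small appends, growing the file in larger chunks and tracking the logical end yourself would avoid remapping on every write; that is a design choice this sketch leaves out.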

I'm pretty much at the limit of my knowledge on this subject (and maybe I have a few things incorrect); I'm sure others here know much more. I suggest you play with it a bit to see what the performance is like. I have also heard that it's very fast.

-Steve
