Re: Will mmap and the read buffer cache be unified, anyone working with it?

Howard Chu Wed, 11 Nov 2015 09:36:03 -0800

On 2015-11-10 14:10, Tinker wrote:
...

A "safe" approach to file access would be to read data using mmap()
but write data using fwrite() only. Mmap does have a read-only mode.
This does NOT work in OpenBSD currently though because of the absence
of unified caching.

I find this conversation puzzling, since even back in BSD 4.3, read() was actually implemented by memory mapping the underlying file.

The nice thing about reading files from memory using mmap instead of
using fread(), is that you offload Lots of work to the OS kernel.

Suddenly file reading is free of mallocs for instance.

And the program doesn't need internal file caching, so the extent of the
OS' disk caching is increased. And I guess maybe the OS disk cache can
prioritize better what to keep in RAM.

In a way, mmap() is a way to "zero-copy file access", which is just
awesome.
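
As an illustration of the read-only mapping described above, here is a minimal C sketch. The helper names map_file_readonly() and unmap_file() are hypothetical and do not come from this thread or from LMDB; only the POSIX calls (open, fstat, mmap, munmap, close) are real. It assumes a regular, non-empty file, since mmap() rejects a zero length.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only.
 * Returns a pointer to the file contents, or NULL on failure.
 * On success *len_out receives the file size in bytes. */
const void *map_file_readonly(const char *path, size_t *len_out)
{
    struct stat st;
    void *p;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    /* PROT_READ only: a store through this pointer faults instead of
     * silently modifying the file. */
    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;
    *len_out = (size_t)st.st_size;
    return p;
}

/* Hypothetical helper: release a mapping obtained above. */
void unmap_file(const void *p, size_t len)
{
    munmap((void *)p, len);
}
```

The caller reads the returned pointer directly, with no intermediate buffer and no malloc; the pages are faulted in by the kernel on first access.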

A database that uses this technique is LMDB (OpenLDAP's default DB
backend).

A key feature of LMDB is that it's only 9600 locs,
https://github.com/LMDB/lmdb/blob/mdb.master/libraries/liblmdb/mdb.c :

LMDB is interesting in how lowlevel it is, as it's written in C and the
"loaded" database entries are simply pointers into the mmap():ed space.

I saw some fantastic-looking benchmarks, I think at
http://symas.com/mdb/#bench , where LMDB goes light-speed where others
remain on the ground.

(LMDB has a limit in usability in that it never shrinks a DB file,
however, that is in no way because of its use of mmap() and could really
be overcome by working more on it.

That is not a limit in usability. Any active database undergoes insert and delete operations; returning space to the OS on a delete would be foolish since the DB will just request the space back again on the next insert operation.

High performance malloc libraries generally operate the same way - once they have acquired memory from the OS they seldom/never return it. There's no good reason to incur the cost of allocation more than once.

Also, LMDB serializes its DB writes, which also is an architecture
decision specific to it and which has severe performance implications -
and that is unspecific to mmap() also, and could be overcome.)

LMDB's write performance is pretty mediocre, by design - we emphasized durability/reliability over performance here. But in practice, it is always faster than e.g. BerkeleyDB, which supports multiple concurrent writers. With multiple writer concurrency, we found that BDB spends much of its time in contended locks and deadlock resolution. In most applications, lock acquisition/release, deadlock detection, and resolution will consume a huge amount of CPU time, completely erasing any potential throughput gains from allowing multiple concurrent writers.

If you want to do writes thru mmap() then you need to be extremely careful, so yes, how LMDB does writes is actually highly specific to its use of mmap. Transactional integrity requires that certain writes are persisted to disk in a particular order, otherwise you get corrupted data structures. You can use mlock() to prevent pages from being flushed before you intend, but then you're invoking a number of system calls per write, and so you haven't gained anything in the performance department. Or you can do what LMDB does, and write arbitrarily to the map until the end of the transaction (using no syscalls), and then do carefully sequenced final updates and msyncs.

Note that LMDB works perfectly well on OpenBSD even without a unified buffer cache; it just requires you to perform writes thru the mmap as well as reads to sidestep the cache coherency issue. (Of course, using a writable mmap means you lose LMDB's default immunity to stray writes thru wild pointers.)


--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
