Sylvain Jeaugey wrote:
On Thu, 10 Jun 2010, Paul H. Hargrove wrote:
[snip]
As for why mmap is slower: when the file is on a real filesystem (not tmpfs or
other ramdisk), I am 95% certain that this is an artifact of the Linux
swapper/pager, which thinks it is being smart by
"swapping ahead". Even when there is no memory pressure that
requires swapping, Linux starts queuing swap I/O for pages to keep
the number of "clean" pages up when possible. This results in pages
of the shared memory file being written out to the actual block
device. Both the background I/O and the VM metadata updates
contribute to the lost time. I say 95% certain because I have a
colleague who looked into this phenomenon in another setting, and I am
recounting what he reported as clearly as I can remember, but I might
have misunderstood or inserted my own speculation by accident. A
sufficiently motivated investigator (not me) could probably devise an
experiment to verify this.
Interesting. Do you think this behavior of the Linux kernel would
change if the file were unlink()ed after attach?
Sylvain
As Jeff pointed out, the file IS unlinked by Open MPI, presumably to
ensure it is not left behind in case of abnormal termination.
This was also the case in the scenario my colleague looked
at. We were (unpleasantly) surprised to find that this "swap ahead"
behavior was being applied to an unlinked file: a case that would
appear to be a very simple one to optimize away. However, Linux
appears simply to queue I/O to the "backing store" for a page
regardless of little details like the file being unlinked.
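For readers unfamiliar with the pattern under discussion, here is a minimal sketch of attaching a file-backed shared segment and then unlinking it. It assumes a POSIX system; the helper name `attach_unlinked_segment` and the file-name template are hypothetical, and this is not Open MPI's actual code. It illustrates why unlinking does not take the file out of the picture: the mapping keeps a reference to the inode, so the file's pages (and their backing store) stay in use until `munmap`.

```c
#define _POSIX_C_SOURCE 200809L  /* for mkstemp() and ftruncate() */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper (not Open MPI's API): create a backing file in `dir`,
 * size it, map it MAP_SHARED, then unlink it so that no file is left behind
 * on abnormal termination. Returns the mapping, or NULL on failure. */
static void *attach_unlinked_segment(const char *dir, size_t size)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/shmem_sketch.XXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof path)
        return NULL;

    int fd = mkstemp(path);              /* create a uniquely named file */
    if (fd < 0)
        return NULL;

    if (ftruncate(fd, (off_t)size) != 0) {  /* give the file its full size */
        close(fd);
        unlink(path);
        return NULL;
    }

    void *base = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    /* The name is no longer needed: the mapping keeps the inode alive, so
     * unlink() removes only the directory entry, not the file's pages. */
    unlink(path);
    close(fd);
    return base == MAP_FAILED ? NULL : base;
}
```

Passing a disk-backed directory (for example `/tmp` on a system where it is not tmpfs) versus `/dev/shm` is one way to set up the real-filesystem-versus-ramdisk comparison mentioned above; this sketch makes no claim about what such a run would show.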
-Paul
--
Paul H. Hargrove phhargr...@lbl.gov
Future Technologies Group
HPC Research Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900