Sylvain Jeaugey wrote:
On Thu, 10 Jun 2010, Paul H. Hargrove wrote:
[snip]

As for why mmap is slower: when the file is on a real filesystem (not tmpfs or another ramdisk), I am 95% certain that this is an artifact of the Linux swapper/pager behavior, which thinks it is being smart by "swapping ahead". Even when there is no memory pressure that requires swapping, Linux starts queuing swap I/O for pages to keep the number of "clean" pages up when possible. This results in pages of the shared memory file being written out to the actual block device. Both the background I/O and the VM metadata updates contribute to the lost time. I say 95% certain because I have a colleague who looked into this phenomenon in another setting, and I am recounting what he reported as clearly as I can remember, but I might have misunderstood or inserted my own speculation by accident. A sufficiently motivated investigator (not me) could probably devise an experiment to verify this.
Interesting. Do you think this behavior of the Linux kernel would change if the file was unlink()ed after attach?

Sylvain


As Jeff pointed out, the file IS unlinked by Open MPI, presumably to ensure it is not left behind in case of abnormal termination.

This was also the case for the scenario I reported my colleague looking at. We were (unpleasantly) surprised to find that this "swap ahead" behavior was being applied to an unlinked file: a case that would appear to be a very simple one to optimize away. However, the simple fact is that Linux appears just to queue I/O to the "backing store" for a page regardless of little details like it being unlinked.
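
For anyone motivated to try the experiment, here is a minimal sketch of the create/map/unlink pattern discussed above, written in Python rather than C for brevity. It is not Open MPI's code: the names map_unlinked and time_dirtying are hypothetical, and touching one byte per page is only an assumed stand-in for a real shared-memory workload.

```python
import mmap
import os
import tempfile
import time


def map_unlinked(directory, size):
    """Create a file in `directory`, map it shared, then unlink it.

    Returns the mapping and the (now removed) path. Hypothetical helper.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.ftruncate(fd, size)      # give the backing file its full length
        mm = mmap.mmap(fd, size)    # on Unix: MAP_SHARED, read/write by default
    finally:
        os.close(fd)                # the mapping holds its own descriptor
        os.unlink(path)             # name removed; the mapping stays valid
    return mm, path


def time_dirtying(mm, passes=10):
    """Write one byte per page, `passes` times; return elapsed seconds."""
    start = time.perf_counter()
    for p in range(passes):
        for offset in range(0, len(mm), mmap.PAGESIZE):
            mm[offset] = p % 256    # dirties the page containing `offset`
    return time.perf_counter() - start


if __name__ == "__main__":
    size = 64 * 1024 * 1024
    mm, _ = map_unlinked(tempfile.gettempdir(), size)
    print("seconds:", time_dirtying(mm))
    mm.close()
```

Running it once with a directory on a disk-backed filesystem and once with a directory on tmpfs (often /dev/shm, though that is an assumption about the system) would give a rough comparison. Wall-clock time alone will not prove that writeback is the cause, so it would also be worth watching the block device for I/O while the loop runs.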

-Paul

--
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
