I can confirm that NFS doesn't suffer from the disk wait problem,
at least not the implementation used on the Edinburgh servers.
Memory-mapped files on NFS and on local disk give the same speed.
On 19/02/16 23:29, Lane Schwartz wrote:
Hey,
This is mostly addressed to Kenneth, since as far as I know he's the
author of the data structures involved.
I have access to a cluster at the University of Illinois. The cluster
here uses GPFS as its file system.
I've observed that when running moses, especially with lots of
threads, the threads spend virtually all of their time at near 0%
CPU usage, in D (uninterruptible sleep, awaiting IO) status. When I
copy my model files and config file to scratch space on local disk
(and cat $file > /dev/null each model file), this issue disappears. It
appears that doing cat $file > /dev/null on GPFS does not load the
file into RAM in the same way that doing so appears to do on other
file systems.
I spent quite a bit of time today with three cluster admins / disk
engineers trying to debug this problem.
Their ultimate solution was for me to cp each $file from GPFS to
/dev/shm, which as far as I can tell acts like a RAM disk. Doing so
resolves the issue.
Their best estimate of the problem is that, from their perspective,
each moses thread appeared to ask the file system for access to data
that's present in the model files, causing a new disk read (with a
corresponding disk lock) every time. They believe that
this issue is not present with local disk because the cat $file >
/dev/null is pre-loading each file into RAM in that case, but is not
doing so with GPFS. Thus the threads are (according to this theory)
getting bogged down by disk locks.
I was puzzled by this, because I thought that the probing data
structure underlying the LM and the phrase table used memory mapping.
I had (perhaps naively) assumed that when the memory mapping is
initiated, the OS actively loaded all of the file contents into
appropriate VM pages. Now the question is, is the memory mapping
actually acting lazily, only loading data from disk on an as-needed
basis? If so, that could potentially explain the horrific disk delays
that I'm encountering. And if so, then one question is, is it possible
to alter the behavior of the memory mapping such that when the memory
map is initiated, it actually does actively load the entire file into
memory?
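
For reference, mmap on Linux is lazy by default: a page is read from
disk only when it is first touched. Below is a minimal Python sketch
of forcing an eager load instead. It is not the Moses or KenLM code;
the function name map_and_preload is hypothetical, and it assumes a
Unix system. MAP_POPULATE is Linux-only and is exposed by Python 3.10
and later, so the sketch also touches one byte per page as a fallback.
Whether either approach keeps the pages resident on GPFS is untested
here.

```python
import mmap
import os


def map_and_preload(path):
    """Memory-map `path` read-only and fault in every page up front.

    Hypothetical helper for illustration only; not part of Moses or KenLM.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        # MAP_POPULATE asks the kernel to pre-fault the mapping at mmap time.
        # It is missing on non-Linux systems and older Pythons, so default to 0.
        populate = getattr(mmap, "MAP_POPULATE", 0)
        mapped = mmap.mmap(
            fd,
            size,
            flags=mmap.MAP_SHARED | populate,
            prot=mmap.PROT_READ,
        )
    finally:
        # The mmap object keeps its own duplicate of the descriptor.
        os.close(fd)

    # Fallback: read one byte from each page so that any page not already
    # resident is faulted in now rather than during decoding.
    checksum = 0
    for offset in range(0, size, mmap.PAGESIZE):
        checksum ^= mapped[offset]
    return mapped
```

Calling map_and_preload("model.bin") (a placeholder file name) returns
the mapping once every page has been touched. Nothing pins the pages,
so the kernel can still evict them later under memory pressure.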
Thanks,
Lane
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
--
Hieu Hoang
http://www.hoang.co.uk/hieu