Hey,

This is mostly addressed to Kenneth, since as far as I know he's the author
of the data structures involved.

I have access to a cluster at the University of Illinois. The cluster here
uses GPFS as its file system.

I've observed that when running moses, especially with lots of threads,
the threads spend virtually all of their time at near 0% CPU usage, in
D (uninterruptible sleep, awaiting IO) status. When I copy my model files
and config file to scratch space on local disk (and cat $file > /dev/null
each model file), this issue disappears. It appears that running
cat $file > /dev/null on GPFS does not load the file into RAM the way it
appears to on other file systems.
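For reference, the pre-loading trick amounts to reading every byte of the
file once so the kernel can keep those pages in its cache; whether a given
file system (GPFS here) actually serves later accesses from that cache is up
to the file system. Below is a minimal Python sketch of the same idea,
assuming a POSIX system. The name prewarm is my own (hypothetical), and the
posix_fadvise(WILLNEED) call is only a read-ahead hint that is skipped where
the OS does not provide it:

```python
import os

CHUNK = 1 << 20  # read 1 MiB at a time

def prewarm(path):
    """Roughly what `cat $file > /dev/null` does: read the whole file once
    so its pages can end up in the OS cache. Returns the number of bytes read."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Optional hint: ask the kernel to start read-ahead for the whole
        # file (offset 0, length 0 means "to end of file"). Not on every OS.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
        total = 0
        while True:
            chunk = os.read(fd, CHUNK)
            if not chunk:  # empty read means end of file
                break
            total += len(chunk)
        return total
    finally:
        os.close(fd)
```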

I spent quite a bit of time today with three cluster admins / disk
engineers trying to debug this problem.

Their ultimate solution was for me to cp each $file from GPFS to /dev/shm,
which as far as I can tell acts like a RAM disk. Doing so resolves the
issue.

Their best guess is that, from their perspective, each moses thread
appeared to ask the file system for data in the model files, causing a new
disk read (with a corresponding disk lock) every time. They believe this
issue does not occur on local disk because there cat $file > /dev/null
pre-loads each file into RAM, whereas on GPFS it does not. Thus, according
to this theory, the threads are getting bogged down by disk locks.

I was puzzled by this, because I thought that the probing data structure
underlying the LM and the phrase table used memory mapping. I had (perhaps
naively) assumed that when the memory mapping is initiated, the OS actively
loads all of the file contents into the appropriate VM pages. So the
question is: is the memory mapping actually lazy, loading data from disk
only on an as-needed basis? If so, that could explain the horrific disk
delays I'm encountering. And if so, is it possible to change the behavior
of the memory mapping so that, when the map is created, it actively loads
the entire file into memory?
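To make the lazy-vs-eager distinction concrete: an ordinary mmap is lazy on
Linux, so each page is read from the file on its first access (a page
fault). Below is a minimal Python sketch of mapping a file eagerly, assuming
Linux. MAP_POPULATE (Linux-only, exposed by Python 3.10+) asks the kernel to
fault the whole mapping in up front, madvise(MADV_WILLNEED) is only a hint,
and touching one byte per page is the portable fallback. The names
map_readonly and touch_every_page are hypothetical, not anything in Moses
or KenLM:

```python
import mmap
import os

def touch_every_page(mm):
    """Read one byte from each page so every page is faulted in.
    Returns the number of pages touched."""
    pages = 0
    for offset in range(0, len(mm), mmap.PAGESIZE):
        _ = mm[offset]  # first access to a page triggers the actual disk read
        pages += 1
    return pages

def map_readonly(path, eager=False):
    """Map `path` read-only. eager=False gives a normal lazy mapping;
    eager=True tries to load the whole file into memory at map time."""
    populate = getattr(mmap, "MAP_POPULATE", 0)  # 0 where unavailable
    flags = mmap.MAP_SHARED | (populate if eager else 0)
    fd = os.open(path, os.O_RDONLY)
    try:
        mm = mmap.mmap(fd, 0, flags=flags, prot=mmap.PROT_READ)
    finally:
        os.close(fd)  # the mapping stays valid after the fd is closed
    if eager:
        if hasattr(mm, "madvise") and hasattr(mmap, "MADV_WILLNEED"):
            mm.madvise(mmap.MADV_WILLNEED)  # hint: start read-ahead now
        if not populate:
            touch_every_page(mm)  # no MAP_POPULATE: force the faults ourselves
    return mm
```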

Thanks,
Lane
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support