Hey,

This is mostly addressed to Kenneth, since as far as I know he's the author of the data structures involved.
I have access to a cluster at the University of Illinois that uses GPFS as its file system. I've observed that when running moses, especially with lots of threads, the threads spend virtually all of their time at near 0% CPU usage, in D (uninterruptible sleep, awaiting IO) status.

When I copy my model files and config file to scratch space on local disk (and run cat $file > /dev/null on each model file), this issue disappears. It appears that cat $file > /dev/null on GPFS does not load the file into RAM the way it appears to on other file systems.

I spent quite a bit of time today with three cluster admins / disk engineers trying to debug this problem. Their ultimate solution was for me to cp each $file from GPFS to /dev/shm, which as far as I can tell acts like a RAM disk. Doing so resolves the issue.

Their best estimate of the problem is that moses (from their perspective) appeared, for each thread, to ask the file system for access to data in the model files, causing a new disk read (with a corresponding disk lock) every time. They believe this issue is not present with local disk because cat $file > /dev/null pre-loads each file into RAM in that case, but does not do so with GPFS. Thus the threads are (according to this theory) getting bogged down by disk locks.

I was puzzled by this, because I thought that the probing data structure underlying the LM and the phrase table used memory mapping. I had (perhaps naively) assumed that when the memory mapping is initiated, the OS actively loads all of the file contents into the appropriate VM pages. Now the question is: is the memory mapping actually acting lazily, only loading data from disk on an as-needed basis? If so, that could explain the horrific disk delays that I'm encountering.
And if so, is it possible to alter the behavior of the memory mapping so that, when the memory map is initiated, it actively loads the entire file into memory?

Thanks,
Lane
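
P.S. To make the question concrete, here is a minimal sketch of the eager behavior I have in mind. It is written in Python purely for illustration and is not moses code; the helper name map_and_prefault is made up. It maps a file read-only, optionally gives the kernel a MADV_WILLNEED hint (advisory only, and only where the platform exposes it), and then reads one byte from every page so that each page is faulted in up front instead of on first use. Whether this would avoid the per-access disk reads on GPFS is an assumption I have not verified.

```python
import mmap
import os
import tempfile


def map_and_prefault(path):
    """Memory-map `path` read-only and touch every page of the mapping.

    Returns (mapping, number_of_pages_touched).
    Hypothetical helper for illustration; not part of moses.
    """
    with open(path, "rb") as f:
        # Length 0 maps the whole file. The mapping keeps its own
        # duplicate of the descriptor, so the file can be closed afterwards.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    # Advisory read-ahead hint; the kernel is free to ignore it,
    # and not every platform or Python version provides it.
    if hasattr(mmap, "MADV_WILLNEED"):
        try:
            mm.madvise(mmap.MADV_WILLNEED)
        except OSError:
            pass  # hint not supported for this mapping; fall through

    # Reading one byte per page forces each page to be loaded now,
    # rather than lazily when a worker thread first needs it.
    touched = 0
    for offset in range(0, len(mm), mmap.PAGESIZE):
        mm[offset]
        touched += 1
    return mm, touched


if __name__ == "__main__":
    # Small stand-in for a model file: two full pages plus a few bytes.
    fd, demo_path = tempfile.mkstemp()
    try:
        os.write(fd, b"\x00" * (2 * mmap.PAGESIZE + 5))
        os.close(fd)
        mapping, pages = map_and_prefault(demo_path)
        print("mapped", len(mapping), "bytes; touched", pages, "pages")
        mapping.close()
    finally:
        os.remove(demo_path)
```

The page-touching loop is the portable part; the madvise call is just a hint layered on top. Pages loaded this way can still be evicted later under memory pressure, so this is not equivalent to copying the file to /dev/shm.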
_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support
