Added a load= option in 7a1baee, deprecating lazy=. The valid load= values
are the lowercase versions of the enum values shown below.
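
For example, something like this in a moses.ini feature line should work
(everything other than load= below is illustrative):

  KENLM name=LM0 factor=0 order=5 path=lm.binary load=parallel_read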

There are copies of the loading code in Backward.cpp and Reordering.h
that blame back to Lane. I've put ?: hacks in there and hope he'll pay
the cost of the code copying.
Kenneth
On 02/19/2016 11:38 PM, Kenneth Heafield wrote:
> Hi,
>
> The default is mmap with MAP_POPULATE (see man mmap). As to whether
> GPFS implements MAP_POPULATE correctly, I defer to the former IBM
> employee.
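>
> Concretely, the difference is one mmap flag. A minimal sketch in plain
> POSIX/Linux calls (not KenLM's actual wrapper):
>
> #include <sys/mman.h>
> #include <cstddef>
>
> // Lazy mapping: pages are faulted in from disk on first access.
> void *MapLazy(int fd, std::size_t size) {
>   return mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
> }
>
> // Eager mapping: ask the kernel to fault all pages in up front.
> // MAP_POPULATE is Linux-only, hence the guard and the lazy fallback.
> void *MapPopulateOrLazy(int fd, std::size_t size) {
> #ifdef MAP_POPULATE
>   return mmap(NULL, size, PROT_READ, MAP_SHARED | MAP_POPULATE, fd, 0);
> #else
>   return mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
> #endif
> }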
>
> KenLM implements the following options via config.load_method:
>
> typedef enum {
>   // mmap with no prepopulate.
>   LAZY,
>   // On Linux, pass MAP_POPULATE to mmap.
>   POPULATE_OR_LAZY,
>   // Populate on Linux; malloc and read on non-Linux.
>   POPULATE_OR_READ,
>   // malloc and read.
>   READ,
>   // malloc and read in parallel (recommended for Lustre).
>   PARALLEL_READ,
> } LoadMethod;
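>
> If you are calling KenLM directly, picking one of these is just a matter
> of setting the config before constructing the model. Roughly, assuming
> the enum above is util::LoadMethod from util/mmap.hh (the probing model
> type and path are illustrative):
>
> #include "lm/model.hh"  // lm::ngram::Config, lm::ngram::ProbingModel
>
> lm::ngram::ProbingModel *LoadForLustre(const char *path) {
>   lm::ngram::Config config;
>   config.load_method = util::PARALLEL_READ;  // e.g. for Lustre
>   return new lm::ngram::ProbingModel(path, config);
> }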
>
> However, Moses currently exposes this as a true/false "lazyken" flag:
> false maps to POPULATE_OR_LAZY and true maps to LAZY. This should be
> refactored in moses/LM/Ken.cpp lines 503 and 538 to expose all the
> options in the enum.
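>
> Something along these lines would do it. This is only a sketch, not the
> actual patch, and it assumes the enum is util::LoadMethod from
> util/mmap.hh:
>
> #include <stdexcept>
> #include <string>
>
> #include "util/mmap.hh"  // util::LoadMethod
>
> // Map a lowercase load= string onto the full enum.
> util::LoadMethod ParseLoadMethod(const std::string &value) {
>   if (value == "lazy") return util::LAZY;
>   if (value == "populate_or_lazy") return util::POPULATE_OR_LAZY;
>   if (value == "populate_or_read") return util::POPULATE_OR_READ;
>   if (value == "read") return util::READ;
>   if (value == "parallel_read") return util::PARALLEL_READ;
>   throw std::runtime_error("Unknown load method: " + value);
> }
>
> // The old boolean flag only ever reached two of the five values.
> util::LoadMethod FromLazyKen(bool lazyken) {
>   return lazyken ? util::LAZY : util::POPULATE_OR_LAZY;
> }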
>
> It's worth noting that the kernel preferentially evicts mmapped data
> under swap pressure, which is probably not the behavior you want for a
> network filesystem.
>
> Another thing to note is that huge page functionality with mmapped files
> is a mess on Linux (you really have to be root and set up hugetlbfs).
> However, the malloc-and-read approaches are compatible with transparent
> huge pages (and my code even aligns to a 1 GB boundary now), so
> malloc+read results in faster queries.
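>
> To make that concrete, the read path amounts to roughly the following
> (a sketch using plain syscalls, not the actual KenLM code):
>
> #include <fcntl.h>
> #include <sys/mman.h>
> #include <unistd.h>
>
> #include <cstdlib>
> #include <stdexcept>
>
> // Allocate memory aligned to a 1 GB boundary so transparent huge pages
> // can back it (no root or hugetlbfs needed), then read() the file in.
> void *ReadAligned(const char *path, std::size_t size) {
>   void *mem = NULL;
>   if (posix_memalign(&mem, 1ULL << 30, size))
>     throw std::runtime_error("posix_memalign failed");
> #ifdef MADV_HUGEPAGE
>   madvise(mem, size, MADV_HUGEPAGE);  // a hint; safe to ignore failure
> #endif
>   int fd = open(path, O_RDONLY);
>   if (fd < 0) throw std::runtime_error("open failed");
>   char *to = static_cast<char*>(mem);
>   std::size_t remaining = size;
>   while (remaining) {
>     ssize_t got = read(fd, to, remaining);
>     if (got <= 0) { close(fd); throw std::runtime_error("read failed"); }
>     to += got;
>     remaining -= static_cast<std::size_t>(got);
>   }
>   close(fd);
>   return mem;
> }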
>
> Kenneth
>
> On 02/19/2016 11:29 PM, Lane Schwartz wrote:
>> Hey,
>>
>> This is mostly addressed to Kenneth, since as far as I know he's the
>> author of the data structures involved.
>>
>> I have access to a cluster at the University of Illinois. The cluster
>> here uses GPFS as its file system.
>>
>> I've observed that when running moses, especially with lots of threads,
>> the threads spend virtually all of their time at near 0% CPU usage, in
>> D (uninterruptible sleep, awaiting IO) state. When I copy my model files
>> and config file to scratch space on local disk (and cat $file >
>> /dev/null each model file), the issue disappears. It appears that doing
>> cat $file > /dev/null on GPFS does not load the file into RAM the way it
>> appears to on other file systems.
>>
>> I spent quite a bit of time today with three cluster admins / disk
>> engineers trying to debug this problem.
>>
>> Their ultimate solution was for me to cp each $file from GPFS to
>> /dev/shm, which as far as I can tell acts like a RAM disk. Doing so
>> resolves the issue.
>>
>> Their best estimate of the problem is that each moses thread appears
>> (from their perspective) to ask the file system for data that's present
>> in the model files, causing a new disk read (with a corresponding disk
>> lock) every time. They believe this issue does not show up on local
>> disk because cat $file > /dev/null pre-loads each file into RAM there,
>> but does not do so on GPFS. Thus the threads are (according to this
>> theory) getting bogged down by disk locks.
>>
>> I was puzzled by this, because I thought the probing data structure
>> underlying the LM and the phrase table used memory mapping. I had
>> (perhaps naively) assumed that when the memory mapping is initiated, the
>> OS actively loads all of the file contents into the appropriate VM
>> pages. Now the question is, is the memory mapping actually acting
>> lazily, only loading data from disk on an as-needed basis? If so, that
>> could potentially explain the horrific disk delays I'm encountering. And
>> if so, is it possible to alter the behavior of the memory mapping such
>> that when the memory map is initiated, it actually does actively load
>> the entire file into memory?
>>
>> Thanks,
>> Lane
>>
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support