No, it’s definitely not waiting for input … the same setup works for smaller 
models.

I have the models on a work partition of our cluster.
That is probably not fast enough, so I will try to move the data to local
tmp on the individual nodes before executing.
Hopefully that helps. How would you do this if you want to distribute tuning?
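
Concretely, for a single decoding job I am thinking of something along
these lines on each node (the paths are just placeholders for our setup):

  # copy the binarized LM from the work partition to node-local scratch
  LM=/wrk/models/lm.binlm                      # placeholder path
  LOCAL_LM="${TMPDIR:-/tmp}/$(basename "$LM")"
  cp "$LM" "$LOCAL_LM"

  # point a per-node copy of moses.ini at the local file and warm the cache
  sed "s|$LM|$LOCAL_LM|" moses.ini > moses.local.ini
  cat "$LOCAL_LM" > /dev/null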

Thanks!
Jörg





> On 12 Apr 2016, at 09:34, Ondrej Bojar <[email protected]> wrote:
> 
> Random suggestion: isn't it waiting for stdin for some strange reason? ;-)
> 
> O.
> 
> 
> On April 12, 2016 8:20:46 AM CEST, Hieu Hoang <[email protected]> wrote:
>> I assume that it's on local disk rather than a network drive.
>> 
>> Are you sure it's still in the loading stage, and that it's loading
>> KenLM rather than the phrase table or lexicalized reordering model etc.?
>> 
>> If there's a way to make the model files available for download, or to
>> give me access to your machine, I might be able to debug it.
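>> 
>> One quick way to check on Linux, assuming you can get hold of the Moses
>> PID, is to look at which model files the process has open and whether it
>> is still reading (the grep pattern below is only illustrative):
>> 
>>   lsof -p <moses_pid> | grep -iE 'binlm|phrase|reord'
>>   cat /proc/<moses_pid>/io   # read_bytes should keep growing while loading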
>> 
>> Hieu Hoang
>> http://www.hoang.co.uk/hieu
>> On 12 Apr 2016 08:41, "Jorg Tiedemann" <[email protected]> wrote:
>> 
>>> 
>>> Unfortunately, load=read didn't help. It's been loading for 7 hours now
>>> and there is still no sign of it starting to decode.
>>> The disk is not terribly slow; cat worked without problems. I don't know
>>> what else to do, so I think I have to give up for now.
>>> Am I the only one who is experiencing such slow loading times?
>>> 
>>> Thanks again for your help!
>>> 
>>> Jörg
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On 10 Apr 2016, at 22:27, Kenneth Heafield <[email protected]> wrote:
>>> 
>>> With load=read:
>>> 
>>> The model acts like normal RAM as part of the Moses process.
>>> 
>>> It supports huge pages via transparent huge pages, so it's slightly faster.
>>> 
>>> Before loading, cat file >/dev/null will just put things into the cache
>>> that were going to be read (more or less like cat does) anyway.
>>> 
>>> After loading, cat file >/dev/null will hurt, since there's the potential
>>> to load the file into RAM twice and swap out bits of Moses.
>>> 
>>> Memory is shared between threads, just not with the disk cache (OK,
>>> maybe, but only if they get huge pages support to work well) or with
>>> other processes that independently read the file.
>>> 
>>> With load=populate:
>>> 
>>> The model is loaded up front and mapped into the process; the kernel
>>> seems to evict it first.
>>> 
>>> Before loading, cat file >/dev/null might help, but in theory
>>> MAP_POPULATE should be doing much the same thing.
>>> 
>>> After loading, or during slow loading, cat file >/dev/null can help
>>> because it forces the data back into RAM.  This is particularly useful
>>> if the Moses process came under memory pressure after loading, which can
>>> include heavy disk activity even if RAM isn't full.
>>> 
>>> Memory is shared with all other processes that mmap the file.
>>> 
>>> With load=lazy:
>>> 
>>> The model is mapped into the process with lazy loading (i.e. mmap without
>>> MAP_POPULATE).  Not recommended for decoding, but useful if you've got a
>>> 6 TB file and want to send it a few thousand queries.
>>> 
>>> cat will definitely help here, at any time.
>>> 
>>> Memory is shared with all other processes that mmap the file.
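>>> 
>>> To make this concrete, the setting goes on the KenLM feature line in
>>> moses.ini, roughly like this (the feature name and path are placeholders):
>>> 
>>>   KENLM name=LM0 factor=0 order=3 path=/path/to/lm.binlm load=read
>>> 
>>> with load=populate or load=lazy substituted for the other two behaviours
>>> described above.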
>>> 
>>> On 04/10/2016 06:50 PM, Jorg Tiedemann wrote:
>>> 
>>> Thanks for the quick reply.
>>> I will try the load option.
>>> 
>>> Quick question: you said that the memory will not be shared across
>>> processes with that option. Does that mean that it will load the LM for
>>> each thread? That would be a lot of memory in my setup.
>>> 
>>> By the way, I also did the cat >/dev/null thing, but I didn't have the
>>> impression that it changed much. Does it really help, and how much
>>> would you usually gain? Thanks again!
>>> 
>>> 
>>> Jörg
>>> 
>>> 
>>> On 10 Apr 2016, at 12:55, Kenneth Heafield <[email protected]> wrote:
>>> 
>>> Hi,
>>> 
>>> I'm assuming you have enough RAM to fit everything.  The kernel seems
>>> to preferentially evict mmapped pages as memory usage approaches full
>>> (it doesn't have to be full).  To work around this, use
>>> 
>>> load=read
>>> 
>>> in your moses.ini line for the models.  REMOVE any "lazyken" argument,
>>> which is deprecated and might override the load= argument.
>>> 
>>> The effect of load=read is to malloc (OK, anonymous mmap, which is how
>>> malloc is implemented anyway) at a 1 GB aligned address (to optimize for
>>> huge pages) and read() the file into that memory.  It will no longer be
>>> shared across processes, but the memory will have the same swappiness as
>>> the rest of the Moses process.
>>> 
>>> Lazy loading will only make things worse here.
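>>> 
>>> In moses.ini terms the change is roughly the following (the feature name
>>> and path are placeholders, and the exact spelling of the old lazyken
>>> argument may differ in your config):
>>> 
>>>   # old, deprecated:
>>>   # KENLM name=LM0 factor=0 order=3 path=/path/to/lm.binlm lazyken=1
>>>   # replacement:
>>>   KENLM name=LM0 factor=0 order=3 path=/path/to/lm.binlm load=read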
>>> 
>>> Kenneth
>>> 
>>> On 04/10/2016 07:29 AM, Jorg Tiedemann wrote:
>>> 
>>> Hi,
>>> 
>>> I have a large language model built from the Common Crawl data set, and
>>> it takes forever to load when running Moses.
>>> My model is a trigram KenLM model, binarized with quantization, a trie
>>> structure and pointer compression (-a 22 -q 8 -b 8).
>>> The model is about 140GB and it takes hours to load (I'm still waiting).
>>> I run on a machine with 256GB RAM ...
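>>> 
>>> (For reference, the binarization above was done with something along the
>>> lines of
>>> 
>>>   build_binary -a 22 -q 8 -b 8 trie model.arpa model.binlm
>>> 
>>> where the file names are placeholders.)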
>>> 
>>> I also tried lazy loading, without success. Is this normal, or am I
>>> doing something wrong?
>>> Thanks for your help!
>>> 
>>> Jörg
>>> 
> 
> -- 
> Ondrej Bojar (mailto:[email protected] / [email protected])
> http://www.cuni.cz/~obo
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
