Well, this is on a shared login node and may not be very representative of
other nodes in the cluster.
I can see if I can get a more representative figure.
But our cluster is quite busy right now …


All the best,
Jörg


Jörg Tiedemann
[email protected]






> On 12 Apr 2016, at 14:54, Kenneth Heafield <[email protected]> wrote:
> 
> Hi,
> 
>       Why is your system using 7 GB of swap out of 9 GB?  Moses is only
> taking 147 GB out of 252 GB physical.  I smell other processes taking up
> RAM, possibly those 5 stopped and 1 zombie.
> 
> Kenneth
> 
> On 04/12/2016 12:45 PM, Jorg Tiedemann wrote:
>> 
>>> 
>>> Did you remove all "lazyken" arguments from moses.ini?
>> 
>> Yes, I did.
>> 
>>> 
>>> Is the network filesystem Lustre?  If so, mmap will perform terribly and
>>> you should use load=read or (better) load=parallel_read since reading
>>> from Lustre is CPU-bound.
>>> 
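>>> (A sketch, with a hypothetical path, of a moses.ini feature line
>>> carrying this option:
>>>
>>>   KENLM load=parallel_read name=LM0 factor=0 order=3 path=/path/to/lm.trie.kenlm
>>>
>>> i.e. load= goes on the same line as the other KENLM arguments.)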
>> 
>> Yes, I think so. The parallel_read option sounds interesting. Could it
>> hurt in some setups, or could I use it as my default?
>> 
>> 
>>> Does the cluster management software/job scheduler/sysadmin impose a
>>> resident memory limit?
>>> 
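>>> (A generic check, not specific to any scheduler: running ulimit -a
>>> inside a job shows the limits in effect, e.g.
>>>
>>>   ulimit -a | grep -i -e memory -e virtual
>>>
>>> where "max memory size" and "virtual memory" are the relevant entries.)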
>> 
>> I don’t really know. I don’t think so, but I need to find out.
>> 
>> 
>>> Can you copy-paste `top' when it's running slow and the stderr at that
>>> time?
>> 
>> 
>> Here is the top of my `top' output when running on my test node:
>> 
>> top - 14:39:03 up 50 days,  5:47,  0 users,  load average: 1.97, 2.09, 3.85
>> Tasks: 814 total,   3 running, 805 sleeping,   5 stopped,   1 zombie
>> Cpu(s):  6.9%us,  6.2%sy,  0.0%ni, 86.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
>> Mem:  264493500k total, 263614188k used,   879312k free,    68680k buffers
>> Swap:  9775548k total,  7198920k used,  2576628k free, 69531796k cached
>> 
>>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>> 42528 tiedeman  20   0  147g 147g  800 R 100.0 58.4  31:25.01 moses
>> 
>> stderr doesn’t say anything new besides the message from the start of
>> feature function loading:
>> 
>> FeatureFunction: LM0 start: 16 end: 16
>> line=KENLM load=parallel_read name=LM1 factor=0
>> path=/homeappl/home/tiedeman/research/SMT/wmt16/fi-en/data/monolingual/cc.tok.3.en.trie.kenlm
>> order=3
>> 
>> 
>> I am trying /tmp/ now as well (it takes time to shuffle the big files
>> around, though).
>> 
>> Jörg
>> 
>> 
>>> 
>>> On 04/12/2016 08:26 AM, Jorg Tiedemann wrote:
>>>> 
>>>> No, it’s definitely not waiting for input … the same setup works for
>>>> smaller models.
>>>> 
>>>> I have the models on a work partition on our cluster.
>>>> That is probably not good enough, so I will try to move the data to
>>>> local tmp on the individual nodes before executing. Hopefully that
>>>> helps. How would you do this if you want to distribute tuning?
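>>>> (A hypothetical staging sketch for a single node; the paths are made up:
>>>>
>>>>   # copy the model to node-local disk and point moses.ini at the copy
>>>>   cp /wrk/models/lm.trie.kenlm /tmp/
>>>>   sed 's|path=/wrk/models|path=/tmp|' moses.ini > moses.local.ini
>>>>   moses -f moses.local.ini < input.txt > output.txt
>>>>
>>>> The open question is how to coordinate this across nodes during
>>>> distributed tuning.)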
>>>> 
>>>> Thanks!
>>>> Jörg
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On 12 Apr 2016, at 09:34, Ondrej Bojar <[email protected]> wrote:
>>>>> 
>>>>> Random suggestion: isn't it waiting for stdin for some strange
>>>>> reason? ;-)
>>>>> 
>>>>> O.
>>>>> 
>>>>> 
>>>>> On April 12, 2016 8:20:46 AM CEST, Hieu Hoang <[email protected]> wrote:
>>>>>> I assume that it's on local disk rather than a network drive.
>>>>>> 
>>>>>> Are you sure it's still in the loading stage, and that it's loading
>>>>>> kenlm rather than the phrase table (pt) or the lexicalized reordering
>>>>>> model, etc.?
>>>>>> 
>>>>>> If there's a way to make the model files available for download, or to
>>>>>> give me access to your machine, I might be able to debug it.
>>>>>> 
>>>>>> Hieu Hoang
>>>>>> http://www.hoang.co.uk/hieu
>>>>>> On 12 Apr 2016 08:41, "Jorg Tiedemann" <[email protected]> wrote:
>>>>>> 
>>>>>>> 
>>>>>>> Unfortunately, load=read didn’t help. It’s been loading for 7 hours
>>>>>>> now with no sign of starting to decode.
>>>>>>> The disk is not terribly slow; cat worked without problems. I don’t
>>>>>>> know what to do, but I think I have to give up for now.
>>>>>>> Am I the only one who is experiencing such slow loading times?
>>>>>>> 
>>>>>>> Thanks again for your help!
>>>>>>> 
>>>>>>> Jörg
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On 10 Apr 2016, at 22:27, Kenneth Heafield <[email protected]> wrote:
>>>>>>> 
>>>>>>> With load=read:
>>>>>>> 
>>>>>>> The model acts like normal RAM as part of the Moses process.
>>>>>>> 
>>>>>>> Supports huge pages via transparent huge pages, so it's slightly faster.
>>>>>>> 
>>>>>>> Before loading, cat file >/dev/null will just put things into cache
>>>>>>> that were going to be read more or less like cat anyway.
>>>>>>> 
>>>>>>> After loading, cat file >/dev/null will hurt, since there's the
>>>>>>> potential to load the file into RAM twice and swap out bits of Moses.
>>>>>>> 
>>>>>>> Memory is shared between threads, just not with the disk cache (ok
>>>>>>> maybe, but only if they get huge pages support to work well) or other
>>>>>>> processes that independently read the file.
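>>>>>>> 
>>>>>>> (Whether transparent huge pages are enabled can be checked with
>>>>>>> 
>>>>>>>   cat /sys/kernel/mm/transparent_hugepage/enabled
>>>>>>> 
>>>>>>> which prints something like "always [madvise] never", with the active
>>>>>>> mode in brackets.)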
>>>>>>> 
>>>>>>> With load=populate:
>>>>>>> 
>>>>>>> Loaded upfront and mapped into the process; the kernel seems to evict
>>>>>>> it first.
>>>>>>> 
>>>>>>> Before loading cat file >/dev/null might help, but in theory
>>>>>>> MAP_POPULATE should be doing much the same thing.
>>>>>>> 
>>>>>>> After loading, or during slow loading, cat file >/dev/null can help
>>>>>>> because it forces the data back into RAM.  This is particularly useful
>>>>>>> if the Moses process came under memory pressure after loading, which
>>>>>>> can include heavy disk activity even if RAM isn't full.
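>>>>>>> 
>>>>>>> (Concretely, with a hypothetical path, that is just
>>>>>>> 
>>>>>>>   cat /path/to/lm.trie.kenlm > /dev/null
>>>>>>> 
>>>>>>> run before starting Moses or while a load is crawling.)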
>>>>>>> 
>>>>>>> Memory is shared with all other processes that mmap.
>>>>>>> 
>>>>>>> With load=lazy:
>>>>>>> 
>>>>>>> Mapped into the process with lazy loading (i.e. mmap without
>>>>>>> MAP_POPULATE).  Not recommended for decoding, but useful if you've got
>>>>>>> a 6 TB file and want to send it a few thousand queries.
>>>>>>> 
>>>>>>> cat will definitely help here at any time.
>>>>>>> 
>>>>>>> Memory is shared with all other processes that mmap.
>>>>>>> 
>>>>>>> On 04/10/2016 06:50 PM, Jorg Tiedemann wrote:
>>>>>>> 
>>>>>>> Thanks for the quick reply.
>>>>>>> I will try the load option.
>>>>>>> 
>>>>>>> Quick question: you said that the memory will not be shared across
>>>>>>> processes with that option. Does that mean that it will load the LM
>>>>>>> for each thread? That would mean a lot of memory in my setup.
>>>>>>> 
>>>>>>> By the way, I also did the cat >/dev/null thing, but I didn’t have
>>>>>>> the impression that it changed much. Does it really help, and how much
>>>>>>> would you usually gain? Thanks again!
>>>>>>> 
>>>>>>> 
>>>>>>> Jörg
>>>>>>> 
>>>>>>> 
>>>>>>> On 10 Apr 2016, at 12:55, Kenneth Heafield <[email protected]> wrote:
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I'm assuming you have enough RAM to fit everything.  The kernel seems
>>>>>>> to preferentially evict mmapped pages as memory usage approaches full
>>>>>>> (it doesn't have to be full).  To work around this, use
>>>>>>> 
>>>>>>> load=read
>>>>>>> 
>>>>>>> in your moses.ini line for the models.  REMOVE any "lazyken" argument
>>>>>>> which is deprecated and might override the load= argument.
>>>>>>> 
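>>>>>>> A sketch of the edit (path hypothetical; lines starting with # are
>>>>>>> comments):
>>>>>>> 
>>>>>>>   # before (deprecated):
>>>>>>>   KENLM lazyken=1 name=LM0 factor=0 order=3 path=/path/to/lm.trie.kenlm
>>>>>>>   # after:
>>>>>>>   KENLM load=read name=LM0 factor=0 order=3 path=/path/to/lm.trie.kenlm
>>>>>>> 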
>>>>>>> The effect of load=read is to malloc (OK, an anonymous mmap, which is
>>>>>>> how malloc is implemented anyway) at a 1 GB aligned address (to
>>>>>>> optimize for huge pages) and read() the file into that memory.  It
>>>>>>> will no longer be shared across processes, but the memory will have
>>>>>>> the same swappiness as the rest of the Moses process.
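>>>>>>> 
>>>>>>> (A standalone C sketch of that idea, as an illustration rather than
>>>>>>> KenLM's actual code:
>>>>>>> 
>>>>>>>   #include <fcntl.h>
>>>>>>>   #include <stdio.h>
>>>>>>>   #include <sys/mman.h>
>>>>>>>   #include <sys/stat.h>
>>>>>>>   #include <unistd.h>
>>>>>>> 
>>>>>>>   int main(int argc, char **argv) {
>>>>>>>     if (argc != 2) { fprintf(stderr, "usage: %s model\n", argv[0]); return 1; }
>>>>>>>     int fd = open(argv[1], O_RDONLY);
>>>>>>>     struct stat st;
>>>>>>>     if (fd < 0 || fstat(fd, &st)) { perror(argv[1]); return 1; }
>>>>>>>     /* Anonymous mapping at a 1 GB aligned hint; the kernel may place
>>>>>>>        it elsewhere. */
>>>>>>>     char *base = mmap((void *)(1ULL << 30), st.st_size,
>>>>>>>                       PROT_READ | PROT_WRITE,
>>>>>>>                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>>>>>>>     if (base == MAP_FAILED) { perror("mmap"); return 1; }
>>>>>>>     /* read() the whole file in: the pages are now ordinary process
>>>>>>>        memory, with the same swappiness as the rest of the process. */
>>>>>>>     for (off_t done = 0; done < st.st_size; ) {
>>>>>>>       ssize_t got = read(fd, base + done, st.st_size - done);
>>>>>>>       if (got <= 0) { perror("read"); return 1; }
>>>>>>>       done += got;
>>>>>>>     }
>>>>>>>     close(fd);
>>>>>>>     return 0;
>>>>>>>   }
>>>>>>> 
>>>>>>> Compile with cc and run it on a model file to mimic the load=read path.)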
>>>>>>> 
>>>>>>> Lazy loading will only make things worse here.
>>>>>>> 
>>>>>>> Kenneth
>>>>>>> 
>>>>>>> On 04/10/2016 07:29 AM, Jorg Tiedemann wrote:
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I have a large language model built from the Common Crawl data set,
>>>>>>> and it takes forever to load when running Moses.
>>>>>>> My model is a trigram KenLM binarized with quantization, trie
>>>>>>> structures and pointer compression (-a 22 -q 8 -b 8).
>>>>>>> The model is about 140 GB and it takes hours to load (I’m still waiting).
>>>>>>> I run on a machine with 256 GB RAM ...
>>>>>>> 
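>>>>>>> (For reference, a trie model with those settings would be built with
>>>>>>> something like the following KenLM command; file names hypothetical:
>>>>>>> 
>>>>>>>   build_binary -a 22 -q 8 -b 8 trie cc.tok.3.en.arpa cc.tok.3.en.trie.kenlm
>>>>>>> 
>>>>>>> where -q and -b set the quantization bits and -a bounds the pointer
>>>>>>> compression.)
>>>>>>> 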
>>>>>>> I also tried lazy loading, without success. Is this normal, or am I
>>>>>>> doing something wrong?
>>>>>>> Thanks for your help!
>>>>>>> 
>>>>>>> Jörg
>>>>> 
>>>>> -- 
>>>>> Ondrej Bojar ([email protected] / [email protected])
>>>>> http://www.cuni.cz/~obo
>>>> 
>>>> 
>>>> 
>> 
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
