Well, this is on a shared login node, so it may not be very representative of other nodes in the cluster. I can see if I can get a more representative figure, but our cluster is quite busy right now …
All the best,
Jörg

Jörg Tiedemann
[email protected]

> On 12 Apr 2016, at 14:54, Kenneth Heafield <[email protected]> wrote:
>
> Hi,
>
> Why is your system using 7 GB of swap out of 9 GB? Moses is only
> taking 147 GB out of 252 GB physical. I smell other processes taking up
> RAM, possibly those 5 stopped and 1 zombie.
>
> Kenneth
>
> On 04/12/2016 12:45 PM, Jorg Tiedemann wrote:
>>
>>> Did you remove all "lazyken" arguments from moses.ini?
>>
>> Yes, I did.
>>
>>> Is the network filesystem Lustre? If so, mmap will perform terribly and
>>> you should use load=read or (better) load=parallel_read, since reading
>>> from Lustre is CPU-bound.
>>
>> Yes, I think so. The parallel_read option sounds interesting. Could it
>> hurt in some setups, or can I use it as my default?
>>
>>> Does the cluster management software/job scheduler/sysadmin impose a
>>> resident memory limit?
>>
>> I don't really know. I don't think so, but I need to find out.
>>
>>> Can you copy-paste `top' when it's running slow and the stderr at that
>>> time?
>>
>> Here is the top of my `top' output when running on my test node:
>>
>> top - 14:39:03 up 50 days, 5:47, 0 users, load average: 1.97, 2.09, 3.85
>> Tasks: 814 total, 3 running, 805 sleeping, 5 stopped, 1 zombie
>> Cpu(s): 6.9%us, 6.2%sy, 0.0%ni, 86.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
>> Mem:  264493500k total, 263614188k used, 879312k free, 68680k buffers
>> Swap: 9775548k total, 7198920k used, 2576628k free, 69531796k cached
>>
>>   PID USER     PR NI VIRT RES  SHR S %CPU  %MEM TIME+    COMMAND
>> 42528 tiedeman 20 0  147g 147g 800 R 100.0 58.4 31:25.01 moses
>>
>> stderr doesn't say anything new besides the message from the start of
>> feature function loading:
>>
>> FeatureFunction: LM0 start: 16 end: 16
>> line=KENLM load=parallel_read name=LM1 factor=0
>> path=/homeappl/home/tiedeman/research/SMT/wmt16/fi-en/data/monolingual/cc.tok.3.en.trie.kenlm
>> order=3
>>
>> I am trying with /tmp/ now as well (it takes time to shuffle the big
>> files around, though).
>>
>> Jörg
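For reference, the load= setting discussed above goes on the KENLM feature line in moses.ini. A minimal sketch, with a placeholder path; the option values are the ones named in this thread:

    [feature]
    # Eager, single-threaded read into anonymous memory (no disk-cache sharing):
    KENLM name=LM0 factor=0 order=3 load=read path=/path/to/model.trie.kenlm
    # Alternatives: load=parallel_read (threaded read, better on Lustre),
    # load=populate (mmap with MAP_POPULATE), load=lazy (mmap, load on demand).
    # Remove any deprecated "lazyken" argument, which may override load=.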
>>
>>> On 04/12/2016 08:26 AM, Jorg Tiedemann wrote:
>>>>
>>>> No, it's definitely not waiting for input … the same setup works for
>>>> smaller models.
>>>>
>>>> I have the models on a work partition on our cluster.
>>>> This is probably not good enough, and I will try to move the data to
>>>> local tmp on the individual nodes before executing. Hopefully this
>>>> helps. How would you do this if you want to distribute tuning?
>>>>
>>>> Thanks!
>>>> Jörg
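One possible answer to the local-tmp question above, as a shell sketch. All paths are hypothetical, and the copy has to run on every node a decoding or tuning job lands on:

    # Hypothetical staging step, run on each compute node before decoding:
    LOCAL=/tmp/$USER
    mkdir -p "$LOCAL"
    cp /wrk/path/to/model.trie.kenlm "$LOCAL/"
    # ... then point path= in moses.ini at $LOCAL/model.trie.kenlm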
>>>>
>>>>> On 12 Apr 2016, at 09:34, Ondrej Bojar <[email protected]> wrote:
>>>>>
>>>>> Random suggestion: isn't it waiting for stdin for some strange
>>>>> reason? ;-)
>>>>>
>>>>> O.
>>>>>
>>>>> On April 12, 2016 8:20:46 AM CEST, Hieu Hoang <[email protected]> wrote:
>>>>>> I assume that it's on a local disk rather than a network drive.
>>>>>>
>>>>>> Are you sure it's still in the loading stage, and that it's loading
>>>>>> kenlm rather than the pt or lexicalized reordering model etc.?
>>>>>>
>>>>>> If there's a way to make the model files available for download or
>>>>>> to give me access to your machine, I might be able to debug it.
>>>>>>
>>>>>> Hieu Hoang
>>>>>> http://www.hoang.co.uk/hieu
>>>>>>
>>>>>> On 12 Apr 2016 08:41, "Jorg Tiedemann" <[email protected]> wrote:
>>>>>>>
>>>>>>> Unfortunately, load=read didn't help. It has been loading for 7
>>>>>>> hours now with no sign of starting to decode.
>>>>>>> The disk is not terribly slow; cat worked without problems. I don't
>>>>>>> know what to do, but I think I have to give up for now.
>>>>>>> Am I the only one who is experiencing such slow loading times?
>>>>>>>
>>>>>>> Thanks again for your help!
>>>>>>>
>>>>>>> Jörg
>>>>>>>
>>>>>>> On 10 Apr 2016, at 22:27, Kenneth Heafield <[email protected]> wrote:
>>>>>>>
>>>>>>> With load=read:
>>>>>>>
>>>>>>> The data acts like normal RAM, as part of the Moses process.
>>>>>>>
>>>>>>> Supports huge pages via transparent huge pages, so it's slightly
>>>>>>> faster.
>>>>>>>
>>>>>>> Before loading, cat file >/dev/null will just put things into the
>>>>>>> cache that were going to be read more or less like cat anyway.
>>>>>>>
>>>>>>> After loading, cat file >/dev/null will hurt, since there's the
>>>>>>> potential to load the file into RAM twice and swap out bits of
>>>>>>> Moses.
>>>>>>>
>>>>>>> Memory is shared between threads, just not with the disk cache (OK,
>>>>>>> maybe, but only if they get huge pages support to work well) or
>>>>>>> other processes that independently read the file.
>>>>>>>
>>>>>>> With load=populate:
>>>>>>>
>>>>>>> Loads upfront and maps the file into the process; the kernel seems
>>>>>>> to evict it first.
>>>>>>>
>>>>>>> Before loading, cat file >/dev/null might help, but in theory
>>>>>>> MAP_POPULATE should be doing much the same thing.
>>>>>>>
>>>>>>> After loading, or during slow loading, cat file >/dev/null can help
>>>>>>> because it forces the data back into RAM. This is particularly
>>>>>>> useful if the Moses process came under memory pressure after
>>>>>>> loading, which can include heavy disk activity even if RAM isn't
>>>>>>> full.
>>>>>>>
>>>>>>> Memory is shared with all other processes that mmap.
>>>>>>>
>>>>>>> With load=lazy:
>>>>>>>
>>>>>>> Maps the file into the process with lazy loading (i.e. mmap without
>>>>>>> MAP_POPULATE). Not recommended for decoding, but useful if you've
>>>>>>> got a 6 TB file and want to send it a few thousand queries.
>>>>>>>
>>>>>>> cat will definitely help here, at any time.
>>>>>>>
>>>>>>> Memory is shared with all other processes that mmap.
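To make the rules above concrete: the cache-warming command the thread keeps referring to is just a sequential read that pulls the file into the OS page cache. A sketch with a placeholder path:

    # Warm the page cache before starting Moses (useful for load=populate,
    # essential for load=lazy; with load=read it only pre-reads data that
    # read() was going to fetch anyway):
    cat /path/to/model.trie.kenlm > /dev/null

    # With load=read, avoid re-running this after the model is loaded: the
    # file can end up in RAM twice and push parts of Moses into swap.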
>>>>>>>
>>>>>>> On 04/10/2016 06:50 PM, Jorg Tiedemann wrote:
>>>>>>>
>>>>>>> Thanks for the quick reply.
>>>>>>> I will try the load option.
>>>>>>>
>>>>>>> Quick question: you said that the memory will not be shared across
>>>>>>> processes with that option. Does that mean that it will load the LM
>>>>>>> for each thread? That would mean a lot of memory in my setup.
>>>>>>>
>>>>>>> By the way, I also did the cat >/dev/null thing, but I didn't have
>>>>>>> the impression that it changed much. Does it really help, and how
>>>>>>> much would you usually gain? Thanks again!
>>>>>>>
>>>>>>> Jörg
>>>>>>>
>>>>>>> On 10 Apr 2016, at 12:55, Kenneth Heafield <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm assuming you have enough RAM to fit everything. The kernel
>>>>>>> seems to preferentially evict mmapped pages as memory usage
>>>>>>> approaches full (it doesn't have to be full). To work around this,
>>>>>>> use
>>>>>>>
>>>>>>> load=read
>>>>>>>
>>>>>>> in your moses.ini line for the models. REMOVE any "lazyken"
>>>>>>> argument, which is deprecated and might override the load=
>>>>>>> argument.
>>>>>>>
>>>>>>> The effect of load=read is to malloc (OK, an anonymous mmap, which
>>>>>>> is how malloc is implemented anyway) at a 1 GB aligned address (to
>>>>>>> optimize for huge pages) and read() the file into that memory. It
>>>>>>> will no longer be shared across processes, but the memory will have
>>>>>>> the same swappiness as the rest of the Moses process.
>>>>>>>
>>>>>>> Lazy loading will only make things worse here.
>>>>>>>
>>>>>>> Kenneth
>>>>>>>
>>>>>>> On 04/10/2016 07:29 AM, Jorg Tiedemann wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have a large language model built from the Common Crawl data set,
>>>>>>> and it takes forever to load when running Moses.
>>>>>>> My model is a trigram kenlm binarized with quantization, trie
>>>>>>> structures and pointer compression (-a 22 -q 8 -b 8).
>>>>>>> The model is about 140 GB, and it takes hours to load (I'm still
>>>>>>> waiting). I run on a machine with 256 GB RAM ...
>>>>>>>
>>>>>>> I also tried lazy loading, without success. Is this normal, or am I
>>>>>>> doing something wrong?
>>>>>>> Thanks for your help!
>>>>>>>
>>>>>>> Jörg
>>>>>
>>>>> --
>>>>> Ondrej Bojar ([email protected] / [email protected])
>>>>> http://www.cuni.cz/~obo
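For context on the flags in the original question: a quantized, pointer-compressed trie like the one described there is built with KenLM's build_binary. A sketch with hypothetical file names; the flag values are the ones quoted above:

    # Build the trie data structure, quantizing probabilities and backoffs
    # to 8 bits (-q 8 -b 8) and compressing trie pointers to at most
    # 22 bits (-a 22):
    bin/build_binary -a 22 -q 8 -b 8 trie cc.tok.3.en.arpa cc.tok.3.en.trie.kenlm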
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
