Hi,

back to your question on getting the files onto the local disks where the tuning jobs will run: this was never easy with the current implementation, but in fact, with multithreaded Moses, the benefit of parallelizing across nodes is vanishing.

So I'd pass some queue parameters to force the job to land on one of a very few nodes that already have the files. Also, we have all our temps cross-mounted, so what I sometimes do is let the job run anywhere but take the data from the local temp of another, fixed machine. Yes, this wastes network bandwidth, but it relieves the flooded (or incapable) main file server.

Cheers, O.
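As a concrete sketch of the queue-parameter approach (assuming a Slurm or SGE scheduler; the node name and script name are placeholders):

  # Slurm: pin the tuning job to a node that already holds the model files
  sbatch --nodelist=node17 run-tuning.sh

  # SGE equivalent: request a specific host
  qsub -l hostname=node17 run-tuning.sh

Either way, the idea is that the scheduler, not chance, decides which node the data-heavy job lands on.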
----- Original Message -----
> From: "Jorg Tiedemann" <[email protected]>
> To: "Kenneth Heafield" <[email protected]>
> Cc: [email protected]
> Sent: Tuesday, 12 April, 2016 14:45:57
> Subject: Re: [Moses-support] loading time for large LMs
>
> Well, this is on a shared login node and maybe not very representative of other nodes in the cluster.
> I can see if I can get a more representative figure, but it's quite busy on our cluster right now ….
>
> All the best,
> Jörg
>
> Jörg Tiedemann
> [email protected]
>
>> On 12 Apr 2016, at 14:54, Kenneth Heafield <[email protected]> wrote:
>>
>> Hi,
>>
>> Why is your system using 7 GB of swap out of 9 GB? Moses is only taking 147 GB out of 252 GB physical. I smell other processes taking up RAM, possibly those 5 stopped and 1 zombie.
>>
>> Kenneth
>>
>> On 04/12/2016 12:45 PM, Jorg Tiedemann wrote:
>>>
>>>> Did you remove all "lazyken" arguments from moses.ini?
>>>
>>> Yes, I did.
>>>
>>>> Is the network filesystem Lustre? If so, mmap will perform terribly and you should use load=read or (better) load=parallel_read since reading from Lustre is CPU-bound.
>>>
>>> Yes, I think so. Interesting, the parallel_read option. Can this hurt for some setups, or could I use it as my standard?
>>>
>>>> Does the cluster management software/job scheduler/sysadmin impose a resident memory limit?
>>>
>>> I don't really know. I don't think so, but I need to find out.
>>>
>>>> Can you copy-paste `top' when it's running slow and the stderr at that time?
>>>
>>> Here is the top of my `top` output when running on my test node:
>>>
>>> top - 14:39:03 up 50 days, 5:47, 0 users, load average: 1.97, 2.09, 3.85
>>> Tasks: 814 total, 3 running, 805 sleeping, 5 stopped, 1 zombie
>>> Cpu(s): 6.9%us, 6.2%sy, 0.0%ni, 86.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
>>> Mem: 264493500k total, 263614188k used, 879312k free, 68680k buffers
>>> Swap: 9775548k total, 7198920k used, 2576628k free, 69531796k cached
>>>
>>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>>> 42528 tiedeman 20 0 147g 147g 800 R 100.0 58.4 31:25.01 moses
>>>
>>> stderr doesn't say anything new besides the message from the start of feature function loading:
>>>
>>> FeatureFunction: LM0 start: 16 end: 16
>>> line=KENLM load=parallel_read name=LM1 factor=0 path=/homeappl/home/tiedeman/research/SMT/wmt16/fi-en/data/monolingual/cc.tok.3.en.trie.kenlm order=3
>>>
>>> I'll try with /tmp/ now as well (it takes time to shuffle the big files around, though).
>>>
>>> Jörg
>>>
>>>> On 04/12/2016 08:26 AM, Jorg Tiedemann wrote:
>>>>>
>>>>> No, it's definitely not waiting for input … the same setup works for smaller models.
>>>>>
>>>>> I have the models on a work partition on our cluster.
>>>>> This is probably not good enough, and I will try to move the data to local tmp on the individual nodes before executing. Hopefully this helps. How would you do this if you want to distribute tuning?
>>>>>
>>>>> Thanks!
>>>>> Jörg
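A minimal sketch of that staging step, assuming the node's local /tmp is large enough (the target directory is an assumption):

  # copy the binarized LM to node-local disk and point moses.ini's path= at the copy
  cp /homeappl/home/tiedeman/research/SMT/wmt16/fi-en/data/monolingual/cc.tok.3.en.trie.kenlm /tmp/
  # optionally pre-warm the page cache before starting moses (see the load= discussion further down)
  cat /tmp/cc.tok.3.en.trie.kenlm > /dev/null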
>>>>>
>>>>>> On 12 Apr 2016, at 09:34, Ondrej Bojar <[email protected]> wrote:
>>>>>>
>>>>>> Random suggestion: isn't it waiting for stdin for some strange reason? ;-)
>>>>>>
>>>>>> O.
>>>>>>
>>>>>> On April 12, 2016 8:20:46 AM CEST, Hieu Hoang <[email protected]> wrote:
>>>>>>> I assume that it's on a local disk rather than a network drive.
>>>>>>>
>>>>>>> Are you sure it's still in the loading stage, and that it's loading kenlm rather than the pt or lexicalized reordering model etc.?
>>>>>>>
>>>>>>> If there's a way to make the model files available for download, or to give me access to your machine, I might be able to debug it.
>>>>>>>
>>>>>>> Hieu Hoang
>>>>>>> http://www.hoang.co.uk/hieu
>>>>>>>
>>>>>>> On 12 Apr 2016 08:41, "Jorg Tiedemann" <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Unfortunately, load=read didn't help. It's been loading for 7 hours now with no sign of starting to decode.
>>>>>>>> The disk is not terribly slow; cat worked without problems. I don't know what to do, and I think I have to give up for now.
>>>>>>>> Am I the only one who is experiencing such slow loading times?
>>>>>>>>
>>>>>>>> Thanks again for your help!
>>>>>>>>
>>>>>>>> Jörg
>>>>>>>>
>>>>>>>> On 10 Apr 2016, at 22:27, Kenneth Heafield <[email protected]> wrote:
>>>>>>>>
>>>>>>>> With load=read:
>>>>>>>>
>>>>>>>> Acts like normal RAM as part of the Moses process.
>>>>>>>>
>>>>>>>> Supports huge pages via transparent huge pages, so it's slightly faster.
>>>>>>>>
>>>>>>>> Before loading, cat file >/dev/null will just put things into cache that were going to be read more or less like cat anyway.
>>>>>>>>
>>>>>>>> After loading, cat file >/dev/null will hurt, since there's the potential to load the file into RAM twice and swap out bits of Moses.
>>>>>>>>
>>>>>>>> Memory is shared between threads, just not with the disk cache (OK, maybe, but only if huge pages support works well) or other processes that independently read the file.
>>>>>>>>
>>>>>>>> With load=populate:
>>>>>>>>
>>>>>>>> Loads upfront and maps it into the process; the kernel seems to evict it first.
>>>>>>>>
>>>>>>>> Before loading, cat file >/dev/null might help, but in theory MAP_POPULATE should be doing much the same thing.
>>>>>>>>
>>>>>>>> After loading, or during slow loading, cat file >/dev/null can help because it forces the data back into RAM. This is particularly useful if the Moses process came under memory pressure after loading, which can include heavy disk activity even if RAM isn't full.
>>>>>>>>
>>>>>>>> Memory is shared with all other processes that mmap.
>>>>>>>>
>>>>>>>> With load=lazy:
>>>>>>>>
>>>>>>>> Maps into the process with lazy loading (i.e. mmap without MAP_POPULATE).
>>>>>>>> Not recommended for decoding, but useful if you've got a 6 TB file and want to send it a few thousand queries.
>>>>>>>>
>>>>>>>> cat will definitely help here at any time.
>>>>>>>>
>>>>>>>> Memory is shared with all other processes that mmap.
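For reference, these load= settings all go on the KENLM feature line in moses.ini. A minimal sketch, reusing the path and order from the setup above (treat the name and path as placeholders):

  [feature]
  KENLM name=LM0 factor=0 order=3 load=read path=/homeappl/home/tiedeman/research/SMT/wmt16/fi-en/data/monolingual/cc.tok.3.en.trie.kenlm

Swapping load=read for load=populate, load=parallel_read or load=lazy selects the behaviours described above; any leftover lazyken argument should be removed so it cannot override load=.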
>>>>>>>>
>>>>>>>> On 04/10/2016 06:50 PM, Jorg Tiedemann wrote:
>>>>>>>>
>>>>>>>> Thanks for the quick reply. I will try the load option.
>>>>>>>>
>>>>>>>> Quick question: you said that the memory will not be shared across processes with that option. Does that mean that it will load the LM for each thread? That would mean a lot in my setup.
>>>>>>>>
>>>>>>>> By the way, I also did the cat >/dev/null thing, but I didn't have the impression that it changed a lot. Does it really help, and how much would you usually gain? Thanks again!
>>>>>>>>
>>>>>>>> Jörg
>>>>>>>>
>>>>>>>> On 10 Apr 2016, at 12:55, Kenneth Heafield <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I'm assuming you have enough RAM to fit everything. The kernel seems to preferentially evict mmapped pages as memory usage approaches full (it doesn't have to be full). To work around this, use
>>>>>>>>
>>>>>>>> load=read
>>>>>>>>
>>>>>>>> in your moses.ini line for the models. REMOVE any "lazyken" argument, which is deprecated and might override the load= argument.
>>>>>>>>
>>>>>>>> The effect of load=read is to malloc (OK, anonymous mmap, which is how malloc is implemented anyway) at a 1 GB aligned address (to optimize for huge pages) and read() the file into that memory. It will no longer be shared across processes, but the memory will have the same swappiness as the rest of the Moses process.
>>>>>>>>
>>>>>>>> Lazy loading will only make things worse here.
>>>>>>>>
>>>>>>>> Kenneth
>>>>>>>>
>>>>>>>> On 04/10/2016 07:29 AM, Jorg Tiedemann wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have a large language model built from the Common Crawl data set and it takes forever to load when running Moses.
>>>>>>>> My model is a trigram kenlm binarized with quantization, trie structures and pointer compression (-a 22 -q 8 -b 8).
>>>>>>>> The model is about 140 GB and it takes hours to load (I'm still waiting). I run on a machine with 256 GB RAM ...
>>>>>>>>
>>>>>>>> I also tried lazy loading, without success. Is this normal, or am I doing something wrong?
>>>>>>>> Thanks for your help!
>>>>>>>>
>>>>>>>> Jörg
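For completeness, a trie model with those options would presumably have been produced with KenLM's build_binary along these lines (the ARPA and output file names are placeholders, and the exact argument order may vary between versions):

  build_binary -a 22 -q 8 -b 8 trie cc.tok.3.en.arpa cc.tok.3.en.trie.kenlm

Here -q and -b set the quantization bits for probabilities and backoffs, and -a caps the bits used for trie pointer (array offset) compression.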
--
Ondrej Bojar (mailto:[email protected] / [email protected])
http://www.cuni.cz/~obo

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
