Hi, I think the important part is that Liling actually manages to translate several tens of thousands of sentences before that happens. A quick fix would be to break your corpus into pieces of 10K sentences each and loop over the files. I have usually had bad experiences trying to translate large batches of text with Moses.
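A minimal sketch of that chunk-and-loop workaround, assuming GNU `split` is available; the `moses` invocation and all file names are illustrative (a `cat` stands in for the decoder so the sketch runs standalone):

```shell
#!/bin/sh
# Demo stand-in for the real source corpus (25 "sentences").
seq 25 > corpus.src

# Split into fixed-size pieces; use -l 10000 for the 10K-sentence
# chunks suggested above (10 here so the demo stays small).
# Note: -d (numeric suffixes) is GNU coreutils.
split -l 10 -d corpus.src chunk.

for f in chunk.*; do
    # Real decoding step would be something like:
    #   moses -f moses.ini -threads 56 < "$f" > "$f.out"
    cat "$f" > "$f.out"   # placeholder so the sketch runs as-is
done

# Reassemble the outputs in chunk order.
cat chunk.*.out > corpus.out
```

Since each chunk is an independent decoder run, a crash only costs you one chunk, and you can restart from the last finished piece instead of the whole corpus.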
Is it still trying to load the entire corpus into memory? It used to do that.

On 12.12.2017 at 10:16, Barry Haddow wrote:
> Hi Liling
>
> The short answer is that you need to prune/filter your phrase table
> prior to creating the compact phrase table. I don't mean "filter model
> given input", because that won't make much difference if you have a
> very large input; I mean getting rid of rare translations which won't
> be used anyway.
>
> The compact phrase table does not do pruning, so it ends up being done
> in memory: if you have 750,000 translations of the full stop in your
> model, then they all get loaded into memory before Moses selects the
> top 20.
>
> You can use prunePhraseTable from Moses (which, bizarrely, needs to
> load a phrase table in order to parse the config file, last time I
> looked). You could also apply Johnson / entropic pruning, whatever
> works for you,
>
> cheers - Barry
>
> On 11/12/17 09:20, liling tan wrote:
>> Dear Moses community/developers,
>>
>> I have a question on how to handle large models created using Moses.
>>
>> I have a vanilla phrase-based model with:
>>
>> * PhraseDictionary num-features=4 input-factor=0 output-factor=0
>> * LexicalReordering num-features=6 input-factor=0 output-factor=0
>> * KENLM order=5 factor=0
>>
>> The size of the model is:
>>
>> * compressed phrase table: 5.4GB
>> * compressed reordering table: 1.9GB
>> * quantized LM: 600MB
>>
>> I'm running on a single 56-core machine with 256GB RAM. Whenever I
>> decode, I use the -threads 56 parameter.
>>
>> It takes really long to load the table, and after loading, decoding
>> breaks inconsistently at different lines; I notice that the RAM goes
>> into swap before it breaks.
>>
>> I've tried the compact phrase table and get:
>>
>> * 3.2GB .minphr
>> * 1.5GB .minlexr
>>
>> The same kind of random breakage happens when RAM goes into swap
>> after loading the phrase table.
>>
>> Strangely, it still manages to decode ~500K sentences before it
>> breaks.
>> Then I tried the on-disk phrase table, which is around 37GB
>> uncompressed. Using the on-disk PT didn't cause breakage, but
>> decoding time increased significantly: now it can only decode 15K
>> sentences in an hour.
>>
>> The setup is a little different from the normal train/dev/test
>> split. Currently, my task is to decode the train set. I tried
>> filtering the table against the train set with
>> filter-model-given-input.pl, but the size of the compressed table
>> didn't really decrease much.
>>
>> The entire training set is made up of 5M sentence pairs, and it's
>> taking 3+ days just to decode ~1.5M sentences with the on-disk PT.
>>
>> My questions are:
>>
>> - Are there best practices for deploying large Moses models?
>> - Why does the 5+GB phrase table take up >250GB RAM when decoding?
>> - How else should I filter/compress the phrase table?
>> - Is it normal to decode only ~500K sentences a day given the
>>   machine specs and the model size?
>>
>> I understand that I could split the train set into two, train two
>> models, and cross-decode, but if the training size is 10M sentence
>> pairs we'll face the same issues.
>>
>> Thank you for reading the long post, and thanks in advance for any
>> answers, discussions and enlightenment on this issue =)
>>
>> Regards,
>> Liling
>>
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
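P.S. The top-N pruning Barry describes (keep only the N best translations per source phrase) can be sketched over a plain-text phrase table like this; it assumes the standard Moses text layout `src ||| tgt ||| score1 score2 ...` with entries for the same source phrase on consecutive lines, and the score index and `top_n` value are illustrative:

```python
import heapq
from itertools import groupby

def prune_phrase_table(lines, top_n=20, score_field=2):
    """Yield only the top_n translations for each source phrase.

    Assumes Moses text format 'src ||| tgt ||| s1 s2 s3 s4' with lines
    grouped by source phrase, ranking by the score at `score_field`
    (index 2 is conventionally the direct phrase probability p(e|f),
    but check your own model's feature order).
    """
    def source(line):
        return line.split(' ||| ', 1)[0]

    def score(line):
        return float(line.split(' ||| ')[2].split()[score_field])

    for _, group in groupby(lines, key=source):
        # nlargest keeps the best-scoring entries per source phrase.
        for line in heapq.nlargest(top_n, group, key=score):
            yield line

# Toy demo: prune the full stop's translations down to its top 2.
table = [
    ". ||| . ||| 0.9 0.8 0.7 0.6",
    ". ||| , ||| 0.1 0.1 0.2 0.1",
    ". ||| the ||| 0.01 0.02 0.01 0.01",
    "cat ||| chat ||| 0.8 0.7 0.9 0.6",
]
pruned = list(prune_phrase_table(table, top_n=2))
```

This mirrors what Moses does in memory at load time, but doing it once offline keeps the 750,000-translation entries out of RAM entirely. In practice you would stream a sorted, gzipped table through this rather than hold it in a list.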
