Barry is correct: having 750,000 translations for '.' severely degrades speed.
I had forgotten about the script I created:
   scripts/generic/binarize4moses2.perl
which takes in the phrase table & lex reordering model, prunes them, and runs addLexROtoPT. Basically, it does everything you need to create a fast model for Moses2.

Hieu Hoang
http://moses-smt.org/

On 12 December 2017 at 09:16, Barry Haddow <[email protected]> wrote:

> Hi Liling
>
> The short answer is you need to prune/filter your phrase table prior
> to creating the compact phrase table. I don't mean "filter model given
> input", because that won't make much difference if you have a very large
> input, I mean getting rid of rare translations which won't be used anyway.
>
> The compact phrase table does not do pruning, it ends up being done in
> memory, so if you have 750,000 translations of the full-stop in your model
> then they all get loaded into memory, before Moses selects the top 20.
>
> You can use prunePhraseTable from Moses (which bizarrely needs to load a
> phrase table in order to parse the config file, last time I looked). You
> could also apply Johnson / entropic pruning, whatever works for you,
>
> cheers - Barry
>
>
> On 11/12/17 09:20, liling tan wrote:
>
> Dear Moses community/developers,
>
> I have a question on how to handle large models created using moses.
>
> I've a vanilla phrase-based model with
>
>    - PhraseDictionary num-features=4 input-factor=0 output-factor=0
>    - LexicalReordering num-features=6 input-factor=0 output-factor=0
>    - KENLM order=5 factor=0
>
> The size of the model is:
>
>    - compressed phrase table is 5.4GB,
>    - compressed reordering table is 1.9GB and
>    - quantized LM is 600MB
>
>
> I'm running on a single 56-core machine with 256GB RAM. Whenever I'm
> decoding I use the -threads 56 parameter.
>
> It takes really long to load the table and after loading, it breaks
> inconsistently at different lines when decoding. I notice that the RAM goes
> into swap before it breaks.
>
> I've tried the compact phrase table and get a
>
>    - 3.2GB .minphr
>    - 1.5GB .minlexr
>
> And the same kind of random breakage happens when RAM goes into swap after
> loading the phrase-table.
>
> Strangely, it still manages to decode ~500K sentences before it breaks.
>
> Then I've tried with the ondisk phrasetable and it's around 37GB uncompressed.
> Using the ondisk PT didn't cause breakage but the decoding time is
> significantly increased, now it can only decode 15K sentences in an hour.
>
> The setup is a little different from normal where we have the
> train/dev/test split. Currently, my task is to decode the train set. I've
> tried filtering the table with the trainset with
> filter-model-given-input.pl but the size of the compressed table didn't
> really decrease much.
>
> The entire training set is made up of 5M sentence pairs and it's taking 3+
> days just to decode ~1.5M sentences with ondisk PT.
>
>
> My questions are:
>
>    - Are there best practices with regards to deploying large Moses models?
>    - Why does the 5+GB phrase table take up > 250GB RAM when decoding?
>    - How else should I filter/compress the phrase table?
>    - Is it normal to decode only ~500K sentences a day given the machine
>      specs and the model size?
>
> I understand that I could split the train set up into two and train 2
> models then cross-decode but if the training size is 10M sentence pairs,
> we'll face the same issues.
>
> Thank you for reading the long post and thank you in advance for any
> answers, discussions and enlightenment on this issue =)
>
> Regards,
> LIling
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
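
For anyone who wants to see what the pruning Barry describes amounts to, here is a minimal Python sketch. It is not the logic of prunePhraseTable or binarize4moses2.perl, just an illustration of keeping only the best few translations per source phrase before binarizing. It assumes the plain-text phrase table format "src ||| tgt ||| scores ||| ..." with entries for the same source phrase on consecutive lines, and it ranks by a single score column (index 2, which in the usual 4-score layout is the direct translation probability) rather than by the full weighted model score. The function name prune_phrase_table and its parameters are made up for this example.

```python
import heapq
from itertools import groupby


def prune_phrase_table(lines, top_n=20, score_index=2):
    """Yield at most top_n entries per source phrase, best score first.

    Assumption: lines look like 'src ||| tgt ||| s1 s2 s3 s4 ||| ...'
    and all entries for one source phrase are adjacent (sorted table).
    """
    def source(line):
        # Everything before the first field separator is the source phrase.
        return line.split(" ||| ", 1)[0]

    def score(line):
        # Third field holds the space-separated feature scores.
        return float(line.split(" ||| ")[2].split()[score_index])

    # Only one source phrase's entries are held in memory at a time.
    for _, group in groupby(lines, key=source):
        yield from heapq.nlargest(top_n, group, key=score)
```

Because it streams group by group, something like this could be run over a decompressed table line by line; with top_n=20 the 750,000 entries for '.' would shrink to 20 before the compact table is built.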
