Hi Liling

The short answer is that you need to prune/filter your phrase table before creating the compact phrase table. I don't mean "filter model given input", because that won't make much difference if you have a very large input; I mean getting rid of rare translations which won't be used anyway.

The compact phrase table does not do any pruning itself; that ends up being done in memory, so if you have 750,000 translations of the full stop in your model then they all get loaded into memory before Moses selects the top 20.

You can use prunePhraseTable from Moses (which bizarrely needs to load a phrase table in order to parse the config file, last time I looked). You could also apply Johnson (significance-based) pruning or entropic pruning; whatever works for you.
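If you want something quick and dirty instead, a plain top-N prune of the gzipped text phrase table with standard Unix tools usually does the job. A rough sketch, assuming the usual 4-score table where the third score is the direct phrase probability p(e|f); file names, N and the sort buffer are placeholders you'd adjust:

   # rank translations per source phrase by p(e|f), keep the best 20
   zcat phrase-table.gz \
     | awk -F' \\|\\|\\| ' '{ split($3, s, " "); printf "%s\t%s\t%s\n", $1, s[3], $0 }' \
     | LC_ALL=C sort -t$'\t' -k1,1 -k2,2gr -S 20G -T /tmp \
     | awk -F'\t' -v N=20 '{ if ($1 == prev) c++; else { c = 1; prev = $1 }; if (c <= N) print $3 }' \
     | gzip > phrase-table.pruned.gz

   # rebuild the compact table from the pruned text table
   processPhraseTableMin -in phrase-table.pruned.gz -out phrase-table.pruned -nscores 4 -threads 16

The -g sort handles the scientific notation in the score columns, and everything except the sort is streaming. Done this way, the 750,000 full-stop translations never reach the compact table in the first place.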

cheers - Barry

On 11/12/17 09:20, liling tan wrote:
Dear Moses community/developers,

I have a question on how to handle large models created using moses.

I have a vanilla phrase-based model with:

  * PhraseDictionary num-features=4 input-factor=0 output-factor=0
  * LexicalReordering num-features=6 input-factor=0 output-factor=0
  * KENLM order=5 factor=0

The sizes of the model files are:

  * compressed phrase table is 5.4GB,
  * compressed reordering table is 1.9GB and
  * quantized LM is 600MB


I'm running on a single 56-core machine with 256GB RAM. Whenever I'm decoding I use the -threads 56 parameter.
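Concretely, the decoding command looks roughly like this (file names are just placeholders):

   moses -f moses.ini -threads 56 < train.src > train.decoded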

It takes really long to load the table, and after loading, decoding breaks inconsistently at different lines. I notice that the RAM goes into swap before it breaks.

I've tried the compact phrase table and get:

  * a 3.2GB .minphr
  * a 1.5GB .minlexr

And the same kind of random breakage happens when RAM goes into swap after loading the phrase-table.

Strangely, it still manages to decode ~500K sentences before it breaks.

Then I tried the on-disk phrase table, which is around 37GB uncompressed. Using the on-disk PT didn't cause breakage, but decoding time increased significantly; now it can only decode 15K sentences in an hour.

The setup is a little different from the normal train/dev/test split: my task is currently to decode the training set itself. I've tried filtering the table against the training set with filter-model-given-input.pl, but the size of the compressed table didn't really decrease much.
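For reference, the filtering was done along these lines (paths are placeholders):

   perl mosesdecoder/scripts/training/filter-model-given-input.pl filtered-train moses.ini train.src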

The entire training set is 5M sentence pairs, and it's taking 3+ days just to decode ~1.5M sentences with the on-disk PT.


My questions are:

 - Are there best practices with regard to deploying large Moses models?
 - Why does the 5+GB phrase table take up > 250GB RAM when decoding?
 - How else should I filter/compress the phrase table?
 - Is it normal to decode only ~500K sentences a day given the machine specs and the model size?

I understand that I could split the training set in two, train 2 models and then cross-decode, but if the training size is 10M sentence pairs, we'll face the same issues.

Thank you for reading the long post, and thank you in advance for any answers, discussion and enlightenment on this issue =)

Regards,
Liling


_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
