Hi Liling

The short answer is that you need to prune/filter your phrase table before creating the compact phrase table. I don't mean "filter model given input", because that won't make much difference if you have a very large input; I mean getting rid of rare translations which won't be used anyway.

The compact phrase table does not do any pruning itself; that ends up being done in memory, so if you have 750,000 translations of the full stop in your model then they all get loaded into memory before Moses selects the top 20.

You can use prunePhraseTable from Moses (which bizarrely needs to load a phrase table in order to parse the config file, last time I looked). You could also apply Johnson (significance-based) pruning or entropic pruning; whatever works for you.
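If you want something quick and dirty instead, a plain top-N prune of the gzipped text phrase table with standard Unix tools usually does the job. A rough sketch, assuming the usual 4-score table where the third score is the direct phrase probability p(e|f); file names, N and the sort buffer are placeholders you'd adjust:

   # rank translations per source phrase by p(e|f), keep the best 20
   zcat phrase-table.gz \
     | awk -F' \\|\\|\\| ' '{ split($3, s, " "); printf "%s\t%s\t%s\n", $1, s[3], $0 }' \
     | LC_ALL=C sort -t$'\t' -k1,1 -k2,2gr -S 20G -T /tmp \
     | awk -F'\t' -v N=20 '{ if ($1 == prev) c++; else { c = 1; prev = $1 }; if (c <= N) print $3 }' \
     | gzip > phrase-table.pruned.gz

   # rebuild the compact table from the pruned text table
   processPhraseTableMin -in phrase-table.pruned.gz -out phrase-table.pruned -nscores 4 -threads 16

The -g sort handles the scientific notation in the score columns, and everything except the sort is streaming. Done this way, the 750,000 full-stop translations never reach the compact table in the first place.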

cheers - Barry

On 11/12/17 09:20, liling tan wrote:
Dear Moses community/developers,

I have a question on how to handle large models created using moses.

I have a vanilla phrase-based model with:

  * PhraseDictionary num-features=4 input-factor=0 output-factor=0
  * LexicalReordering num-features=6 input-factor=0 output-factor=0
  * KENLM order=5 factor=0

The sizes of the model files are:

  * compressed phrase table is 5.4GB,
  * compressed reordering table is 1.9GB and
  * quantized LM is 600MB


I'm running on a single 56-core machine with 256GB RAM. Whenever I'm decoding I use the -threads 56 parameter.
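Concretely, the decoding command looks roughly like this (file names are just placeholders):

   moses -f moses.ini -threads 56 < train.src > train.decoded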

It takes really long to load the table, and after loading, decoding breaks inconsistently at different lines. I notice that the RAM goes into swap before it breaks.

I've tried the compact phrase table and get:

  * a 3.2GB .minphr
  * a 1.5GB .minlexr

And the same kind of random breakage happens when RAM goes into swap after loading the phrase-table.

Strangely, it still manages to decode ~500K sentences before it breaks.

Then I tried the on-disk phrase table, which is around 37GB uncompressed. Using the on-disk PT didn't cause breakage, but decoding time increased significantly; now it can only decode 15K sentences in an hour.

The setup is a little different from the normal train/dev/test split: my task is currently to decode the training set itself. I've tried filtering the table against the training set with filter-model-given-input.pl, but the size of the compressed table didn't really decrease much.
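For reference, the filtering was done along these lines (paths are placeholders):

   perl mosesdecoder/scripts/training/filter-model-given-input.pl filtered-train moses.ini train.src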

The entire training set is 5M sentence pairs, and it's taking 3+ days just to decode ~1.5M sentences with the on-disk PT.


My questions are:

 - Are there best practices with regard to deploying large Moses models?
 - Why does the 5+GB phrase table take up > 250GB RAM when decoding?
 - How else should I filter/compress the phrase table?
 - Is it normal to decode only ~500K sentences a day given the machine specs and the model size?

I understand that I could split the training set in two, train 2 models and then cross-decode, but if the training size is 10M sentence pairs, we'll face the same issues.

Thank you for reading the long post, and thank you in advance for any answers, discussion and enlightenment on this issue =)

Regards,
Liling


_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
