Dear Moses community/developers,

I have a question about how to handle large models built with Moses.

I have a vanilla phrase-based model with:

   - PhraseDictionary num-features=4 input-factor=0 output-factor=0
   - LexicalReordering num-features=6 input-factor=0 output-factor=0
   - KENLM order=5 factor=0

The sizes of the model files are:

   - compressed phrase table is 5.4GB,
   - compressed reordering table is 1.9GB and
   - quantized LM is 600MB


I'm running on a single 56-core machine with 256GB of RAM, and whenever I
decode I use the -threads 56 parameter.
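For reference, the invocation is roughly the following (the paths are
placeholders for my actual files):

    ~/mosesdecoder/bin/moses -f moses.ini -threads 56 \
        < input.txt > output.txt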

It takes a really long time to load the table, and after loading, decoding
breaks inconsistently at different lines. I notice that the machine goes
into swap before it breaks.

I've tried the compact phrase table and get:

   - a 3.2GB .minphr
   - a 1.5GB .minlexr

The same kind of random breakage happens when the machine goes into swap
after loading the phrase table.

Strangely, it still manages to decode ~500K sentences before it breaks.
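In case it matters, I built the compact tables with the standard tools,
roughly like this (paths and thread counts simplified):

    ~/mosesdecoder/bin/processPhraseTableMin \
        -in phrase-table.gz -out phrase-table -nscores 4 -threads 4
    ~/mosesdecoder/bin/processLexicalTableMin \
        -in reordering-table.gz -out reordering-table -threads 4

and pointed the PhraseDictionaryCompact / LexicalReordering entries in
moses.ini at the resulting .minphr / .minlexr files. If I read the docs
right, the compact tables are memory-mapped by default unless moses is
started with -minphr-memory / -minlexr-memory, so I'm surprised they push
the machine into swap.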

Then I tried the on-disk phrase table, which is around 37GB uncompressed.
Using the on-disk PT didn't cause breakage, but decoding time increased
significantly: now it can only decode ~15K sentences per hour.
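For completeness, I binarized it along these lines (the argument values are
from memory, so take them as approximate):

    ~/mosesdecoder/bin/CreateOnDiskPt 1 1 4 100 2 \
        phrase-table.gz phrase-table.ondisk

and switched the moses.ini entry to PhraseDictionaryOnDisk.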

The setup is a little different from the usual train/dev/test split:
currently, my task is to decode the training set itself. I've tried
filtering the table against the training set with
filter-model-given-input.pl, but the size of the compressed table didn't
really decrease, which perhaps makes sense, since the table was extracted
from those very sentences.
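The filtering command was roughly (directory and file names are
placeholders):

    perl ~/mosesdecoder/scripts/training/filter-model-given-input.pl \
        filtered-dir moses.ini trainset.src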

The entire training set consists of 5M sentence pairs, and it's taking 3+
days just to decode ~1.5M sentences with the on-disk PT.


My questions are:

 - Are there best practices for deploying large Moses models?
 - Why does a 5+GB phrase table take up >250GB of RAM when decoding?
 - How else could I filter or compress the phrase table?
 - Is it normal to decode only ~500K sentences a day given the machine
specs and the model size?

I understand that I could split the training set in two, train two models,
and cross-decode, but if the training set grows to 10M sentence pairs,
we'll face the same issues.
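A related workaround I'm considering is to chunk the input and filter the
model per chunk, so that each decoder run only has to load a small table.
A rough sketch (the chunk size, paths, and file names are just
placeholders):

    split -l 100000 trainset.src chunk.
    for f in chunk.*; do
        perl ~/mosesdecoder/scripts/training/filter-model-given-input.pl \
            filtered.$f moses.ini $f
        ~/mosesdecoder/bin/moses -f filtered.$f/moses.ini -threads 56 \
            < $f > $f.out
    done
    cat chunk.*.out > trainset.out

Would that be the recommended way to go, or is there something smarter?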

Thank you for reading the long post, and thank you in advance for any
answers, discussion, and enlightenment on this issue =)

Regards,
LIling