Hi Liling
The short answer is that you need to prune/filter your phrase table
prior to creating the compact phrase table. I don't mean "filter model
given input", because that won't make much difference if you have a very
large input; I mean getting rid of rare translations which won't be used
anyway.
The compact phrase table does not do any pruning; that ends up being done
in memory, so if you have 750,000 translations of the full stop in your
model then they all get loaded into memory before Moses selects the top 20.
You can use prunePhraseTable from Moses (which, bizarrely, needs to load a
phrase table in order to parse the config file, last time I looked). You
could also apply Johnson or entropic pruning; whatever works for you.
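If neither of those fits your pipeline, a crude standalone script is often
enough. Here is a rough sketch (my own quick Python, not a Moses tool) that
keeps only the top 20 translations per source phrase. It assumes the usual
gzipped "src ||| tgt ||| scores ..." text layout with the standard 4-score
order (so the third score is the direct probability p(e|f)) and a table
sorted by source phrase; adjust if your layout differs.

#!/usr/bin/env python3
# prune_top_k.py -- keep the top-k translations per source phrase.
# Assumes "src ||| tgt ||| s1 s2 s3 s4 ..." lines, sorted by source phrase,
# with p(e|f) as the third score (standard 4-feature phrase table).
import gzip
import sys
from itertools import groupby

TOP_K = 20  # match the decoder's default of 20 translations per phrase

def source(line):
    return line.split(' ||| ', 1)[0]

def direct_prob(line):
    scores = line.split(' ||| ')[2].split()
    return float(scores[2])  # p(e|f) in the standard score order

with gzip.open(sys.argv[1], 'rt', encoding='utf-8') as fin, \
     gzip.open(sys.argv[2], 'wt', encoding='utf-8') as fout:
    # groupby works here because the table is sorted by source phrase
    for _, group in groupby(fin, key=source):
        best = sorted(group, key=direct_prob, reverse=True)[:TOP_K]
        fout.writelines(best)

Run it as "python prune_top_k.py phrase-table.gz phrase-table.pruned.gz",
then rebuild the .minphr from the pruned table with processPhraseTableMin
as before.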
cheers - Barry
On 11/12/17 09:20, liling tan wrote:
Dear Moses community/developers,
I have a question on how to handle large models created with Moses.
I have a vanilla phrase-based model with:
* PhraseDictionary num-features=4 input-factor=0 output-factor=0
* LexicalReordering num-features=6 input-factor=0 output-factor=0
* KENLM order=5 factor=0
The size of the model is:
* compressed phrase table is 5.4GB,
* compressed reordering table is 1.9GB and
* quantized LM is 600MB
I'm running on a single 56-core machine with 256GB RAM. Whenever I'm
decoding I use the -threads 56 parameter.
It takes really long to load the table, and after loading it breaks
inconsistently at different lines when decoding; I notice that the RAM
goes into swap before it breaks.
I've tried the compact phrase table and get a
* 3.2GB .minphr
* 1.5GB .minlexr
And the same kind of random breakage happens when the RAM goes into swap
after loading the phrase table.
Strangely, it still manages to decode ~500K sentences before it breaks.
Then I tried the on-disk phrase table, which is around 37GB
uncompressed. Using the on-disk PT didn't cause breakage, but the
decoding time increased significantly; now it can only decode 15K
sentences in an hour.
The setup is a little different from the normal train/dev/test split;
currently, my task is to decode the training set.
I've tried filtering the table against the training set with
filter-model-given-input.pl, but the size of the compressed table didn't
really decrease much.
The entire training set is made up of 5M sentence pairs, and it's
taking 3+ days just to decode ~1.5M sentences with the on-disk PT.
My questions are:
- Are there best practices with regards to deploying large Moses models?
- Why does the 5+GB phrase table take up > 250GB RAM when decoding?
- How else should I filter/compress the phrase table?
- Is it normal to decode only ~500K sentences a day given the machine
specs and the model size?
I understand that I could split the training set into two, train two
models and cross-decode, but if the training size is 10M sentence pairs
we'll face the same issues.
Thank you for reading the long post, and thank you in advance for any
answers, discussions and enlightenment on this issue =)
Regards,
Liling
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support