Barry is correct: having 750,000 translations for '.' severely degrades
speed.
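
As a rough way to find such entries, here is a minimal Python sketch. It is
not part of Moses, and the function name count_translations is made up for
illustration. It assumes the plain-text phrase table format, where fields are
separated by " ||| " and the first field is the source phrase.

```python
from collections import Counter

def count_translations(lines):
    """Count translation options per source phrase (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split(" ||| ")
        if len(fields) < 3:
            continue  # skip lines that do not look like phrase-table entries
        counts[fields[0].strip()] += 1
    return counts

# Tiny made-up sample in the assumed "source ||| target ||| scores" layout.
sample = [
    ". ||| . ||| 0.9 0.8 0.9 0.8",
    ". ||| ! ||| 0.01 0.02 0.01 0.02",
    ". ||| , ||| 0.02 0.01 0.02 0.01",
    "haus ||| house ||| 0.7 0.6 0.8 0.7",
]
worst = count_translations(sample).most_common(1)
print(worst)
```

For a real gzipped table you could pass gzip.open(path, "rt", encoding="utf-8")
as the lines argument; most_common(10) then lists the source phrases with the
most translation options.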

I had forgotten about the script I created:
   scripts/generic/binarize4moses2.perl
which takes in the phrase table and lexical reordering model, prunes them, and
runs addLexROtoPT. Basically, it does everything you need to create a fast
model for Moses2.
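
To illustrate what the pruning step amounts to, here is a minimal Python
sketch. It is not the actual logic of binarize4moses2.perl or prunePhraseTable,
which rank options using the model. It assumes a plain-text phrase table
grouped by source phrase, with the third score being the direct phrase
translation probability p(target|source). The names prune_top_n and
score_index are hypothetical, and n=20 simply mirrors the "top 20" Barry
mentions.

```python
from itertools import groupby

def source_of(line):
    """Return the source phrase (first ' ||| '-separated field)."""
    return line.split(" ||| ", 1)[0]

def score_of(line, score_index):
    """Return one score from the third field as a float."""
    scores = line.rstrip("\n").split(" ||| ")[2].split()
    return float(scores[score_index])

def prune_top_n(lines, n=20, score_index=2):
    """Yield at most n entries per source phrase, best score first.

    Assumes entries for the same source phrase are adjacent, as in a
    sorted phrase table, so only one group is held in memory at a time.
    """
    for _, group in groupby(lines, key=source_of):
        ranked = sorted(group, key=lambda l: score_of(l, score_index),
                        reverse=True)
        yield from ranked[:n]

# Tiny made-up sample; real tables carry more fields after the scores.
table = [
    ". ||| . ||| 0.9 0.8 0.9 0.8",
    ". ||| ! ||| 0.01 0.02 0.01 0.02",
    ". ||| , ||| 0.02 0.01 0.02 0.01",
    "haus ||| house ||| 0.7 0.6 0.8 0.7",
]
pruned = list(prune_top_n(table, n=2))
print("\n".join(pruned))
```

Ranking on a single score is a simplification chosen to keep the sketch
short; the point is only that rare options are dropped before the table is
binarized, rather than being loaded and discarded at decoding time.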

Hieu Hoang
http://moses-smt.org/


On 12 December 2017 at 09:16, Barry Haddow <[email protected]>
wrote:

> Hi Liling
>
> The short answer is you need to prune/filter your phrase table prior
> to creating the compact phrase table. I don't mean "filter model given
> input", because that won't make much difference if you have a very large
> input, I mean getting rid of rare translations which won't be used anyway.
>
> The compact phrase table does not do pruning; it ends up being done in memory,
> so if you have 750,000 translations of the full-stop in your model then
> they all get loaded into memory, before Moses selects the top 20.
>
> You can use prunePhraseTable from Moses (which bizarrely needs to load a
> phrase table in order to parse the config file, last time I looked). You
> could also apply Johnson / entropic pruning, whatever works for you.
>
> cheers - Barry
>
>
> On 11/12/17 09:20, liling tan wrote:
>
> Dear Moses community/developers,
>
> I have a question on how to handle large models created using moses.
>
> I've a vanilla phrase-based model with
>
>    - PhraseDictionary num-features=4 input-factor=0 output-factor=0
>    - LexicalReordering num-features=6 input-factor=0 output-factor=0
>    - KENLM order=5 factor=0
>
> The size of the model is:
>
>    - compressed phrase table is 5.4GB,
>    - compressed reordering table is 1.9GB and
>    - quantized LM is 600MB
>
>
> I'm running on a single 56-core machine with 256GB RAM. Whenever I'm
> decoding I use the -threads 56 parameter.
>
> It takes really long to load the table and, after loading, it breaks
> inconsistently at different lines when decoding. I notice that the RAM goes
> into swap before it breaks.
>
> I've tried the compact phrase table and get a
>
>    - 3.2GB .minphr
>    - 1.5GB .minlexr
>
> And the same kind of random breakage happens when RAM goes into swap after
> loading the phrase-table.
>
> Strangely, it still manages to decode ~500K sentences before it breaks.
>
> Then I've tried with ondisk phrasetable and it's around 37GB uncompressed.
> Using the ondisk PT didn't cause breakage but the decoding time is
> significantly increased, now it can only decode 15K sentences in an hour.
>
> The setup is a little different from normal where we have the
> train/dev/test split. Currently, my task is to decode the train set. I've
> tried filtering the table with the trainset with
> filter-model-given-input.pl but the size of the compressed table didn't
> really decrease much.
>
> The entire training set is made up of 5M sentence pairs and it's taking 3+
> days just to decode ~1.5M sentences with ondisk PT.
>
>
> My questions are:
>
>  - Are there best practices with regards to deploying large Moses models?
>  - Why does the 5+GB phrase table take up > 250GB RAM when decoding?
>  - How else should I filter/compress the phrase table?
>  - Is it normal to decode only ~500K sentences a day given the machine
> specs and the model size?
>
> I understand that I could split the train set up into two and train 2
> models then cross-decode but if the training size is 10M sentence pairs,
> we'll face the same issues.
>
> Thank you for reading the long post and thank you in advance for any
> answers, discussions and enlightenment on this issue =)
>
> Regards,
> Liling
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
>
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
