Hi,
I think the important part is that Liling actually manages to translate
several tens of thousands of sentences before that happens. A quick fix
would be to break your corpus into pieces of 10K sentences each and loop
over the files; in my experience, translating very large batches of text
with Moses in a single run tends to go badly.
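
Something like this (an untested sketch; corpus.src, moses.ini and the
chunk names are placeholders, not files from your setup):

  # split the input into 10K-sentence pieces and decode them one by one,
  # so that a crash only costs a single piece
  split -l 10000 -d -a 4 corpus.src chunk.
  for f in chunk.????; do
      moses -f moses.ini -threads 56 < "$f" > "$f.out" || break
  done
  # stitch the outputs back together in order
  cat chunk.????.out > corpus.translated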

Is Moses still trying to load the entire corpus into memory? It used to do that.

On 12.12.2017 at 10:16, Barry Haddow wrote:
> Hi Liling
>
> The short answer is that you need to prune/filter your phrase table 
> prior to creating the compact phrase table. I don't mean "filter model 
> given input", because that won't make much difference if you have a 
> very large input; I mean getting rid of rare translations which won't 
> be used anyway.
>
> The compact phrase table does not do pruning, so it ends up being done 
> in memory: if you have 750,000 translations of the full stop in your 
> model, then they all get loaded into memory before Moses selects the 
> top 20.
>
> You can use prunePhraseTable from Moses (which, bizarrely, needed to 
> load a phrase table in order to parse the config file, last time I 
> looked). You could also apply Johnson / entropic pruning, or whatever 
> works for you.
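>
> For a quick top-N cut you don't even need a special tool. Something 
> along these lines should work on a gzipped plain-text phrase table (an 
> untested sketch; it assumes the default 4-score layout, where the 
> third score is the direct probability p(e|f), and GNU sort/awk):
>
>   # tag each entry with its source phrase and p(e|f), sort so the best
>   # translations of each source phrase come first, keep the top 20
>   zcat model/phrase-table.gz \
>     | awk -F' \\|\\|\\| ' '{split($3, s, " "); print $1 "\t" s[3] "\t" $0}' \
>     | sort -t $'\t' -k1,1 -k2,2gr \
>     | awk -F'\t' '$1 != prev {prev = $1; n = 0} ++n <= 20 {print $3}' \
>     | gzip > model/phrase-table.pruned.gz
>
> Note that sort will need plenty of temporary disk space for a table of 
> this size.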
>
> cheers - Barry
>
> On 11/12/17 09:20, liling tan wrote:
>> Dear Moses community/developers,
>>
>> I have a question on how to handle large models created with Moses.
>>
>> I have a vanilla phrase-based model with
>>
>>   * PhraseDictionary num-features=4 input-factor=0 output-factor=0
>>   * LexicalReordering num-features=6 input-factor=0 output-factor=0
>>   * KENLM order=5 factor=0
>>
>> The size of the model is:
>>
>>   * compressed phrase table is 5.4GB,
>>   * compressed reordering table is 1.9GB and
>>   * quantized LM is 600MB
>>
>>
>> I'm running on a single 56-core machine with 256GB RAM. Whenever I 
>> decode, I use the -threads 56 parameter.
>>
>> It takes really long to load the table, and after loading, decoding 
>> breaks inconsistently at different lines. I notice that the RAM goes 
>> into swap before it breaks.
>>
>> I've tried the compact phrase table and got a
>>
>>   * 3.2GB .minphr and
>>   * 1.5GB .minlexr
>>
>> And the same kind of random breakage happens when the RAM goes into 
>> swap after loading the phrase table.
>>
>> Strangely, it still manages to decode ~500K sentences before it breaks.
>>
>> Then I tried the on-disk phrase table, which is around 37GB 
>> uncompressed. Using the on-disk PT didn't cause breakage, but decoding 
>> time increased significantly: now it can only decode 15K sentences an 
>> hour.
>>
>> The setup is a little different from the normal train/dev/test split: 
>> currently, my task is to decode the training set. I've tried filtering 
>> the table against the training set with filter-model-given-input.pl, 
>> but the size of the compressed table didn't really decrease much.
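>>
>> I ran it roughly like this (the directory and file names below are 
>> just placeholders; check the script's usage message for the exact 
>> argument order):
>>
>>   filter-model-given-input.pl filtered-dir moses.ini trainset.src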
>>
>> The entire training set is made up of 5M sentence pairs, and it's 
>> taking 3+ days just to decode ~1.5M sentences with the on-disk PT.
>>
>>
>> My questions are:
>>
>>  - Are there best practices for deploying large Moses models?
>>  - Why does the 5+GB phrase table take up > 250GB of RAM when decoding?
>>  - How else should I filter/compress the phrase table?
>>  - Is it normal to decode only ~500K sentences a day, given the 
>> machine specs and the model size?
>>
>> I understand that I could split the training set in two, train two 
>> models and cross-decode, but if the training size were 10M sentence 
>> pairs, we would face the same issues.
>>
>> Thank you for reading the long post, and thank you in advance for any 
>> answers, discussions and enlightenment on this issue =)
>>
>> Regards,
>> Liling