Moses1 was using the pruned ProbingPT created by binarize4moses2.perl =)
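For anyone reproducing this kind of comparison, the decoding itself is just the usual batch invocation reading one sentence per line on stdin. A sketch only, with placeholder paths and file names; -threads 50 comes from the figures below, and I'm assuming moses2 takes its config via -f like the classic moses binary:

  MOSES=~/mosesdecoder

  # Moses2 with the pruned ProbingPT config (placeholder paths)
  $MOSES/bin/moses2 -f ~/momo/moses2.ini -threads 50 \
      < input.tok > output.moses2 2> moses2.log

  # Classic Moses on the same input, for the words-per-hour comparison
  $MOSES/bin/moses -f ~/momo/moses.ini -threads 50 \
      < input.tok > output.moses1 2> moses1.log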
I think the speedup might be non-linear when compared against the pruned
phrase-table size; the larger the table, the bigger the speedup. But that
needs more rigorous testing to prove ;P

On Thu, Dec 14, 2017 at 7:37 PM, Hieu Hoang <[email protected]> wrote:

> cool, I was expecting only single-digit improvements. If the pt in Moses1
> hadn't been pruned, the speedup is a lot to do with the pruning, I think
>
> Hieu Hoang
> http://moses-smt.org/
>
> On 14 December 2017 at 07:41, liling tan <[email protected]> wrote:
>
>> With Moses2 and ProbingPT, I got 4M sentences (86M words) in 14 hours
>> with -threads 50 on 56 cores. So it's around 6M words per hour for
>> Moses2.
>>
>> With Moses1, ProbingPT and a gzipped lexical reordering table, but on
>> only 32K sentences, it was 280K words per hour with -threads 50 on 56
>> cores.
>>
>> Moses2 is 20x faster than Moses1 for my model!!
>>
>> For Moses1, my moses.ini:
>>
>> #########################
>> ### MOSES CONFIG FILE ###
>> #########################
>>
>> # input factors
>> [input-factors]
>> 0
>>
>> # mapping steps
>> [mapping]
>> 0 T 0
>>
>> [distortion-limit]
>> 6
>>
>> # feature functions
>> [feature]
>> UnknownWordPenalty
>> WordPenalty
>> PhrasePenalty
>> #PhraseDictionaryMemory name=TranslationModel0 num-features=4 path=/home/ltan/momo/pt.gz input-factor=0 output-factor=0
>> ProbingPT name=TranslationModel0 num-features=4 path=/home/ltan/momo/momo-bin input-factor=0 output-factor=0
>> LexicalReordering name=LexicalReordering0 num-features=6 type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0 path=/home/ltan/momo/reordering-table.wbe-msd-bidirectional-fe.gz
>> #LexicalReordering name=LexicalReordering0 num-features=6 type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0 property-index=0
>>
>> Distortion
>> KENLM name=LM0 factor=0 path=/home/ltan/momo/lm.ja.kenlm order=5
>>
>> On Thu, Dec 14, 2017 at 8:58 AM, liling tan <[email protected]> wrote:
>>
>>> I don't have a comparison between moses and moses2 yet. I'll give some
>>> moses numbers once the full dataset is decoded, and I can repeat the
>>> decoding for moses on the same machine.
>>>
>>> BTW, could the ProbingPT directory created by binarize4moses2.perl be
>>> used with the old Moses? Or would I have to re-prune the phrase table
>>> and then use PhraseDictionaryMemory and LexicalReordering separately?
>>>
>>> But I'm getting 4M sentences (86M words) in 14 hours on moses2 with
>>> -threads 50 on 56 cores.
>>>
>>> #########################
>>> ### MOSES CONFIG FILE ###
>>> #########################
>>>
>>> # input factors
>>> [input-factors]
>>> 0
>>>
>>> # mapping steps
>>> [mapping]
>>> 0 T 0
>>>
>>> [distortion-limit]
>>> 6
>>>
>>> # feature functions
>>> [feature]
>>> UnknownWordPenalty
>>> WordPenalty
>>> PhrasePenalty
>>> #PhraseDictionaryMemory name=TranslationModel0 num-features=4 path=/home/ltan/momo/phrase-table.gz input-factor=0 output-factor=0
>>> ProbingPT name=TranslationModel0 num-features=4 path=/home/ltan/momo/momo-bin input-factor=0 output-factor=0
>>> #LexicalReordering name=LexicalReordering0 num-features=6 type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0 path=/home/ltan/momo/reordering-table.wbe-msd-bidirectional-fe.gz
>>> LexicalReordering name=LexicalReordering0 num-features=6 type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0 property-index=0
>>>
>>> Distortion
>>> KENLM name=LM0 factor=0 path=/home/ltan/momo/lm.ja.kenlm order=5
>>>
>>> On Thu, Dec 14, 2017 at 3:52 AM, Hieu Hoang <[email protected]> wrote:
>>>
>>>> do you have comparison figures for moses vs moses2? I never managed to
>>>> get reliable info for more than 32 cores
>>>>
>>>> config/moses.ini files would be good too
>>>>
>>>> Hieu Hoang
>>>> http://moses-smt.org/
>>>>
>>>> On 13 December 2017 at 06:10, liling tan <[email protected]> wrote:
>>>>
>>>>> Ah, that's why the phrase-table is exploding... I've never decoded
>>>>> more than 100K sentences before =)
>>>>>
>>>>> binarize4moses2.perl is awesome! Let me see how much speedup I get
>>>>> with Moses2 and the pruned tables.
>>>>>
>>>>> Thank you Hieu and Barry!
>>>>>
>>>>> On Tue, Dec 12, 2017 at 6:38 PM, Hieu Hoang <[email protected]> wrote:
>>>>>
>>>>>> Barry is correct, having 750,000 translations for '.' severely
>>>>>> degrades speed.
>>>>>>
>>>>>> I had forgotten about the script I created:
>>>>>> scripts/generic/binarize4moses2.perl
>>>>>> which takes in the phrase table & lex reordering model, prunes them
>>>>>> and runs addLexROtoPT. Basically, everything you need to do to
>>>>>> create a fast model for Moses2.
>>>>>>
>>>>>> Hieu Hoang
>>>>>> http://moses-smt.org/
>>>>>>
>>>>>> On 12 December 2017 at 09:16, Barry Haddow <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Liling
>>>>>>>
>>>>>>> The short answer is that you need to prune/filter your phrase table
>>>>>>> prior to creating the compact phrase table. I don't mean "filter
>>>>>>> model given input", because that won't make much difference if you
>>>>>>> have a very large input; I mean getting rid of rare translations
>>>>>>> which won't be used anyway.
>>>>>>>
>>>>>>> The compact phrase table does not do pruning; it ends up being done
>>>>>>> in memory, so if you have 750,000 translations of the full stop in
>>>>>>> your model, they all get loaded into memory before Moses selects
>>>>>>> the top 20.
>>>>>>>
>>>>>>> You can use prunePhraseTable from Moses (which bizarrely needs to
>>>>>>> load a phrase table in order to parse the config file, last time I
>>>>>>> looked). You could also apply Johnson / entropic pruning, whatever
>>>>>>> works for you.
>>>>>>>
>>>>>>> cheers - Barry
>>>>>>>
>>>>>>> On 11/12/17 09:20, liling tan wrote:
>>>>>>>
>>>>>>> Dear Moses community/developers,
>>>>>>>
>>>>>>> I have a question on how to handle large models created using Moses.
>>>>>>>
>>>>>>> I have a vanilla phrase-based model with:
>>>>>>>
>>>>>>> - PhraseDictionary num-features=4 input-factor=0 output-factor=0
>>>>>>> - LexicalReordering num-features=6 input-factor=0 output-factor=0
>>>>>>> - KENLM order=5 factor=0
>>>>>>>
>>>>>>> The size of the model is:
>>>>>>>
>>>>>>> - compressed phrase table: 5.4GB
>>>>>>> - compressed reordering table: 1.9GB
>>>>>>> - quantized LM: 600MB
>>>>>>>
>>>>>>> I'm running on a single 56-core machine with 256GB RAM. Whenever I
>>>>>>> decode, I use the -threads 56 parameter.
>>>>>>>
>>>>>>> It takes really long to load the tables and, after loading, it
>>>>>>> breaks inconsistently at different lines when decoding; I notice
>>>>>>> that the RAM goes into swap before it breaks.
>>>>>>>
>>>>>>> I've tried the compact phrase table and get:
>>>>>>>
>>>>>>> - a 3.2GB .minphr
>>>>>>> - a 1.5GB .minlexr
>>>>>>>
>>>>>>> and the same kind of random breakage happens when the RAM goes into
>>>>>>> swap after loading the phrase table. Strangely, it still manages to
>>>>>>> decode ~500K sentences before it breaks.
>>>>>>>
>>>>>>> Then I tried the on-disk phrase table, which is around 37GB
>>>>>>> uncompressed. Using the on-disk PT didn't cause breakage, but
>>>>>>> decoding time increased significantly; now it can only decode 15K
>>>>>>> sentences in an hour.
>>>>>>>
>>>>>>> The setup is a little different from the normal train/dev/test
>>>>>>> split: currently my task is to decode the train set. I've tried
>>>>>>> filtering the table against the train set with
>>>>>>> filter-model-given-input.pl, but the size of the compressed table
>>>>>>> didn't really decrease much.
>>>>>>>
>>>>>>> The entire training set is made up of 5M sentence pairs, and it's
>>>>>>> taking 3+ days just to decode ~1.5M sentences with the on-disk PT.
>>>>>>>
>>>>>>> My questions are:
>>>>>>>
>>>>>>> - Are there best practices with regard to deploying large Moses
>>>>>>>   models?
>>>>>>> - Why does the 5+GB phrase table take up >250GB RAM when decoding?
>>>>>>> - How else should I filter/compress the phrase table?
>>>>>>> - Is it normal to decode only ~500K sentences a day, given the
>>>>>>>   machine specs and the model size?
>>>>>>>
>>>>>>> I understand that I could split the train set into two, train two
>>>>>>> models and cross-decode, but if the training size is 10M sentence
>>>>>>> pairs we'll face the same issue.
>>>>>>>
>>>>>>> Thank you for reading the long post, and thank you in advance for
>>>>>>> any answers, discussions and enlightenment on this issue =)
>>>>>>>
>>>>>>> Regards,
>>>>>>> Liling
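A concrete illustration of Barry's point about getting rid of rare translations before binarizing: prunePhraseTable or Johnson / entropic pruning are the proper tools, but even a crude "keep the N most probable targets per source phrase" filter shows the idea. This is only a rough sketch; the paths, N=20 and the choice of ranking by the direct phrase probability p(e|f) are my assumptions, not anything from the thread. It assumes the usual 4-score text phrase table (src ||| tgt ||| p(f|e) lex(f|e) p(e|f) lex(e|f) ||| align ||| counts), no tab characters anywhere in it, GNU sort and bash:

  N=20   # same cut-off as the top 20 Barry mentions (the default ttable-limit)

  zcat phrase-table.gz \
    | awk -F' [|][|][|] ' '{ split($3, s, " "); print $1 "\t" s[3] "\t" $0 }' \
    | LC_ALL=C sort -t$'\t' -k1,1 -k2,2gr \
    | awk -F'\t' -v n="$N" '$1 != prev { prev = $1; c = 0 } ++c <= n { print $3 }' \
    | gzip > phrase-table.pruned.gz

The first awk tags each line with its source phrase and p(e|f), the sort groups targets per source phrase with the most probable first, and the second awk keeps the top n of each group. If you are going the Moses2 route anyway, binarize4moses2.perl already does the pruning (and addLexROtoPT) for you, as Hieu notes above, so a hand-rolled filter like this is only worth it if you want to stay with the classic Moses toolchain.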
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
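For the archives: the "filter model given input" step mentioned in the thread is the standard script under scripts/training. A sketch with placeholder paths, assuming the moses.ini still points at the plain-text (gzipped) tables; as Barry notes, filtering against the full training set barely shrinks the table, since nearly every source phrase occurs in it:

  MOSES=~/mosesdecoder

  # writes filtered tables plus a matching moses.ini into ~/momo/filtered-for-train
  $MOSES/scripts/training/filter-model-given-input.pl \
      ~/momo/filtered-for-train \
      ~/momo/moses.ini \
      ~/momo/train.tok.src

  # then decode with ~/momo/filtered-for-train/moses.ini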
