I don't have a Moses vs. Moses2 comparison yet. I'll post some Moses
numbers once the full dataset is decoded; I can repeat the decoding for
Moses on the same machine.

BTW, could the ProbingPT directory created by binarize4moses2.perl be
used with the old Moses? Or would I have to re-prune the phrase table
and then use PhraseDictionaryMemory and LexicalReordering separately?

So far I'm getting 4M sentences (86M words) in 14 hours on Moses2, with
-threads 50 on the 56-core machine.
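
(That works out to roughly 4M / 14 ≈ 286K sentences per hour, or 86M
words / 50,400 seconds ≈ 1,700 words per second, with 50 decoding
threads.)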


#########################
### MOSES CONFIG FILE ###
#########################

# input factors
[input-factors]
0

# mapping steps
[mapping]
0 T 0

[distortion-limit]
6

# feature functions
[feature]
UnknownWordPenalty
WordPenalty
PhrasePenalty
#PhraseDictionaryMemory name=TranslationModel0 num-features=4 path=/home/ltan/momo/phrase-table.gz input-factor=0 output-factor=0
ProbingPT name=TranslationModel0 num-features=4 path=/home/ltan/momo/momo-bin input-factor=0 output-factor=0
#LexicalReordering name=LexicalReordering0 num-features=6 type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0 path=/home/ltan/momo/reordering-table.wbe-msd-bidirectional-fe.gz
LexicalReordering name=LexicalReordering0 num-features=6 type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0 property-index=0
Distortion
KENLM name=LM0 factor=0 path=/home/ltan/momo/lm.ja.kenlm order=5
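
For reference, the momo-bin ProbingPT directory above was produced by
scripts/generic/binarize4moses2.perl. A minimal sketch of the invocation
(from memory and untested; check the script's usage message for the
exact option names):

  MOSES=/path/to/mosesdecoder   # adjust to your checkout
  perl $MOSES/scripts/generic/binarize4moses2.perl \
      --phrase-table /home/ltan/momo/phrase-table.gz \
      --lex-ro /home/ltan/momo/reordering-table.wbe-msd-bidirectional-fe.gz \
      --output-dir /home/ltan/momo/momo-bin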

On Thu, Dec 14, 2017 at 3:52 AM, Hieu Hoang <[email protected]> wrote:

> do you have comparison figures for Moses v Moses2? I never managed to get
> reliable info for more than 32 cores
>
> config/moses.ini files would be good too
>
> Hieu Hoang
> http://moses-smt.org/
>
>
> On 13 December 2017 at 06:10, liling tan <[email protected]> wrote:
>
>> Ah, that's why the phrase-table is exploding... I've never decoded more
>> than 100K sentences before =)
>>
>> binarize4moses2.perl is awesome! Let me see how much speedup I get with
>> Moses2 and pruned tables.
>>
>> Thank you Hieu and Barry!
>>
>>
>>
>>
>> On Tue, Dec 12, 2017 at 6:38 PM, Hieu Hoang <[email protected]> wrote:
>>
>>> Barry is correct, having 750,000 translations for '.' severely degrades
>>> speed.
>>>
>>> I had forgotten about the script I created:
>>>    scripts/generic/binarize4moses2.perl
>>> which takes in the phrase table & lex reordering model, prunes them,
>>> and runs addLexROtoPT. Basically, everything you need to do to create
>>> a fast model for Moses2.
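>>>
>>> (Doing it by hand would be roughly: prune rare translations per source
>>> phrase, fold the lex-ro scores into the pruned table with addLexROtoPT,
>>> then binarize the result into a ProbingPT directory with
>>> CreateProbingPT; the script just chains those steps.)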
>>>
>>> Hieu Hoang
>>> http://moses-smt.org/
>>>
>>>
>>> On 12 December 2017 at 09:16, Barry Haddow <[email protected]>
>>> wrote:
>>>
>>>> Hi Liling
>>>>
>>>> The short answer is that you need to prune/filter your phrase table
>>>> prior to creating the compact phrase table. I don't mean "filter model
>>>> given input", because that won't make much difference if you have a very
>>>> large input; I mean getting rid of rare translations which won't be used
>>>> anyway.
>>>>
>>>> The compact phrase table does not do pruning, so it ends up being done
>>>> in memory: if you have 750,000 translations of the full stop in your
>>>> model, they all get loaded into memory before Moses selects the top 20.
>>>>
>>>> You can use prunePhraseTable from Moses (which bizarrely needs to load
>>>> a phrase table in order to parse the config file, last time I looked).
>>>> You could also apply Johnson / entropic pruning; whatever works for you.
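>>>>
>>>> If all you want is a quick top-N cutoff per source phrase, a gawk
>>>> sketch along these lines would do it (untested; it assumes the standard
>>>> 4-score phrase table sorted by source phrase, and ranks by p(e|f), the
>>>> third score field):
>>>>
>>>>   zcat phrase-table.gz | gawk -F' \\|\\|\\| ' '
>>>>     # keep the 20 highest-scoring translations of each source phrase
>>>>     function flush(  n, i) {
>>>>       n = asorti(score, order, "@val_num_desc")   # needs gawk >= 4
>>>>       for (i = 1; i <= n && i <= 20; i++) print line[order[i]]
>>>>       delete score; delete line; cnt = 0
>>>>     }
>>>>     $1 != prev { if (NR > 1) flush(); prev = $1 }
>>>>     { cnt++; split($3, s, " "); score[cnt] = s[3]; line[cnt] = $0 }
>>>>     END { flush() }
>>>>   ' | gzip > phrase-table.pruned.gz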
>>>>
>>>> cheers - Barry
>>>>
>>>>
>>>> On 11/12/17 09:20, liling tan wrote:
>>>>
>>>> Dear Moses community/developers,
>>>>
>>>> I have a question on how to handle large models created with Moses.
>>>>
>>>> I've a vanilla phrase-based model with
>>>>
>>>>    - PhraseDictionary num-features=4 input-factor=0 output-factor=0
>>>>    - LexicalReordering num-features=6 input-factor=0 output-factor=0
>>>>    - KENLM order=5 factor=0
>>>>
>>>> The size of the model is:
>>>>
>>>>    - compressed phrase table is 5.4GB,
>>>>    - compressed reordering table is 1.9GB and
>>>>    - quantized LM is 600MB
>>>>
>>>>
>>>> I'm running on a single 56-core machine with 256GB RAM. Whenever I
>>>> decode, I pass -threads 56.
>>>>
>>>> It takes really long to load the table, and after loading, decoding
>>>> breaks inconsistently at different lines; I notice that the RAM goes
>>>> into swap before it breaks.
>>>>
>>>> I've tried the compact phrase table and get a
>>>>
>>>>    - 3.2GB .minphr
>>>>    - 1.5GB .minlexr
>>>>
>>>> And the same kind of random breakage happens when RAM goes into swap
>>>> after loading the phrase-table.
>>>>
>>>> Strangely, it still manages to decode ~500K sentences before it breaks.
>>>>
>>>> Then I tried the on-disk phrase table, which is around 37GB
>>>> uncompressed. Using the on-disk PT didn't cause breakage, but decoding
>>>> time increased significantly: it now decodes only ~15K sentences per
>>>> hour.
>>>>
>>>> The setup is a little different from the normal train/dev/test split:
>>>> currently, my task is to decode the train set itself. I've tried
>>>> filtering the table against the train set with
>>>> filter-model-given-input.pl, but the size of the compressed table
>>>> didn't decrease much.
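>>>>
>>>> (i.e. roughly the usual filtering recipe, with placeholder paths:
>>>> perl $MOSES/scripts/training/filter-model-given-input.pl filtered-dir
>>>> moses.ini trainset.src)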
>>>>
>>>> The entire training set is 5M sentence pairs, and it's taking 3+ days
>>>> just to decode ~1.5M sentences with the on-disk PT.
>>>>
>>>>
>>>> My questions are:
>>>>
>>>>  - Are there best practices for deploying large Moses models?
>>>>  - Why does the 5+GB phrase table take up >250GB RAM when decoding?
>>>>  - How else should I filter/compress the phrase table?
>>>>  - Is it normal to decode only ~500K sentences a day given the machine
>>>> specs and the model size?
>>>>
>>>> I understand that I could split the train set into two, train two
>>>> models, and cross-decode, but if the training size is 10M sentence
>>>> pairs, we'll face the same issues.
>>>>
>>>> Thank you for reading the long post, and thank you in advance for any
>>>> answers, discussions and enlightenment on this issue =)
>>>>
>>>> Regards,
>>>> Liling
>>>>
>>>>
>>>
>>
>
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
