Hi,
@Marcin, the bigger-than-usual reordering table is due to our
allowance for high distortion. 2.4 is after cleaning it up; the
original size contained loads of rubbish sentence pairs.
Where do you have that distortion?
BTW, the compactization finished in <4hrs. I guess by the 3rd hour I
was starting to doubt whether the server could handle that amount.
The binarization is not that heavy on the server. It just takes a while.
As long as there is progress, you are fine.
But the phrase table size didn't go down as much as I expected; it's
still 1.1G, which might take forever to load when decoding. Will the
.minphr file be faster to load (it looks binarized, I think) than the
normal .gz phrase table? If not, we're still looking at >18hrs of
loading time on the server.
Try it :) Should not take more than a couple of seconds.
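For what it's worth, the compact tables are pointed to from the [feature] section of moses.ini roughly like this; the feature names, paths and feature counts below are placeholders to adjust to your own setup:

  PhraseDictionaryCompact name=TranslationModel0 num-features=4 input-factor=0 output-factor=0 path=/path/to/phrase-table.minphr
  LexicalReordering name=LexicalReordering0 num-features=6 type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0 path=/path/to/reordering-table.minlexr

The compact formats are binary and can be memory-mapped, so start-up should be seconds rather than hours compared with reading a .gz text table.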
But the reordering table went down from 6.7GB to 420M.
Weird. I am a little bit suspicious of your text tables, as the size
distributions seem so unusual. But if it works for you, then alright.
What exactly is the process for dealing with models >4GB? The standard
Moses tutorial, with its "rites of passage" steps, would fail at every
instance when using a non-binarized LM, a non-compacted
phrase table/lexical table, and non-threaded
processing/training/decoding.
Is there a guide on dealing with big models? How big can a model grow,
and what server clock speed/RAM is needed in proportion?
I have a 128 GB server, and I am building and using models from 150M
parallel sentences and LMs from hundreds of GB of monolingual text; I
am doing just fine. Unbinarized models are not meant for deployment on
any machine, whatever its size. Treat the text models as intermediate
representations and the binarized models as final deployment models.
You are fine in terms of RAM if your binarized models fit into RAM,
plus a couple of GB for computations.
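To put rough numbers on that with the sizes mentioned in this thread: the compact phrase table (~1.1G), the compact reordering table (~420M) and the binarized LM (~75GB) come to well under 80GB, which leaves comfortable headroom on a 128GB machine. A quick sanity check (file names are illustrative):

  # total on-disk size of the binarized models ~ lower bound on RAM needed
  du -ch phrase-table.minphr reordering-table.minlexr language-model.binary

The on-disk size of the binarized files is a reasonable proxy for their resident size, plus a couple of GB for the decoder itself.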
Regards,
Liling
On Tue, Apr 21, 2015 at 6:39 PM, liling tan <alvati...@gmail.com> wrote:
Dear Moses devs/users,
*How should one work with big models?*
Originally, I had 4.5 million parallel sentences and ~13 million
sentences of monolingual data for the source and target languages.
After cleaning with
https://github.com/alvations/mosesdecoder/blob/master/scripts/other/gacha_filter.py
and
https://github.com/moses-smt/mosesdecoder/blob/master/scripts/training/clean-corpus-n.perl,
I got 2.6 million parallel sentences.
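(For reference, the clean-corpus-n.perl step was along these lines; the corpus prefix, language codes and length limits here are placeholders rather than the exact invocation:)

  # gacha_filter.py was run first, per its own usage notes, then:
  perl mosesdecoder/scripts/training/clean-corpus-n.perl \
      corpus.gacha src tgt corpus.clean 1 80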
And after training a phrase-based model with reordering (roughly the
command sketched after the sizes below), I get:
9.9GB of phrase-table.gz
3.2GB of reordering-table.gz
~45GB of language-model.arpa.gz
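(The training command had roughly this shape; the paths, language codes, reordering configuration and LM settings are placeholders to adapt, not our exact call:)

  perl mosesdecoder/scripts/training/train-model.perl \
      -root-dir train -corpus corpus.clean -f src -e tgt \
      -alignment grow-diag-final-and \
      -reordering msd-bidirectional-fe \
      -lm 0:5:/path/to/language-model.arpa.gz:8 \
      -external-bin-dir /path/to/word-alignment-tools -cores 10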
I've binarized the language model and got
~75GB of language-model.binary
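(Concretely, the binarization with KenLM's build_binary looks roughly like this; if the KenLM build lacks zlib support, decompress the ARPA first:)

  # trie is the more compact of KenLM's two binary formats
  mosesdecoder/bin/build_binary trie language-model.arpa.gz language-model.binary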
We ran moses-mert.pl and it completed the
tuning in 3-4 days for both directions on the dev set (3000
sentences), after filtering:
364M phrase-table.gz
1.8GB reordering-table.gz
On the test set, we did the filtering too (roughly as sketched after
the sizes below), but when decoding it took 18 hours to load only 50%
of the phrase table:
1.5GB phrase-table.gz
6.7GB reordering-table.gz
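(The filtering in both cases used filter-model-given-input.pl against the respective input set, roughly like this; the directory names and paths are placeholders:)

  perl mosesdecoder/scripts/training/filter-model-given-input.pl \
      filtered-test mert-work/moses.ini test.input.txt

The filter script also accepts a -Binarizer option (pointing it at processPhraseTableMin, for instance), which would produce compact tables directly instead of filtered .gz tables.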
So we decided to compactize the phrase table.
For the phrase table and reordering table, we used
processPhraseTableMin and processLexicalTableMin, and I'm still
waiting for the minimized phrase table. It has been
running for 3 hours with 10 threads each on 2.5GHz cores.
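(The compaction commands were roughly the following; -nscores, the thread count and the paths are placeholders to match the actual tables:)

  # produces phrase-table.minphr
  mosesdecoder/bin/processPhraseTableMin -in phrase-table.gz -out phrase-table \
      -nscores 4 -threads 10
  # produces reordering-table.minlexr
  mosesdecoder/bin/processLexicalTableMin -in reordering-table.gz -out reordering-table \
      -threads 10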
*Anyone have any rough idea how small the phrase table and lexical
table would get?*
*With that kind of model, how much RAM would be necessary? And how
long would it take to load the model into RAM?
Any other tips/hints on working with big models efficiently?*
*Is it even possible for us to use models of such a size on our
small server (24 cores, 2.5GHz, 128GB RAM)? If not, how big should
our server get?*
Regards,
Liling
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support