Hi,
2-4M sentences is not that big :)

As for the compact phrase table: the binarized version will be roughly half the size of your gzipped text phrase-table, and the lexical table should be smaller. However, how come your gzipped reordering-table is bigger than your phrase-table after filtering? That's unusual.

Also, 128 GB of RAM is plenty.
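
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It only applies the "about half the gzipped size" rule of thumb to the sizes reported in the quoted message and subtracts the result from 128 GB. It assumes, pessimistically, that the whole 75 GB binarized language model sits in memory, and it ignores the reordering table and the decoder's working memory. The ratio and the function names are illustrative assumptions, not measurements.

```python
# Gzipped phrase-table sizes in GB, as reported in the quoted message below.
GZIPPED_GB = {
    "full phrase-table.gz": 9.9,
    "test-filtered phrase-table.gz": 1.5,
}
LM_BINARY_GB = 75.0  # binarized language model, as reported
RAM_GB = 128.0       # RAM of the server in question

# Assumption: the compact table is about half the gzipped text size.
COMPACT_RATIO = 0.5


def estimate_compact_gb(gzipped_gb, ratio=COMPACT_RATIO):
    """Rule-of-thumb size of the compact phrase table in GB."""
    return gzipped_gb * ratio


def headroom_gb(phrase_table_gz_gb):
    """RAM left after the LM and the estimated compact phrase table."""
    return RAM_GB - LM_BINARY_GB - estimate_compact_gb(phrase_table_gz_gb)


if __name__ == "__main__":
    for name, size in GZIPPED_GB.items():
        print(f"{name}: ~{estimate_compact_gb(size):.2f} GB compact, "
              f"~{headroom_gb(size):.1f} GB of RAM left")
```

Even with the unfiltered table, this estimate leaves tens of GB free, which is why I am not worried about the RAM.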

Best,
Marcin

On 21.04.2015 at 18:39, liling tan wrote:
Dear Moses devs/users,

*How should one work with big models?*

Originally, I had 4.5 million parallel sentences and ~13 million sentences of monolingual data for the source and target languages.

After cleaning with https://github.com/alvations/mosesdecoder/blob/master/scripts/other/gacha_filter.py and https://github.com/moses-smt/mosesdecoder/blob/master/scripts/training/clean-corpus-n.perl, I got 2.6 million parallel sentences.


And after training a phrase-based model with reordering, I get:

    9.9GB of phrase-table.gz
    3.2GB of reordering-table.gz
    ~45GB of language-model.arpa.gz


I've binarized the language model and got:

    ~75GB of language-model.binary

We ran mert-moses.pl and it completed the tuning in 3-4 days for both directions on the dev set (3000 sentences). After filtering, the tables were:


    364M phrase-table.gz
    1.8GB reordering-table.gz


On the test set, we did the filtering too, but when decoding it took 18 hours to load only 50% of the phrase table:

    1.5GB phrase-table.gz
    6.7GB reordering-table.gz


So we decided to compact the phrase table.

For the phrase-table and reordering-table, we used processPhraseTableMin and processLexicalTableMin, and I'm still waiting for the minimized phrase table. They have been running for 3 hours with 10 threads each on 2.5GHz cores.
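
For reference, the two invocations can be sketched as a small Python helper that only builds and prints the command lines, without running anything. The -in, -out, -nscores and -threads options are the ones I understand from the Moses documentation for these tools; the file names, the -nscores value of 4 and the helper name compact_commands are assumptions for illustration and should be adjusted to the actual model.

```python
import shlex


def compact_commands(phrase_table="phrase-table.gz",
                     reordering_table="reordering-table.gz",
                     threads=10,
                     nscores=4):
    """Build (but do not run) the two compaction command lines.

    nscores=4 is an assumed number of phrase-table scores; change it
    if the table was trained with a different number of scores.
    """
    phrase_cmd = [
        "processPhraseTableMin",
        "-in", phrase_table,
        "-out", "phrase-table",
        "-nscores", str(nscores),
        "-threads", str(threads),
    ]
    lexical_cmd = [
        "processLexicalTableMin",
        "-in", reordering_table,
        "-out", "reordering-table",
        "-threads", str(threads),
    ]
    return [phrase_cmd, lexical_cmd]


if __name__ == "__main__":
    for cmd in compact_commands():
        # shlex.join quotes arguments safely for copy-pasting into a shell.
        print(shlex.join(cmd))
```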

*Does anyone have a rough idea how small the phrase table and lexical table would get?*

*With that kind of model, how much RAM would be necessary? And how long would it take to load the model into RAM?*

*Any other tips/hints on working with big models efficiently?*

*Is it even possible for us to use models of such a size on our small server (24 cores, 2.5GHz, 128 GB RAM)? If not, how big should our server get?*

Regards,
Liling


_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
