Hi,
2-4M sentences is not that big :)
As for the compact phrase table, the binarized version will be roughly
half the size of your gzipped text phrase-table, the lexical table
should be smaller. However, how come your gzipped reordering table is
bigger than your phrase table? That's unusual.
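For reference, once processPhraseTableMin and processLexicalTableMin finish, moses.ini points at the compact tables with feature lines roughly like the ones below. This is a sketch, not your exact config: the paths, factor settings, and the reordering type are assumptions about your setup.

```ini
[feature]
; Compact phrase table produced by processPhraseTableMin
PhraseDictionaryCompact name=TranslationModel0 num-features=4 path=/path/to/phrase-table.minphr input-factor=0 output-factor=0
; Compact reordering table produced by processLexicalTableMin
; (type must match how the reordering model was trained)
LexicalReordering name=LexicalReordering0 type=wbe-msd-bidirectional-fe-allff num-features=6 path=/path/to/reordering-table.minlexr
```

The compact tables load far faster than gzipped text tables, since they are read as binary rather than parsed line by line.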
Also, 128 GB of RAM is plenty.
Best,
Marcin
On 21.04.2015 at 18:39, liling tan wrote:
Dear Moses devs/users,
*How should one work with big models?*
Originally, I had 4.5 million parallel sentences and ~13 million
sentences of monolingual data for the source and target languages.
After cleaning with
https://github.com/alvations/mosesdecoder/blob/master/scripts/other/gacha_filter.py
and
https://github.com/moses-smt/mosesdecoder/blob/master/scripts/training/clean-corpus-n.perl,
I got 2.6 million parallel sentences.
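For context, the length-based cleaning step can be sketched as below. The Moses checkout location and the 1-80 token bounds are assumptions (gacha_filter.py is run separately with its own options):

```shell
# Sketch of the length-based cleaning step with clean-corpus-n.perl.
# MOSES, the corpus file prefix, and the 1-80 bounds are assumptions.
MOSES=${MOSES:-$HOME/mosesdecoder}
CLEANER="$MOSES/scripts/training/clean-corpus-n.perl"
if [ -f "$CLEANER" ]; then
    # Reads corpus.src / corpus.tgt, writes corpus.clean.src / corpus.clean.tgt,
    # keeping sentence pairs with 1-80 tokens per side.
    perl "$CLEANER" corpus src tgt corpus.clean 1 80
    STATUS=ran
else
    echo "clean-corpus-n.perl not found under $MOSES; skipping"
    STATUS=skipped
fi
```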
After training a phrase-based model with reordering, I get:
9.9GB of phrase-table.gz
3.2GB of reordering-table.gz
~45GB of language-model.arpa.gz
I binarized the language model and got:
~75GB of language-model.binary
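A sketch of that binarization step, assuming KenLM's build_binary (bundled with Moses); the binary path and the choice of the trie structure are assumptions:

```shell
# Sketch: binarize the ARPA LM with KenLM's build_binary.
# KENLM_BIN is an assumed install location.
KENLM_BIN=${KENLM_BIN:-$HOME/mosesdecoder/bin}
if [ -x "$KENLM_BIN/build_binary" ]; then
    # The trie structure is smaller than the default probing hash table,
    # at some cost in query speed.
    "$KENLM_BIN/build_binary" trie language-model.arpa.gz language-model.binary
    STATUS=ran
else
    echo "build_binary not found under $KENLM_BIN; skipping"
    STATUS=skipped
fi
```

With the trie structure (and optionally KenLM's quantization flags) the binary file is typically much smaller than the probing default, which may explain some of the size difference you are seeing.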
We ran moses-mert.pl and it completed the tuning in 3-4 days for both
directions on the dev set (3,000 sentences); after filtering:
364M phrase-table.gz
1.8GB reordering-table.gz
On the test set we did the filtering too, but when decoding it took 18
hours to load only 50% of the phrase table:
1.5GB phrase-table.gz
6.7GB reordering-table.gz
So we decided to compact the phrase table.
For the phrase table and the reordering table, we used
processPhraseTableMin and processLexicalTableMin, and I'm still waiting
for the minimized phrase table. Each has been running for 3 hours on 10
threads on 2.5GHz cores.
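A sketch of that conversion step, using the standard flags of the Moses compact-table tools; the binary location, paths, and thread count are illustrative:

```shell
# Sketch: build the compact phrase and reordering tables.
# MOSES_BIN, the input paths, and -threads 10 are assumptions.
MOSES_BIN=${MOSES_BIN:-$HOME/mosesdecoder/bin}
if [ -x "$MOSES_BIN/processPhraseTableMin" ]; then
    # -nscores must match the number of scores per phrase pair (4 here).
    "$MOSES_BIN/processPhraseTableMin" -in phrase-table.gz \
        -out phrase-table -nscores 4 -threads 10
    # Writes reordering-table.minlexr alongside phrase-table.minphr.
    "$MOSES_BIN/processLexicalTableMin" -in reordering-table.gz \
        -out reordering-table -threads 10
    STATUS=ran
else
    echo "Moses compact-table tools not found under $MOSES_BIN; skipping"
    STATUS=skipped
fi
```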
*Does anyone have a rough idea of how small the phrase table and
lexical table would get?*
*With a model of this size, how much RAM would be necessary, and how
long would it take to load the model into RAM?
Any other tips/hints on working with big models efficiently?*
*Is it even possible for us to use models of such a size on our small
server (24 cores, 2.5GHz, 128GB RAM)? If not, how big should our server
get?*
Regards,
Liling
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support