You're right, it's highly non-linear and depends on lots of parameters.
Nice to know it's so much faster in this case.
On 14/12/17 13:50, liling tan wrote:
The Moses1 run was using the pruned ProbingPT created by
binarize4moses2.perl =)
I think the speedup might be non-linear when compared against the
pruned phrase-table size; the larger the table, the greater the speedup.
But that needs more rigorous testing to prove ;P
On Thu, Dec 14, 2017 at 7:37 PM, Hieu Hoang <[email protected]> wrote:
cool, I was expecting only single-digit improvements. If the PT
in Moses1 hadn't been pruned, the speedup would be largely down to
the pruning, I think
Hieu Hoang
http://moses-smt.org/
On 14 December 2017 at 07:41, liling tan <[email protected]> wrote:
With Moses2 and ProbingPT, I decoded 4M sentences (86M words) in 14
hours, with -threads 50 on 56 cores. So it's around 6M words per
hour for Moses2.
With Moses1, ProbingPT and a gzipped lexicalized reordering table,
but on 32K sentences, it was 280K words per hour with -threads 50
on 56 cores.
Moses2 is 20x faster than Moses1 for my model!!
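A quick back-of-envelope check of those figures (a small sketch using only the numbers quoted above):

```python
# Sanity-check the throughput figures reported in this thread.
moses2_words = 86_000_000           # 86M words decoded by Moses2
moses2_hours = 14                   # in 14 hours
moses2_rate = moses2_words / moses2_hours   # words per hour

moses1_rate = 280_000               # Moses1: 280K words per hour

speedup = moses2_rate / moses1_rate
print(f"Moses2 throughput: {moses2_rate / 1e6:.1f}M words/hour")
print(f"Speedup over Moses1: {speedup:.0f}x")
```

That comes out at roughly 6.1M words/hour and a ~22x speedup, consistent with the "around 6M" and "20x" figures above.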
For Moses1, my moses.ini:
#########################
### MOSES CONFIG FILE ###
#########################
# input factors
[input-factors]
0
# mapping steps
[mapping]
0 T 0
[distortion-limit]
6
# feature functions
[feature]
UnknownWordPenalty
WordPenalty
PhrasePenalty
#PhraseDictionaryMemory name=TranslationModel0 num-features=4
path=/home/ltan/momo/pt.gz input-factor=0 output-factor=0
ProbingPT name=TranslationModel0 num-features=4
path=/home/ltan/momo/momo-bin input-factor=0 output-factor=0
LexicalReordering name=LexicalReordering0 num-features=6
type=wbe-msd-bidirectional-fe-allff input-factor=0
output-factor=0
path=/home/ltan/momo/reordering-table.wbe-msd-bidirectional-fe.gz
#LexicalReordering name=LexicalReordering0 num-features=6
type=wbe-msd-bidirectional-fe-allff input-factor=0
output-factor=0 property-index=0
Distortion
KENLM name=LM0 factor=0 path=/home/ltan/momo/lm.ja.kenlm order=5
On Thu, Dec 14, 2017 at 8:58 AM, liling tan <[email protected]> wrote:
I don't have a comparison between Moses vs Moses2 yet. I'll
give some Moses numbers once the full dataset is decoded,
and I can repeat the decoding for Moses on the same machine.
BTW, could the ProbingPT directory created by binarize4moses2.perl
be used for old Moses? Or would I have to re-prune the
phrase-table and then use PhraseDictionaryMemory and
LexicalReordering separately?
But I'm getting 4M sentences, 86M words in 14 hours on
Moses2 with -threads 50 on 56 cores.
#########################
### MOSES CONFIG FILE ###
#########################
# input factors
[input-factors]
0
# mapping steps
[mapping]
0 T 0
[distortion-limit]
6
# feature functions
[feature]
UnknownWordPenalty
WordPenalty
PhrasePenalty
#PhraseDictionaryMemory name=TranslationModel0
num-features=4 path=/home/ltan/momo/phrase-table.gz
input-factor=0 output-factor=0
ProbingPT name=TranslationModel0 num-features=4
path=/home/ltan/momo/momo-bin input-factor=0 output-factor=0
#LexicalReordering name=LexicalReordering0 num-features=6
type=wbe-msd-bidirectional-fe-allff input-factor=0
output-factor=0
path=/home/ltan/momo/reordering-table.wbe-msd-bidirectional-fe.gz
LexicalReordering name=LexicalReordering0 num-features=6
type=wbe-msd-bidirectional-fe-allff input-factor=0
output-factor=0 property-index=0
Distortion
KENLM name=LM0 factor=0 path=/home/ltan/momo/lm.ja.kenlm
order=5
On Thu, Dec 14, 2017 at 3:52 AM, Hieu Hoang <[email protected]> wrote:
Do you have comparison figures for Moses vs Moses2? I
never managed to get reliable info for more than 32 cores.
config/moses.ini files would be good too
Hieu Hoang
http://moses-smt.org/
On 13 December 2017 at 06:10, liling tan <[email protected]> wrote:
Ah, that's why the phrase-table is exploding...
I've never decoded more than 100K sentences before =)
binarize4moses2.perl is awesome! Let me see how
much speedup I get with Moses2 and pruned tables.
Thank you Hieu and Barry!
On Tue, Dec 12, 2017 at 6:38 PM, Hieu Hoang <[email protected]> wrote:
Barry is correct, having 750,000 translations
for '.' severely degrades speed.
I had forgotten about the script I created:
scripts/generic/binarize4moses2.perl
which takes in the phrase table & lexicalized
reordering model, prunes them, and runs
addLexROtoPT. Basically, everything you need
to do to create a fast model for Moses2.
Hieu Hoang
http://moses-smt.org/
On 12 December 2017 at 09:16, Barry Haddow <[email protected]> wrote:
Hi Liling
The short answer is you need to prune/filter
your phrase table prior to creating the compact
phrase table. I don't mean "filter model given
input", because that won't make much difference
if you have a very large input; I mean getting
rid of rare translations which won't be used
anyway.
The compact phrase table does not do pruning; it
ends up being done in memory, so if you have
750,000 translations of the full-stop in your
model, then they all get loaded into memory
before Moses selects the top 20.
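That kind of offline pruning can be sketched as a simple top-k filter over the text phrase table. This is an illustrative sketch, not Moses's actual prunePhraseTable: it assumes the standard `source ||| target ||| scores` layout with the direct translation probability p(e|f) as the third score, and keeps 20 translations per source phrase to mirror the default ttable-limit.

```python
import heapq
from collections import defaultdict
from typing import Iterable, Iterator

TOP_K = 20  # mirrors Moses's default ttable-limit

def prune_phrase_table(lines: Iterable[str], top_k: int = TOP_K) -> Iterator[str]:
    """Keep only the top_k translations per source phrase, ranked by
    the direct translation probability p(e|f) (assumed to be the
    third score column in a standard 4-score phrase table)."""
    best = defaultdict(list)  # source phrase -> bounded min-heap of (score, line)
    for line in lines:
        fields = line.rstrip("\n").split(" ||| ")
        source, scores = fields[0], fields[2].split()
        p_e_given_f = float(scores[2])
        heap = best[source]
        if len(heap) < top_k:
            heapq.heappush(heap, (p_e_given_f, line))
        else:
            # Replace the weakest kept translation if this one is better.
            heapq.heappushpop(heap, (p_e_given_f, line))
    for source, heap in best.items():
        for _, kept in sorted(heap, reverse=True):
            yield kept
```

Streaming line by line with a bounded heap per source phrase keeps memory proportional to the pruned output rather than to the full table, which is exactly what matters when one source phrase has 750,000 translations.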
You can use prunePhraseTable from Moses
(which, bizarrely, needs to load a phrase
table in order to parse the config file,
last time I looked). You could also apply
Johnson / entropic pruning, whatever works
for you.
cheers - Barry
On 11/12/17 09:20, liling tan wrote:
Dear Moses community/developers,
I have a question on how to handle large
models created using moses.
I've a vanilla phrase-based model with
* PhraseDictionary num-features=4 input-factor=0
output-factor=0
* LexicalReordering num-features=6 input-factor=0
output-factor=0
* KENLM order=5 factor=0
The size of the model is:
* compressed phrase table is 5.4GB,
* compressed reordering table is 1.9GB and
* quantized LM is 600MB
I'm running on a single 56 cores machine
with 256GB RAM. Whenever I'm decoding I
use -threads 56 parameter.
It takes really long to load the table,
and after loading, it breaks
inconsistently at different lines when
decoding. I notice that the RAM goes into
swap before it breaks.
I've tried the compact phrase table and get a
* 3.2GB .minphr
* 1.5GB .minlexr
And the same kind of random breakage
happens when RAM goes into swap after
loading the phrase-table.
Strangely, it still manages to decode
~500K sentences before it breaks.
Then I've tried the on-disk phrase table,
which is around 37GB uncompressed. Using
the on-disk PT didn't cause breakage, but
the decoding time increased significantly;
now it can only decode 15K sentences in an
hour.
The setup is a little different from the
normal train/dev/test split. Currently, my
task is to decode the train set. I've
tried filtering the table against the
train set with filter-model-given-input.pl,
but the size of the compressed table
didn't really decrease much.
The entire training set is made up of 5M
sentence pairs and it's taking 3+ days
just to decode ~1.5M sentences with
ondisk PT.
My questions are:
- Are there best practices with regards
to deploying large Moses models?
- Why does the 5+GB phrase table take up
> 250GB RAM when decoding?
- How else should I filter/compress the
phrase table?
- Is it normal to decode only ~500K
sentences a day given the machine specs
and the model size?
I understand that I could split the train
set into two, train 2 models, and then
cross-decode, but if the training size is
10M sentence pairs, we'll face the same
issues.
Thank you for reading the long post, and
thank you in advance for any answers,
discussions and enlightenment on this
issue =)
Regards,
Liling
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
The University of Edinburgh is a
charitable body, registered in
Scotland, with registration number SC005336.
--
Hieu Hoang
http://moses-smt.org/