Re: [Moses-support] Phrase-based models: binarized phrase and distortion tables in memory?

Tom Hoar Tue, 24 May 2011 17:04:10 -0700


Achim,


Performance numbers? How about 7 hours vs 2 days to build the phrase table?

See the thread from last month below, "How much Ram for Europarl?". If you
invest some thought in the hardware design and an additional few hundred
dollars in an SSD or two, there is virtually no difference between
textual and binarized models.

Tom 

-------- Original Message --------
Subject: Re: [Moses-support] How much Ram for Europarl?
Date: Mon, 18 Apr 2011 18:01:17 +0200
From: 
To: , 
Cc: [email protected]

Hello,

Building the phrase table really used to
take me a long, long time. 

I have a 4-processor computer with 8 GB RAM
and with a 12 million segment corpus (about 0.5 billion words EN+PT),
the whole training took about 7 days, of which 2 days to build the
phrase table (using the swap too).

However, now I have an 80 GB
solid-state drive installed for the swap and temp files and the training
of a larger corpus (14 million segments) took about the same time. The
main difference was in the building of the phrase table: it took only 7
hours. Beautiful!

I hope this information may be useful to you ...
although the corpus you want to train is not as large.

Maria José


-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Tom Hoar
Sent: Monday, April 18, 2011 4:05 PM
To: David Wilkinson
Cc: [email protected]
Subject: Re: [Moses-support] How much Ram for Europarl?

Your report of 100% physical memory usage, growing swap usage and
low CPU load is normal when working with machines with limited RAM. With only
4 GB of RAM and the new (larger) EuroParl v6 corpus, you could train for 3
or 4 days depending on how you set up your swap partition. Even then,
it's possible you will run out of RAM before it's finished. Upgrading to
8 GB of RAM is a move in the right direction.

Once it's finished
training, you'll want to use the binarized tables and language
model, which MMM's train-1.11 script creates.

 Tom

On Mon, 18 Apr 2011 14:52:10 +0100, Philipp Koehn wrote:

Hi,

I am not familiar with the
MMM setup, but one of the causes of memory use may be the translation
table. You should use the on-disk translation table.

-phi

On Mon, Apr 18, 2011 at 2:47 PM, David Wilkinson wrote:

I have set up an Ubuntu
10.04 system with the moses-for-mere-mortals scripts. The default corpus
trained in about 6-7 hours on my system (Athlon x3 3.2 GHz, 4 GB RAM). I
am now trying to train the system with the Europarl German-English
parallel corpus (about 45 million words in each language), again using the
default moses-for-mere-mortals settings. The system has been running for
24 hours and is currently using all the physical memory and about 1.2 GB of
swap. None of the cores is being used more than 10%, so at this rate it
will take a very long time to finish. If I double the RAM to 8 GB, will
this be sufficient?
Many Thanks
David

On Tue, 24 May 2011 17:38:48 -0400, "Achim Ruopp" wrote:

If I understand correctly, I have two
options for the phrase and distortion tables:

1. Have textual phrase and distortion tables loaded into memory during
decoding. This needs lots of memory, but it is fast once the tables are
loaded because no disk access is needed.

2. Binarize the phrase and distortion tables
(http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc2 [1]). Only
a small index is loaded into memory, and phrases and distortion info are
loaded on demand from disk. This is a bit slower than 1. because disk
access is required.

Is there an option in between 1. and 2. to binarize the
tables and load them completely into memory? (requiring less memory than
the textual tables, but being fast because of no disk access)  

Does
anybody have performance numbers comparing 1. and 2. (all other settings
being equal)? 

Thanks 

Achim  
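
The trade-off between the two options in the question above can be illustrated with a toy sketch. This is not Moses code and does not use the Moses binary table format; the file layout, the function names (write_table, load_all, lookup_on_disk) and the sample phrases are all assumptions made for illustration. Option 1 reads every entry into memory up front, while option 2 keeps only byte offsets in memory and seeks into the file for each lookup. A real binarized table would use a far more compact index than the plain dictionary shown here.

```python
import os
import tempfile


def write_table(path, entries):
    """Write sorted 'source ||| target' lines; return {source: byte offset}."""
    index = {}
    with open(path, "wb") as f:
        for source, target in sorted(entries.items()):
            index[source] = f.tell()  # remember where this entry starts
            f.write(f"{source} ||| {target}\n".encode("utf-8"))
    return index


def load_all(path):
    """Option 1 (toy): read the whole table into memory before any lookup."""
    table = {}
    with open(path, "rb") as f:
        for line in f:
            source, target = line.decode("utf-8").rstrip("\n").split(" ||| ")
            table[source] = target
    return table


def lookup_on_disk(path, index, source):
    """Option 2 (toy): keep only offsets in memory; seek and read on demand."""
    offset = index.get(source)
    if offset is None:
        return None
    with open(path, "rb") as f:
        f.seek(offset)
        line = f.readline().decode("utf-8").rstrip("\n")
    return line.split(" ||| ")[1]


# Hypothetical sample entries, used only to exercise both code paths.
entries = {"das haus": "the house", "guten tag": "good day"}
path = os.path.join(tempfile.mkdtemp(), "toy-table.txt")
index = write_table(path, entries)
in_memory = load_all(path)
```

In this sketch every on-disk lookup costs one seek and one read, which is why faster storage such as an SSD narrows the gap between the two approaches.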

Links:
------
[1] http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc2

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
