Achim,
Performance numbers? How about 7 hours vs 7 days? See last month's thread below, "How much Ram for Europarl?". If you invest some thought in the hardware design and an additional few hundred dollars in an SSD or two, there is virtually no performance difference between textual and binarized models.

Tom

-------- Original Message --------
Subject: Re: [Moses-support] How much Ram for Europarl?
Date: Mon, 18 Apr 2011 18:01:17 +0200
From:
To: ,
Cc: [email protected]

Hello,

Building the phrase table used to take me a long, long time. I have a 4-processor computer with 8 GB RAM, and with a 12 million segment corpus (about 0.5 billion words EN+PT) the whole training took about 7 days, of which 2 days went to building the phrase table (using the swap too).

However, I now have an 80 GB solid-state drive installed for the swap and temp files, and the training of a larger corpus (14 million segments) took about the same time. The main difference was in the building of the phrase table: it took only 7 hours. Beautiful!

I hope this information may be useful to you ... although the corpus you want to train is not as large.

Maria José

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Tom Hoar
Sent: Monday, April 18, 2011 4:05 PM
To: David Wilkinson
Cc: [email protected]
Subject: Re: [Moses-support] How much Ram for Europarl?

Your report of 100% physical memory usage, growing swap usage and low CPU load is normal when working with limited-RAM machines. With only 4 GB RAM and the new (larger) EuroParl v6 corpus, you could train for 3 or 4 days, depending on how you set up your swap partition. Even then, it's possible you will run out of RAM before it's finished. Upgrading to 8 GB RAM is a move in the right direction. Once it's finished training, you'll want to use the binarized tables and language model, which MMM's train-1.11 script creates.
Tom

On Mon, 18 Apr 2011 14:52:10 +0100, Philipp Koehn wrote:

Hi,

I am not familiar with the MMM setup, but one of the causes of memory use may be the translation table. You should use the on-disk translation table.

-phi

On Mon, Apr 18, 2011 at 2:47 PM, David Wilkinson wrote:

I have set up an Ubuntu 10.04 system with the moses-for-mere-mortals scripts. The default corpus trained in about 6-7 hours on my system (Athlon x3 3.2 GHz, 4 GB RAM).

I am now trying to train the system with the Europarl German-English parallel corpus (about 45 million words in each language), again using the default moses-for-mere-mortals settings. The system has been running for 24 hours and is currently using all the physical memory and about 1.2 GB of swap. None of the cores is being used more than 10%, so at this rate it will take a very long time to finish.

If I double the RAM to 8 GB, will this be sufficient?

Many thanks,
David

On Tue, 24 May 2011 17:38:48 -0400, "Achim Ruopp" wrote:

If I understand correctly, I have two options for the phrase and distortion tables:

1. Have the textual phrase and distortion tables loaded into memory during decoding. This needs lots of memory, but once the tables are loaded it is fast because no disk access is needed.
2. Binarize the phrase and distortion tables (http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc2 [1]). Only a small index is loaded into memory, and phrases and distortion info are loaded on demand from disk. This is a bit slower than 1. because disk access is required.

Is there an option in between 1. and 2., to binarize the tables and load them completely into memory? (This would require less memory than the textual tables, but be fast because of no disk access.)

Does anybody have performance numbers comparing 1. and 2. (all other settings being equal)?

Thanks,
Achim

Links:
------
[1] http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc2
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
