Nelson,
At ~670,000 tokens, your corpus is very small. I would guess roughly 25-30K segment pairs. Can you confirm? Wilker is also correct that your tuning set should be about 1,000 pairs for a corpus of this size. Anything larger for such a small corpus is robbing Peter to pay Paul.

Your training machine is also small, indeed. It looks overworked, even with this small corpus. With 2GB RAM @ ~90% usage and 4GB swap @ 50% (2GB) usage, your machine is spending most of its time shuffling data to and from the hard disk. 10+ days is not unlikely with your machine under such load. How many mert runs have completed?

Finally, open each runX.moses.ini file in your mert working folder. You can track the progress of each preceding run with the mert report at the top of the config file. After 5-6 runs with this small corpus, you'll likely find that the improvements have leveled off. You can probably stop the tuning and use the most recent runX.moses.ini config.

Tom

On 2012-10-31 02:17, Wilker Aziz wrote:
> Hi Nelson,
> can you tell us how many sentences do you have for the following?
> a) parallel training set: this is used for phrase extraction (or rule extraction in hierarchical models), here you want to have as much data as you can as this is the set that will basically determine how much bilingual knowledge your model has.
> b) parallel tuning set: MERT iteratively optimize the translation model towards maximizing an evaluation metric (e.g. BLEU) on a held-out parallel data (the tuning set - which is disjoint to parallel training set), the tuning set has usually something from 1,000 to 2,000 sentences, if you are using much more than that your MERT will take way too long and you won't really get significant gains.
> Cheers,
> Wilker.
>
> On 29 October 2012 20:31, Nelson Simao <[email protected] [8]> wrote:
>
>> Hi,
>> The chinese corpus 669424 words, and the portuguese 678023 words.
>> In the terminal is running the 'mert' command.
>> Is using 87% of memory and half of Swap.
>> Is running on a small server at my college, I think it have 4Gb of swap an 2Gb of RAM.
>>
>> I'm going to read that now. Thanks Philipp!
>>
>> 2012/10/29 Philipp Koehn <[email protected] [5]>
>>
>>> Hi,
>>>
>>> how big is your corpus in total (number of words)?
>>> What step is currently processing?
>>> Is there excessive memory use / swapping / etc.?
>>>
>>> There are various ways to speed things up by multi-threading
>>> or other multi-core usage.
>>> Check:
>>> http://www.statmt.org/moses/?n=Moses.AdvancedFeatures [1]
>>>
>>> -phi
>>>
>>> On Mon, Oct 29, 2012 at 12:01 PM, Nelson Simao <[email protected] [2]> wrote:
>>> > Hi everyone!
>>> >
>>> > Now I'm having another problem in my translator. I trained it with just 1/4
>>> > of the corpus that I have here, tested it but the translation results aren't
>>> > so good how I expected. So now I'm trying to train with the whole
>>> > corpus(cause I think that I will get better results), but the mert/moses
>>> > commands are running since 21 October...8 days ago.
>>> > Gotta have the translator working properly as soon as possible, because it
>>> > is part of a college task/work. Someone can help me with the problem of the
>>> > training duration, and also give me some tips to get better results in the
>>> > translation of pt->zn and zn->pt?
>>> >
>>> >
>>> > Best regards!
>>> > Nelson from Portugal.
>>> > > _______________________________________________
>>> > Moses-support mailing list
>>> > [email protected] [3]
>>> > http://mailman.mit.edu/mailman/listinfo/moses-support [4]
>>> >
>>
>> _______________________________________________
>> Moses-support mailing list
>> [email protected] [6]
>> http://mailman.mit.edu/mailman/listinfo/moses-support [7]
>
> --
>
> Wilker Aziz
> http://pers-www.wlv.ac.uk/~in1676/ [9]
> PhD candidate at The Research Group in Computational Linguistics
> Research Institute of Information and Language Processing (RIILP)
> University of Wolverhampton
> MB108
> Stafford Street
> WOLVERHAMPTON WV1 1LY

Links:
------
[1] http://www.statmt.org/moses/?n=Moses.AdvancedFeatures
[2] mailto:[email protected]
[3] mailto:[email protected]
[4] http://mailman.mit.edu/mailman/listinfo/moses-support
[5] mailto:[email protected]
[6] mailto:[email protected]
[7] http://mailman.mit.edu/mailman/listinfo/moses-support
[8] mailto:[email protected]
[9] http://pers-www.wlv.ac.uk/~in1676/
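
A minimal Python sketch of the check suggested in the reply at the top of this message: reading the mert report at the top of each runX.moses.ini and seeing whether the improvements have leveled off. It assumes the report includes a comment line of the form `# BLEU <score> on dev ...`. That line format, the helper names `read_mert_scores` and `has_levelled_off`, and the 0.001 gain threshold are assumptions made for illustration, not details confirmed in this thread, so check them against your own run files first.

```python
import re
from pathlib import Path

# Assumption: the header of each runN.moses.ini contains a comment line
# such as "# BLEU 0.2801 on dev /path/to/tuning.src".
BLEU_LINE = re.compile(r"^#\s*BLEU\s+([0-9]*\.?[0-9]+)")
RUN_FILE = re.compile(r"^run(\d+)\.moses\.ini$")


def read_mert_scores(workdir):
    """Return [(run_number, bleu)] for every runN.moses.ini, sorted by run number."""
    scores = []
    for path in Path(workdir).iterdir():
        name_match = RUN_FILE.match(path.name)
        if not name_match:
            continue  # skip moses.ini, feature files, logs, etc.
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line in handle:
                if line.strip() and not line.startswith("#"):
                    break  # past the header comments; no score found
                score_match = BLEU_LINE.match(line)
                if score_match:
                    run_number = int(name_match.group(1))
                    scores.append((run_number, float(score_match.group(1))))
                    break
    return sorted(scores)


def has_levelled_off(scores, min_gain=0.001, window=2):
    """True if each of the last `window` runs improved by less than `min_gain`.

    Both parameters are hypothetical defaults; pick values that suit your data.
    """
    if len(scores) <= window:
        return False  # not enough runs to judge
    gains = [later - earlier
             for (_, earlier), (_, later) in zip(scores, scores[1:])]
    return all(gain < min_gain for gain in gains[-window:])
```

For example, `read_mert_scores("mert-work")` (with `mert-work` standing in for your own mert working folder) returns the per-run scores in order, and `has_levelled_off(...)` on that list gives a rough signal for when stopping the tuning is reasonable. A run file with no matching line is simply skipped, so a shorter list than expected means the header format differs from the assumption above.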
