Phi, I looked at the tuning/tmp.* directory and no new files had been produced — the most recent date was from 24 October — so I stopped the process and started it again. What are memory-mapped KenLM and on-disk translation tables?
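[Editor's note, not part of the original thread: the mechanism behind a binarized KenLM model or an on-disk phrase table is memory-mapping — the OS pages the file in on demand, so the decoder can use a model larger than RAM without loading it all up front. A minimal Python sketch of the idea; the file name and sizes are hypothetical stand-ins, not real Moses artifacts.]

```python
import mmap
import os
import tempfile

# Hypothetical stand-in for a large binarized model file. The point is that
# mmap lets the OS page data in on demand instead of reading it all into RAM.
path = os.path.join(tempfile.mkdtemp(), "model.bin")
with open(path, "wb") as f:
    f.write(b"\x2a" * 4096)  # one page of dummy "model" bytes

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    header = mm[:16]  # only the pages actually touched are read from disk
    mm.close()

print(len(header))
```

This is why a memory-mapped model keeps resident memory low even for large models: untouched pages never leave disk, which matters on a machine with only 2 GB of RAM like the one described below.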
Hi Wilker! Sentences? I only know the word counts, so I will have to find a way to count the sentences... And the set I'm using for training is the same one I'm using for tuning: the 1/4 of my parallel corpus.

2012/10/30 Wilker Aziz <[email protected]>

> Hi Nelson,
>
> can you tell us how many sentences you have for the following?
>
> a) parallel training set: this is used for phrase extraction (or rule
> extraction in hierarchical models). Here you want as much data as you
> can get, since this is the set that basically determines how much
> bilingual knowledge your model has.
>
> b) parallel tuning set: MERT iteratively optimizes the translation model
> towards maximizing an evaluation metric (e.g. BLEU) on held-out parallel
> data (the tuning set, which is disjoint from the parallel training set).
> The tuning set usually has somewhere from 1,000 to 2,000 sentences; if
> you use much more than that, your MERT run will take far too long and
> you won't really get significant gains.
>
> Cheers,
>
> Wilker.
>
> On 29 October 2012 20:31, Nelson Simao <[email protected]> wrote:
>
>> Hi,
>> The Chinese corpus has 669,424 words, and the Portuguese one 678,023 words.
>> The 'mert' command is running in the terminal.
>> It is using 87% of memory and half of the swap. It is running on a small
>> server at my college; I think it has 4 GB of swap and 2 GB of RAM.
>>
>> I'm going to read that now. Thanks Philipp!
>>
>> 2012/10/29 Philipp Koehn <[email protected]>
>>
>>> Hi,
>>>
>>> how big is your corpus in total (number of words)?
>>> What step is it currently processing?
>>> Is there excessive memory use / swapping / etc.?
>>>
>>> There are various ways to speed things up by multi-threading
>>> or other multi-core usage.
>>> Check:
>>> http://www.statmt.org/moses/?n=Moses.AdvancedFeatures
>>>
>>> -phi
>>>
>>> On Mon, Oct 29, 2012 at 12:01 PM, Nelson Simao <[email protected]> wrote:
>>> > Hi everyone!
>>> >
>>> > Now I'm having another problem in my translator.
>>> > I trained it with just 1/4 of the corpus that I have here and tested
>>> > it, but the translation results aren't as good as I expected. So now
>>> > I'm trying to train on the whole corpus (because I think I will get
>>> > better results), but the mert/moses commands have been running since
>>> > 21 October... 8 days ago.
>>> > I have to get the translator working properly as soon as possible,
>>> > because it is part of a college assignment. Can someone help me with
>>> > the training-duration problem, and also give me some tips to get
>>> > better results for pt->zh and zh->pt translation?
>>> >
>>> > Best regards!
>>> > Nelson from Portugal.
>
> --
> Wilker Aziz
> http://pers-www.wlv.ac.uk/~in1676/
>
> PhD candidate at The Research Group in Computational Linguistics
> Research Institute of Information and Language Processing (RIILP)
> University of Wolverhampton
> MB108
> Stafford Street
> WOLVERHAMPTON WV1 1LY
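[Editor's note, not part of the original thread: two practical steps follow from Wilker's advice above. Moses-prepared corpora hold one sentence per line, so counting lines counts sentences; and the tuning set should be a small slice held out from (disjoint with) the training data. A minimal sketch — file names are hypothetical, and 2,000 is simply the upper end of the range Wilker mentions.]

```python
import os
import tempfile

def count_sentences(path):
    """Moses corpora are one sentence per line, so counting lines counts sentences."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)

def split_corpus(src_path, tune_path, train_path, tune_size=2000):
    """Hold out the first tune_size sentences for tuning; the rest is for
    training. The two sets are disjoint, as MERT requires."""
    with open(src_path, encoding="utf-8") as src, \
         open(tune_path, "w", encoding="utf-8") as tune, \
         open(train_path, "w", encoding="utf-8") as train:
        for i, line in enumerate(src):
            (tune if i < tune_size else train).write(line)

# Tiny demonstration: a synthetic 10-sentence corpus, 3-sentence tuning set.
tmp = tempfile.mkdtemp()
corpus = os.path.join(tmp, "corpus.pt")
with open(corpus, "w", encoding="utf-8") as f:
    for i in range(10):
        f.write(f"sentence {i}\n")

split_corpus(corpus, os.path.join(tmp, "tune.pt"),
             os.path.join(tmp, "train.pt"), tune_size=3)
print(count_sentences(os.path.join(tmp, "tune.pt")),
      count_sentences(os.path.join(tmp, "train.pt")))
```

Note that for a parallel corpus the same split (the same line indices) must be applied to both the Portuguese and the Chinese side, so that tuning sentence pairs stay aligned.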
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
