Nelson, 

At ~670,000 tokens, your corpus is very small. I would guess roughly 25-30K segment pairs. Can you confirm? Wilker is also correct that your tuning set should be about 1,000 pairs for a corpus of this size. Anything larger for such a small corpus is robbing Peter to pay Paul.
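
For what it's worth, carving a disjoint ~1,000-pair tuning set out of a parallel corpus is simple bookkeeping; a minimal sketch (file names, sizes, and the helper itself are just placeholders, not anything Moses ships):

```python
import random

def split_corpus(src_lines, tgt_lines, tune_size=1000, seed=7):
    """Split a sentence-aligned corpus into disjoint (train, tune) sets.

    src_lines/tgt_lines must be parallel: line i of one translates line i
    of the other, so both sides must be shuffled with the same permutation.
    """
    assert len(src_lines) == len(tgt_lines), "corpus is not sentence-aligned"
    idx = list(range(len(src_lines)))
    random.Random(seed).shuffle(idx)  # one permutation shared by both sides
    tune_idx = set(idx[:tune_size])
    tune = [(src_lines[i], tgt_lines[i]) for i in sorted(tune_idx)]
    train = [(src_lines[i], tgt_lines[i])
             for i in range(len(src_lines)) if i not in tune_idx]
    return train, tune

# Toy example: 30 fake segment pairs, 5 held out for tuning.
src = [f"zh-{i}" for i in range(30)]
tgt = [f"pt-{i}" for i in range(30)]
train, tune = split_corpus(src, tgt, tune_size=5)
print(len(train), len(tune))  # 25 5
```

The one thing that matters is keeping the pairs aligned while shuffling, and keeping the tune set strictly disjoint from training.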

Your training machine is also small, indeed. It looks over-worked, even with this small corpus. With 2 GB RAM at ~90% usage and 4 GB swap at 50% (2 GB) usage, your machine is spending most of its time shuffling data to and from the hard disk. 10+ days is not unlikely with your machine under such load. How many mert runs have completed?
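
On Linux you can confirm those figures from /proc/meminfo. A minimal sketch of the arithmetic (it uses MemFree rather than the more accurate MemAvailable purely for simplicity, and the helper name is mine, not a standard tool):

```python
def memory_pressure(meminfo_text):
    """Parse /proc/meminfo-style text into RAM and swap usage fractions.

    Heavy swap usage alongside nearly-full RAM is the classic signature
    of a machine thrashing (shuffling pages to and from disk).
    """
    kb = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            kb[key] = int(rest.split()[0])  # /proc/meminfo values are in kB
    ram_used = 1 - kb["MemFree"] / kb["MemTotal"]
    swap_used = 1 - kb["SwapFree"] / kb["SwapTotal"]
    return ram_used, swap_used

# Figures matching the machine described above:
# 2 GB RAM ~90% used, 4 GB swap ~50% used.
sample = """MemTotal:  2097152 kB
MemFree:    209715 kB
SwapTotal: 4194304 kB
SwapFree:  2097152 kB"""
ram, swap = memory_pressure(sample)
print(f"RAM {ram:.0%} used, swap {swap:.0%} used")  # RAM 90% used, swap 50% used
```

Numbers in that range mean the decoder's working set does not fit in RAM, and disk I/O, not CPU, is what you are waiting on.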


Finally, open each runX.moses.ini file in your mert working folder. You can track the progress of each preceding run with the mert report at the top of the config file. After 5-6 runs with this small corpus, you'll likely find that the improvements have leveled off. You can probably stop the tuning and use the most recent runX.moses.ini config.
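
Pulling the scores out of those headers can be scripted; a sketch, assuming the report contains a line like '# BLEU 0.2531 on dev ...' (check your own runX.moses.ini files for the exact wording and adjust the pattern if it differs):

```python
import re

def bleu_progression(ini_texts):
    """Extract the BLEU score from each run's config-file header.

    Assumes each header contains a comment line such as
    '# BLEU 0.2531 on dev ...'; returns None where no score is found.
    """
    scores = []
    for text in ini_texts:
        m = re.search(r"#\s*BLEU\s+([0-9.]+)", text)
        scores.append(float(m.group(1)) if m else None)
    return scores

# Toy headers standing in for run1..run4: gains level off quickly.
runs = [
    "# MERT optimized configuration\n# BLEU 0.2103 on dev\n[weight]",
    "# MERT optimized configuration\n# BLEU 0.2488 on dev\n[weight]",
    "# MERT optimized configuration\n# BLEU 0.2531 on dev\n[weight]",
    "# MERT optimized configuration\n# BLEU 0.2536 on dev\n[weight]",
]
print(bleu_progression(runs))  # [0.2103, 0.2488, 0.2531, 0.2536]
```

When the last few scores differ only in the third decimal place, as in the toy data above, further runs are unlikely to buy you anything.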


Tom 

On 2012-10-31 02:17, Wilker Aziz wrote: 

> Hi Nelson, 
> Can you tell us how many sentences you have for the following? 
> a) Parallel training set: this is used for phrase extraction (or rule extraction in hierarchical models). Here you want as much data as you can get, as this is the set that will basically determine how much bilingual knowledge your model has. 
> b) Parallel tuning set: MERT iteratively optimizes the translation model towards maximizing an evaluation metric (e.g. BLEU) on held-out parallel data (the tuning set, which is disjoint from the parallel training set). The tuning set usually has somewhere from 1,000 to 2,000 sentences; if you use much more than that, your MERT will take way too long and you won't really get significant gains. 
> Cheers, 
> Wilker. 
> 
> On 29 October 2012 20:31, Nelson Simao <[email protected] [8]> wrote:
> 
>> Hi,
>> The Chinese corpus has 669,424 words, and the Portuguese 678,023 words.
>> The 'mert' command is running in the terminal.
>> It is using 87% of memory and half of the swap. It is running on a small server at my college; I think it has 4 GB of swap and 2 GB of RAM.
>> 
>> I'm going to read that now. Thanks, Philipp! 
>> 
>> 2012/10/29 Philipp Koehn <[email protected] [5]>
>> 
>>> Hi,
>>> 
>>> how big is your corpus in total (number of words)?
>>> What step is currently processing?
>>> Is there excessive memory use / swapping / etc.?
>>> 
>>> There are various ways to speed things up by multi-threading or other multi-core usage.
>>> Check: http://www.statmt.org/moses/?n=Moses.AdvancedFeatures [1]
>>> 
>>> -phi
>>> 
>>> On Mon, Oct 29, 2012 at 12:01 PM, Nelson Simao <[email protected] [2]> wrote:
>>> > Hi everyone!
>>> >
>>> > Now I'm having another problem with my translator. I trained it with just 1/4 of the corpus that I have here and tested it, but the translation results aren't as good as I expected. So now I'm trying to train with the whole corpus (because I think I will get better results), but the mert/moses commands have been running since 21 October... 8 days ago.
>>> > I have to get the translator working properly as soon as possible, because it is part of a college assignment. Can someone help me with the problem of the training duration, and also give me some tips to get better results in the pt->zh and zh->pt translation?
>>> >
>>> >
>>> > Best regards!
>>> > Nelson from Portugal.
>>> >
>>> > _______________________________________________
>>> > Moses-support mailing list
>>> > [email protected] [3]
>>> > http://mailman.mit.edu/mailman/listinfo/moses-support [4]
>>> >
>> 
> 
> -- 
> Wilker Aziz 
> http://pers-www.wlv.ac.uk/~in1676/ [9] 
> PhD candidate at The Research Group in Computational Linguistics 
> Research Institute of Information and Language Processing (RIILP) 
> University of Wolverhampton 
> MB108 
> Stafford Street 
> WOLVERHAMPTON WV1 1LY



Links:
------
[1] http://www.statmt.org/moses/?n=Moses.AdvancedFeatures
[2] mailto:[email protected]
[3] mailto:[email protected]
[4] http://mailman.mit.edu/mailman/listinfo/moses-support
[5] mailto:[email protected]
[6] mailto:[email protected]
[7] http://mailman.mit.edu/mailman/listinfo/moses-support
[8] mailto:[email protected]
[9] http://pers-www.wlv.ac.uk/~in1676/
