Phi, I looked at the tuning/tmp.* directory and no new files had been produced — the most recent date was from 24 October — so I stopped the process and started it again. What are memory-mapped KenLM and on-disk translation tables?
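[Editor's note, not part of the original thread: the mechanism behind a binarized KenLM model or an on-disk phrase table is memory-mapping — the OS pages the file in on demand, so the decoder can use a model larger than RAM without loading it all up front. A minimal Python sketch of the idea; the file name and sizes are hypothetical stand-ins, not real Moses artifacts.]

```python
import mmap
import os
import tempfile

# Hypothetical stand-in for a large binarized model file. The point is that
# mmap lets the OS page data in on demand instead of reading it all into RAM.
path = os.path.join(tempfile.mkdtemp(), "model.bin")
with open(path, "wb") as f:
    f.write(b"\x2a" * 4096)  # one page of dummy "model" bytes

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    header = mm[:16]  # only the pages actually touched are read from disk
    mm.close()

print(len(header))
```

This is why a memory-mapped model keeps resident memory low even for large models: untouched pages never leave disk, which matters on a machine with only 2 GB of RAM like the one described below.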
Hi Wilker! Sentences? I only know the word counts, so I will have to find a way to count the sentences... And the set I'm using for training is the same one I'm using for tuning: the 1/4 of my parallel corpus.

2012/10/30 Wilker Aziz <[email protected]>

> Hi Nelson,
>
> can you tell us how many sentences you have for the following?
>
> a) parallel training set: this is used for phrase extraction (or rule
> extraction in hierarchical models). Here you want as much data as you
> can get, since this is the set that basically determines how much
> bilingual knowledge your model has.
>
> b) parallel tuning set: MERT iteratively optimizes the translation model
> towards maximizing an evaluation metric (e.g. BLEU) on held-out parallel
> data (the tuning set, which is disjoint from the parallel training set).
> The tuning set usually has somewhere from 1,000 to 2,000 sentences; if
> you use much more than that, your MERT run will take far too long and
> you won't really get significant gains.
>
> Cheers,
>
> Wilker.
>
> On 29 October 2012 20:31, Nelson Simao <[email protected]> wrote:
>
>> Hi,
>> The Chinese corpus has 669,424 words, and the Portuguese one 678,023 words.
>> The 'mert' command is running in the terminal.
>> It is using 87% of memory and half of the swap. It is running on a small
>> server at my college; I think it has 4 GB of swap and 2 GB of RAM.
>>
>> I'm going to read that now. Thanks Philipp!
>>
>> 2012/10/29 Philipp Koehn <[email protected]>
>>
>>> Hi,
>>>
>>> how big is your corpus in total (number of words)?
>>> What step is it currently processing?
>>> Is there excessive memory use / swapping / etc.?
>>>
>>> There are various ways to speed things up by multi-threading
>>> or other multi-core usage.
>>> Check:
>>> http://www.statmt.org/moses/?n=Moses.AdvancedFeatures
>>>
>>> -phi
>>>
>>> On Mon, Oct 29, 2012 at 12:01 PM, Nelson Simao <[email protected]> wrote:
>>> > Hi everyone!
>>> >
>>> > Now I'm having another problem in my translator.
>>> > I trained it with just 1/4 of the corpus that I have here and tested
>>> > it, but the translation results aren't as good as I expected. So now
>>> > I'm trying to train on the whole corpus (because I think I will get
>>> > better results), but the mert/moses commands have been running since
>>> > 21 October... 8 days ago.
>>> > I have to get the translator working properly as soon as possible,
>>> > because it is part of a college assignment. Can someone help me with
>>> > the training-duration problem, and also give me some tips to get
>>> > better results for pt->zh and zh->pt translation?
>>> >
>>> > Best regards!
>>> > Nelson from Portugal.
>
> --
> Wilker Aziz
> http://pers-www.wlv.ac.uk/~in1676/
>
> PhD candidate at The Research Group in Computational Linguistics
> Research Institute of Information and Language Processing (RIILP)
> University of Wolverhampton
> MB108
> Stafford Street
> WOLVERHAMPTON WV1 1LY
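[Editor's note, not part of the original thread: two practical steps follow from Wilker's advice above. Moses-prepared corpora hold one sentence per line, so counting lines counts sentences; and the tuning set should be a small slice held out from (disjoint with) the training data. A minimal sketch — file names are hypothetical, and 2,000 is simply the upper end of the range Wilker mentions.]

```python
import os
import tempfile

def count_sentences(path):
    """Moses corpora are one sentence per line, so counting lines counts sentences."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)

def split_corpus(src_path, tune_path, train_path, tune_size=2000):
    """Hold out the first tune_size sentences for tuning; the rest is for
    training. The two sets are disjoint, as MERT requires."""
    with open(src_path, encoding="utf-8") as src, \
         open(tune_path, "w", encoding="utf-8") as tune, \
         open(train_path, "w", encoding="utf-8") as train:
        for i, line in enumerate(src):
            (tune if i < tune_size else train).write(line)

# Tiny demonstration: a synthetic 10-sentence corpus, 3-sentence tuning set.
tmp = tempfile.mkdtemp()
corpus = os.path.join(tmp, "corpus.pt")
with open(corpus, "w", encoding="utf-8") as f:
    for i in range(10):
        f.write(f"sentence {i}\n")

split_corpus(corpus, os.path.join(tmp, "tune.pt"),
             os.path.join(tmp, "train.pt"), tune_size=3)
print(count_sentences(os.path.join(tmp, "tune.pt")),
      count_sentences(os.path.join(tmp, "train.pt")))
```

Note that for a parallel corpus the same split (the same line indices) must be applied to both the Portuguese and the Chinese side, so that tuning sentence pairs stay aligned.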
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
