Hi Prasanth,

To get the best performance on a particular test set, it is generally 
better if your training and development sets are similar to that test set. 
This is true in SMT, as it is in any other machine learning problem. The 
problem of trying to build a good translation system when the training or 
dev sets differ in some systematic way from the test set is generally 
known as 'domain adaptation', and is an active research area. So there are 
no hard and fast rules as to the best way to proceed, or indeed how to 
characterise these systematic differences.
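
As a concrete illustration of one common approach: select data that 'looks
like' the target domain by comparing language model scores. Here is a
minimal Python sketch, where logprob_in and logprob_out are assumed helpers
(any LM toolkit could stand in) returning per-token log-probabilities from
an in-domain and a general-domain LM:

    # Sketch of LM-based data selection in the spirit of Moore & Lewis
    # (2010): keep sentences that the in-domain LM scores noticeably
    # higher than the general-domain LM.
    def select_in_domain(sentences, logprob_in, logprob_out, threshold=0.0):
        """Keep sentences the in-domain LM prefers over the general LM."""
        return [s for s in sentences
                if logprob_in(s) - logprob_out(s) > threshold]

The threshold trades off how much data you keep against how in-domain it
looks, and would normally be set on held-out data.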

The idea of 'noise' is somewhat different to domain adaptation. To me, this 
means that some part of the training or dev data is of poor quality in some 
sense, perhaps not being well translated or aligned. The techniques for 
dealing with this are different - for example, you could filter out 
training sentences which translate poorly under a simple, first-pass 
translation system.
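
A minimal sketch of that filtering idea, again in Python; translate() and
sentence_bleu() are assumptions standing in for a baseline decoder and a
smoothed sentence-level BLEU (unsmoothed sentence BLEU is often zero):

    # Sketch only: drop training pairs that a first-pass system translates
    # poorly, as a crude proxy for misaligned or badly translated data.
    def filter_noisy_pairs(pairs, translate, sentence_bleu, min_bleu=0.15):
        """Keep (source, target) pairs whose first-pass translation is
        reasonably close to the target side."""
        kept = []
        for src, tgt in pairs:
            hyp = translate(src)
            if sentence_bleu(hyp, tgt) >= min_bleu:
                kept.append((src, tgt))
        return kept

Too aggressive a threshold throws away useful data, so min_bleu needs some
care.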

To get an idea of what techniques are used for domain adaptation and for 
handling noise, a good place to start would be the WMT shared task system 
descriptions. The 2011 descriptions will be out shortly, and all previous 
years are online in the ACL Anthology.

best regards - Barry

On Wednesday 06 July 2011 15:59, Prasanth K wrote:
> Hello All,
>
> I'd like some clarification regarding the topic discussed in this thread,
> or at least something close to it.
> My question has two parts, so here goes:
>
>  1. If one were dealing with a translation task using noisy data, could
> the fact that MERT did not improve the 'performance of the translation
> model' be attributed to the noise in the development and test/evaluation
> sets being of two different kinds?
> Let me elaborate: the task could be something like the Haitian translation
> task held at WMT this year, or the task of text normalization (converting
> sms/chat text into 'normal' English), which I am interested in at the
> moment.
> Now, say that the development set contains noise in the form of
> 'vowel deletion' (e.g. 'mrkt' for 'market' and the like), while the
> test/evaluation set is characterized by differences in the ordering of the
> words in a sentence (e.g. "did the homework ask her" for "ask her if she
> has done the homework"). Is it weird that the model optimized on this
> development set gives worse performance on the evaluation set than the
> unoptimized model (the model before performing MERT)?
> Please note that at the moment I am interested only in the difference
> between the characteristics of the development and test/evaluation sets.
>
> If the case presented above is not weird: is it the case that differences
> in noise between any two pairs of datasets (train-development,
> development-test, train-test) lead to poor translations?
>
> 2. Is it possible to characterize the 'noise' as a difference in
> genre/domain?
> I remember that my scores on the 'news dataset' dropped when MERT was
> performed using another, similar news corpus on a model trained on the
> entire Europarl corpus. This was a long time ago, and back then I
> attributed it to a mistake in the way I conducted the experiment. But
> this discussion has made me wonder whether the problem I encountered back
> then was in fact genuine.
>
> - Prasanth
>
> On Tue, Feb 22, 2011 at 12:13 AM, Jia Xu <[email protected]> wrote:
> > Hi Tom and Suzy,
> >
> > Thanks a lot for your answers and tips.
> > I checked the preprocessing: the tuning set and training set are
> > consistent, both truecased and tokenized.
> > The training data contains more than 40 million tokens, and the genre
> > looks fine, because the translation output is reasonable without
> > tuning.
> >
> > Best,
> > Jia
> >
> >
> >
> > --- Tom Hoar <[email protected]> wrote on Mon, 21.2.2011:
> > > From: Tom Hoar <[email protected]>
> > > Subject: Re: [Moses-support] does mert usually enhance BLEU on a test set?
> > > To: [email protected]
> > > CC: "Jia Xu" <[email protected]>, "Suzy Howlett" <[email protected]>
> > > Date: Monday, 21 February 2011, 05:37
> > > Jia,
> > >
> > > Yes, mert's purpose is to optimize the configuration
> > > weights such that BLEU scores increase.
> > >
> > > I had a similar case where mert didn't change the BLEU
> > > scores. Our troubleshooting found the tuning set wasn't
> > > prepared the same as the training data... i.e. we forgot to
> > > lower-case and tokenize the tuning set. This is probably a
> > > good place for you to start.
> > >
> > > Tom
> > >
> > >
> > > On Mon, 21 Feb 2011 09:35:41 +1100, Suzy Howlett <[email protected]>
> > > wrote:
> > > > Hi Jia,
> > > >
> > > > It could very well be that the training data isn't very good. Tuning
> > > > changes how much each feature is weighted, but if the estimates of
> > > > the feature values aren't reasonable in the first place, I can't
> > > > imagine it helps too much. Perhaps you're not using enough training
> > > > data, or the training data is just too different from your test data
> > > > (e.g. genre)?
> > > >
> > > > Someone with more experience than me may be able to give you more
> > > > advice.
> > > >
> > > > Best,
> > > > Suzy
> > > >
> > > > On 21/02/11 2:46 AM, Jia Xu wrote:
> > > >> Hi,
> > > >>
> > > >> In my experiments, tuning with mert-moses.pl or mert-moses-new.pl on
> > > >> a development set did not improve the translation quality on a test
> > > >> set: the BLEU score was about half a percent worse (no tuning vs.
> > > >> tuning). Does anyone have a similar experience, or did I call
> > > >> anything wrong?
> > > >>
> > > >> nbest=100
> > > >> dev: wmt-test08
> > > >> test: wmt-test10
> > > >> with/without tuning is achieved by turning off/on weight-config in
> > > >> the config file.
> > > >>
> > > >> Thank you!
> > > >> Best Wishes,
> > > >> Jia

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
