Hello All,

I'd like some clarification regarding the topic discussed in this thread,
or at least something close to it.
My question has two parts, so here goes:

 1. If one were dealing with a translation task that involves noisy data,
could the fact that MERT did not improve the performance of the translation
model be attributed to the noise in the development and test/evaluation
sets being of two different kinds?
Let me elaborate: the task could be something like the Haitian translation
task held at WMT this year, or the task of text normalization (converting
SMS/chat text into 'normal' English), which I am interested in at the
moment.
            Now, say that the development set contains noise in the form of
vowel deletion (e.g. 'mrkt' for 'market' and the like), while the
test/evaluation set is characterized by differences in word order within a
sentence (e.g. "did the homework ask her" for "ask her if she has done the
homework"); a small sketch of both noise types follows below. Is it strange
that the model optimized on this development set performs worse on the
evaluation set than the unoptimized model (the model before MERT was run)?
Please note that, for the moment, I am interested only in the difference
between the characteristics of the development and test/evaluation sets.
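
To make the two noise types concrete, here is a purely illustrative Python
sketch; the two noise functions and the example sentence are invented for
this message, not taken from any real dataset:

    import random

    random.seed(0)
    VOWELS = set("aeiou")

    def delete_vowels(sentence):
        # SMS-style noise: drop word-internal vowels, e.g. "market" -> "mrkt"
        out = []
        for w in sentence.split():
            if len(w) > 3:
                w = w[0] + "".join(c for c in w[1:] if c.lower() not in VOWELS)
            out.append(w)
        return " ".join(out)

    def reorder_words(sentence):
        # Reordering noise: randomly permute the words of the sentence
        words = sentence.split()
        random.shuffle(words)
        return " ".join(words)

    s = "ask her if she has done the homework"
    print(delete_vowels(s))  # -> "ask her if she has dn the hmwrk"
    print(reorder_words(s))  # -> some random permutation of the words

A development set built with delete_vowels and a test set built with
reorder_words would exhibit exactly the mismatch I describe above.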

If the case presented above is not strange: is it true that a difference in
the kind of noise between any two of the datasets (train-development,
development-test, train-test) leads to poor translations?

2. Is it possible to characterize this 'noise' as a difference in
genre/domain?
I remember that my scores on the 'news dataset' dropped when MERT was
performed on another, similar news corpus with a model trained on the
entire Europarl corpus. That was a long time ago, and back then I
attributed it to a mistake in the way I had conducted the experiment, but
this discussion has made me wonder whether the problem I encountered was
in fact genuine. A sketch of the tuned-vs-untuned comparison I have in
mind follows below.
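
For concreteness, here is a minimal sketch of that comparison, using NLTK's
corpus_bleu to score two decoder outputs against the same references. The
file names are placeholders, and NLTK is just one convenient way to compute
BLEU here (Moses ships its own scorer):

    from nltk.translate.bleu_score import corpus_bleu

    def read_tokens(path):
        with open(path, encoding="utf-8") as f:
            return [line.split() for line in f]

    refs = [[r] for r in read_tokens("test.ref")]  # one reference per segment
    untuned = read_tokens("test.untuned.out")      # decoded with default weights
    tuned = read_tokens("test.tuned.out")          # decoded with MERT weights

    print("untuned BLEU:", corpus_bleu(refs, untuned))
    print("tuned BLEU:  ", corpus_bleu(refs, tuned))

If the tuned score comes out consistently lower, the dev/test mismatch
described above seems the natural suspect.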

- Prasanth

On Tue, Feb 22, 2011 at 12:13 AM, Jia Xu <[email protected]> wrote:

> Hi Tom and Suzy,
>
> Thanks a lot for your answers and tips.
> I checked the preprocessing: the tuning set and training set are
> consistent, both truecased and tokenized.
> The training data contains more than 40 million tokens, and the genre
> looks fine, because the translation output is reasonable without tuning.
>
> Best,
> Jia
>
>
>
> --- Tom Hoar <[email protected]> wrote on Mon,
> 21.2.2011:
>
> > From: Tom Hoar <[email protected]>
> > Subject: Re: [Moses-support] does mert usually enhance BLEU on a test set?
> > To: [email protected]
> > CC: "Jia Xu" <[email protected]>, "Suzy Howlett" <[email protected]>
> > Date: Monday, 21 February 2011, 05:37
> > Jia,
> >
> > Yes, mert's purpose is to optimize the feature weights such that the
> > BLEU score on the tuning set increases.
> >
> > I had a similar case where mert didn't change the BLEU
> > scores. Our troubleshooting found the tuning set wasn't
> > prepared the same way as the training data, i.e. we forgot to
> > lower-case and tokenize the tuning set. This is probably a
> > good place for you to start.
> >
> > Tom
> >
> >
> > On Mon, 21 Feb 2011 09:35:41 +1100, Suzy Howlett <[email protected]>
> > wrote:
> > > Hi Jia,
> > >
> > > It could very well be that the training data isn't very good. Tuning
> > > changes how much each feature is weighted, but if the estimates of the
> > > feature values aren't reasonable in the first place, I can't imagine it
> > > helps too much. Perhaps you're not using enough training data, or the
> > > training data is just too different from your test data (e.g. genre)?
> > > Someone with more experience than me may be able to give you more
> > > advice.
> > >
> > > Best,
> > > Suzy
> > >
> > > On 21/02/11 2:46 AM, Jia Xu wrote:
> > >> Hi,
> > >>
> > >> In my experiments, tuning with mert-moses.pl or mert-moses-new.pl on
> > >> a development set did not improve the translation quality on a test
> > >> set: the BLEU score was about half a percent worse with tuning than
> > >> without. Does anyone have a similar experience, or did I invoke
> > >> anything incorrectly?
> > >>
> > >> nbest=100
> > >> dev: wmt-test08
> > >> test: wmt-test10
> > >> with/without tuning is achieved by turning off/on weight-config in
> > >> the config file.
> > >>
> > >> Thank you!
> > >> Best Wishes,
> > >> Jia
> > >>
> > >>
> > >>
> >
>
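
(Following up on Tom's preprocessing tip in the quoted thread: below is a
rough, heuristic sketch of the kind of consistency check one could run on
the data before anything else. The file names are placeholders and the
statistics are only indicative; a lowercased, tokenized file should score
near zero on both, so a large gap between the training and tuning files
signals a preprocessing mismatch.)

    def stats(path):
        # Fraction of capitalised tokens and of tokens with punctuation
        # glued on (a sign the text was not tokenized).
        total = upper = glued = 0
        with open(path, encoding="utf-8") as f:
            for line in f:
                for tok in line.split():
                    total += 1
                    if tok[:1].isupper():
                        upper += 1
                    if len(tok) > 1 and tok[-1] in ".,!?":
                        glued += 1
        total = max(total, 1)
        return upper / total, glued / total

    for name in ("train.src", "dev.src"):  # placeholder file names
        up, gl = stats(name)
        print("%s: %.1f%% capitalised, %.1f%% glued punctuation"
              % (name, 100 * up, 100 * gl))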



-- 
"Theories have four stages of acceptance. i) this is worthless nonsense; ii)
this is an interesting, but perverse, point of view, iii) this is true, but
quite unimportant; iv) I always said so."

  --- J.B.S. Haldane
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
