I faced a similar issue when I used GIZA++ to align English-Hindi
corpora. I would suggest appending the dictionary entries to the
training corpora. Appending them multiple times would further increase the
score, because the alignment score of a pair <E,H> goes up the more often
that pair appears in the corpora.
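
A minimal sketch of the appending step, in Python. It assumes the corpus is
held as two parallel lists of sentences and the dictionary as a list of
(source, target) pairs; the function name and the sample data are made up
for illustration and are not part of GIZA++ or Moses.

```python
from typing import List, Sequence, Tuple


def append_dictionary(
    src_sentences: Sequence[str],
    tgt_sentences: Sequence[str],
    dictionary: Sequence[Tuple[str, str]],
    repeats: int = 1,
) -> Tuple[List[str], List[str]]:
    """Return the parallel corpus with the dictionary appended `repeats` times.

    Each dictionary entry becomes one extra sentence pair, so the two sides
    stay line-aligned.
    """
    if len(src_sentences) != len(tgt_sentences):
        raise ValueError("source and target sides must have the same number of lines")
    if repeats < 0:
        raise ValueError("repeats must be non-negative")

    src_out = list(src_sentences)
    tgt_out = list(tgt_sentences)
    for _ in range(repeats):
        for src_entry, tgt_entry in dictionary:
            src_out.append(src_entry)
            tgt_out.append(tgt_entry)
    return src_out, tgt_out


if __name__ == "__main__":
    # Hypothetical toy data, only to show the call.
    english = ["the house is small", "the book is on the table"]
    hindi = ["ghar chhota hai", "kitab mez par hai"]
    entries = [("house", "ghar"), ("book", "kitab")]

    new_en, new_hi = append_dictionary(english, hindi, entries, repeats=7)
    print(len(new_en), len(new_hi))
```

The two returned lists can then be written out one sentence per line and
passed through the usual preprocessing before running GIZA++.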

From my experiments, I found that appending the dictionary to the corpora
7 times reduced the AER by more than roughly 10 points, and appending it
any further increased the AER again.
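
For reference, a rough sketch of how AER can be computed when picking the
number of repetitions. It assumes the usual Och and Ney definition,
AER = 1 - (|A∩S| + |A∩P|) / (|A| + |S|), with sure links S counted as a
subset of possible links P. The function name and the representation of
links as (source index, target index) tuples are my own choices for
illustration.

```python
from typing import Iterable, Tuple

Link = Tuple[int, int]  # (source word index, target word index)


def alignment_error_rate(
    hypothesis: Iterable[Link],
    sure: Iterable[Link],
    possible: Iterable[Link],
) -> float:
    """AER of a hypothesis alignment against sure and possible gold links."""
    a = set(hypothesis)
    s = set(sure)
    p = set(possible) | s  # sure links are treated as possible links too
    if not a and not s:
        return 0.0  # nothing to align and nothing predicted
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


if __name__ == "__main__":
    gold_sure = [(0, 0), (1, 1)]
    gold_possible = [(1, 2)]
    predicted = [(0, 0), (1, 2)]
    print(alignment_error_rate(predicted, gold_sure, gold_possible))
```

The function returns a fraction between 0 and 1; multiply by 100 to get
"points" as quoted above.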

--
Murali Krishna Emani
IIIT Bangalore.



On Fri, Feb 29, 2008 at 7:36 PM, Chris Callison-Burch <[EMAIL PROTECTED]>
wrote:

> The way that Giza's built-in dictionary option works is to take the
> dictionary into account *only* during the first round of Model 1
> training, with the idea that it will be used for better initialization
> in EM.   I believe that it's generally more effective to simply append
> the dictionary to the end of your parallel corpus (perhaps multiple
> times like Philipp suggests), so that it influences EM at every
> iteration for every Model.
>
> --Chris
>
>
> On Feb 29, 2008, at 7:13 AM, Philipp Koehn wrote:
>
> > Hi,
> >
> > I do not know how the GIZA++ option of using a dictionary works,
> > I have never tried it. But a very common way of using a dictionary
> > is to include it as additional parallel training data. You could even
> > weight it more strongly by adding it multiple times (or fiddle with
> > the sentence count value in the .snt file, as you mention).
> >
> > -phi
> >
> > On Thu, Feb 28, 2008 at 2:45 AM, aditya sarpotdar
> > <[EMAIL PROTECTED]> wrote:
> >> I am trying to use a dictionary for alignment using GIZA++.
> >> I have observed that even though I ask GIZA to do so (on the command
> >> line), the output does not change.
> >> The model parameters without the dictionary and with the dictionary
> >> are the same.
> >> When I looked into the source code, I found a flag telling GIZA to
> >> use the dictionary, which is NEVER set to TRUE.
> >> When I changed the code to set the flag, I found GIZA using the
> >> dictionary in the exact opposite way. The score of the words which
> >> are in the dictionary went lower!
> >> Did anyone face a similar problem?
> >> Am I supposed to set the occurrence count of the sentence in the .snt
> >> file (first line in every sentence pair) to -1?
> >> I would appreciate it if someone could shed some light on this.
> >>
> >> _______________________________________________
> >> Moses-support mailing list
> >> [email protected]
> >> http://mailman.mit.edu/mailman/listinfo/moses-support
> >>
> >>



