Yes, that's right. That is the situation illustrated in Fig. 1b of
http://ufal.mff.cuni.cz/pbml/95/art-stein-vilar-ney-jane.pdf
and a "single word heuristic" as proposed in that paper can be a remedy.
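
To make this concrete, here is a minimal sketch (not the Moses or Jane code)
of consistent phrase pair extraction on the toy example quoted below. The
function names extract_phrases and add_forced_single_words are made up for
illustration, extension over unaligned boundary words is left out, and the
second function is only a rough reading of the single word idea: add a
one-word pair for every alignment point that has no such pair yet. See the
paper for the actual definition.

```python
def extract_phrases(src, tgt, links, max_len=5):
    """Return phrase pairs that are consistent with the alignment.

    links is a set of (src_index, tgt_index) pairs, 0-based.
    Simplification: unaligned boundary words are not extended over.
    """
    pairs = []
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to the source span [i1, i2].
            tgt_pos = [j for (i, j) in links if i1 <= i <= i2]
            if not tgt_pos:
                continue
            j1, j2 = min(tgt_pos), max(tgt_pos)
            if j2 - j1 + 1 > max_len:
                continue
            # Consistency: no word in the target span may be linked
            # to a source word outside [i1, i2].
            if all(i1 <= i <= i2 for (i, j) in links if j1 <= j <= j2):
                pairs.append((" ".join(src[i1:i2 + 1]),
                              " ".join(tgt[j1:j2 + 1])))
    return pairs


def add_forced_single_words(pairs, src, tgt, links):
    """Hypothetical workaround: one-word pair per alignment point."""
    have = set(pairs)
    extra = [(src[i], tgt[j]) for (i, j) in sorted(links)
             if (src[i], tgt[j]) not in have]
    return pairs + extra


src, tgt = ["a", "b"], ["A", "B"]
links = {(0, 0), (1, 0), (0, 1)}
pairs = extract_phrases(src, tgt, links)
print(pairs)  # only the 2-word pair survives the consistency check
print(add_forced_single_words(pairs, src, tgt, links))
```

Neither one-word span is consistent here, because "A" is linked to both "a"
and "b", so only "a b ||| A B" comes out of the standard extraction.
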
On Thu, 2014-11-27 at 16:51 +0000, Barry Haddow wrote:
> Hi Vera
>
> I think the situation you describe could happen even without unaligned
> words. Suppose that you have a 2 word sentence on each side, and the
> alignment points are (0,0), (0,1) and (1,0) - I think this is possible
> with the usual symmetrisation algorithm. Then you would extract the
> phrase pair containing 2 2-word phrases, but no phrase pairs containing
> 1-word phrases. (see below for an example)
>
> You still get lexical weights for the translation of word-0 to word-0
> though, since there is an alignment point there
>
> cheers - Barry
>
>
> [hyperion]bhaddow: cat c.en
> a b
> [hyperion]bhaddow: cat c.fr
> A B
> [hyperion]bhaddow: cat c.align
> 0-0 1-0 0-1
> [hyperion]bhaddow: ~/moses.new/bin/extract c.en c.fr c.align e 5
> PhraseExtract v1.4, written by Philipp Koehn
> phrase extraction from an aligned parallel corpus
> [hyperion]bhaddow: cat e
> A B ||| a b ||| 0-0 1-0 0-1
> [hyperion]bhaddow: ~/moses.new/scripts/training/get-lexical.perl c.en c.fr c.align c
> (c.en,c.fr,c)
> FILE: c.fr
> FILE: c.en
> FILE: c.align
> !
> Saved: c.f2e and c.e2f
> [hyperion]bhaddow: cat c.e2f
> a A 0.5000000
> a B 1.0000000
> b A 0.5000000
> [hyperion]bhaddow: cat c.f2e
> A a 0.5000000
> B a 0.5000000
> A b 1.0000000
>
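
The numbers in c.e2f and c.f2e above can be reproduced with plain relative
frequencies over the alignment links. The sketch below is not the code of
get-lexical.perl; the function name lexical_tables is made up, the roles of
"f" and "e" simply follow the argument order of the quoted command (c.en
first, c.fr second), and NULL handling for unaligned words is omitted
because every word in the toy example is aligned.

```python
from collections import Counter


def lexical_tables(f_words, e_words, links):
    """Relative-frequency lexical tables from (f_index, e_index) links."""
    pair = Counter((f_words[i], e_words[j]) for i, j in links)
    f_total, e_total = Counter(), Counter()
    for (f, e), count in pair.items():
        f_total[f] += count
        e_total[e] += count
    # Rows of the .e2f file: "f e p(f|e)"
    e2f = {(f, e): c / e_total[e] for (f, e), c in pair.items()}
    # Rows of the .f2e file: "e f p(e|f)"
    f2e = {(e, f): c / f_total[f] for (f, e), c in pair.items()}
    return e2f, f2e


# c.en = "a b", c.fr = "A B", c.align = "0-0 1-0 0-1"
e2f, f2e = lexical_tables(["a", "b"], ["A", "B"], [(0, 0), (1, 0), (0, 1)])
for (w1, w2), p in sorted(e2f.items()):
    print(w1, w2, f"{p:.7f}")
```

So a lexical entry exists for every alignment point, whether or not a
one-word phrase pair can be extracted for it.
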
>
> On 27/11/14 16:15, Matthias Huck wrote:
> > Hi Vera,
> >
> > It's odd that the lexical translation model contains such an entry if
> > the pair is always unaligned. Maybe you used a different word alignment
> > when you extracted the lexicon model?
> >
> > You should manually have a look at your word alignment in order to check
> > whether it has reasonable quality. There's a visualization tool called
> > "Picaro" in Moses:
> >
> > $ moses/contrib/picaro/picaro.py -a1 model/aligned.1.grow-diag-final-and -f model/aligned.1.0.de -e model/aligned.1.0.en
> >
> > In order to find out whether the symmetrization heuristic is an issue
> > for you, you can compare the standard and inverse GIZA alignments with
> > the symmetrized alignment.
> >
> > Ways to experiment with word alignment quality are for instance:
> >
> > - Choosing a different symmetrization heuristic
> > - Modifying the GIZA settings, e.g. by training with a different number
> > of EM iterations or a different sequence of IBM/HMM models
> > - Using some other method for training word alignments, e.g. a
> > discriminative word aligner
> >
> > Also, if the amount of parallel training data is small, you shouldn't be
> > surprised if you are not able to train reliable models.
> >
> > Cheers,
> > Matthias
> >
> >
> > On Thu, 2014-11-27 at 14:45 +0100, Vera Aleksic, Linguatec GmbH wrote:
> >> Hi,
> >>
> >> I have one more question:
> >> In the lex.e2f file there is a translation Gitarre->guitar:
> >>
> >> Gitarre guitar 0.4000000
> >> Gitarre using 0.0000284
> >> Gitarre ; 0.0000017
> >>
> >> Why has it not become part of the phrase table?
> >>
> >> Thanks again!
> >> Vera
> >>
> >> -----Original Message-----
> >> From: Vera Aleksic, Linguatec GmbH
> >> Sent: Thursday, 27 November 2014 09:42
> >> To: 'Matthias Huck'; Raj Dabre
> >> Subject: RE: [Moses-support] Unknown single words that are part of phrases
> >>
> >> Hi,
> >> Thank you for your answers.
> >> @Raj, one-word translations do not exist; I have searched for them. If the
> >> grow-diag method is the likely cause of such phenomena, are there any
> >> better alternatives?
> >> @Matthias, you are right, the pair Gitarre-guitar is always unaligned, but
> >> I do not really understand why. Why is "guitar" in the example below
> >> aligned to "Musikinstrument Gitarre", and not to "Gitarre" only? I assume
> >> decomposing "Musik + Instrument" would help? How else could I improve the
> >> word alignment quality?
> >> Thanks!
> >> Best,
> >> Vera
> >>
> >> für ein Musikinstrument wie eine elektrische Gitarre ,
> >> NULL ({ }) for ({ 1 }) a ({ 2 }) musical ({ }) instrument ({ }) , ({ }) such ({ }) as ({ 4 }) an ({ 5 }) electric ({ 6 }) guitar ({ 3 7 }) ; ({ 8 })
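
For reading such GIZA output, a small sketch may help. It assumes the usual
A3-style layout: a plain sentence, and an annotated sentence in which each
token is followed by "({ ... })" holding 1-based positions in the plain
sentence. The function name parse_a3 is made up for illustration.

```python
import re


def parse_a3(plain_line, annotated_line):
    """Return (token, [linked plain-sentence words]) for each annotated token."""
    words = plain_line.split()
    result = []
    for token, idx in re.findall(r"(\S+) \(\{([\d ]*)\}\)", annotated_line):
        # Positions are 1-based indices into the plain sentence.
        result.append((token, [words[int(i) - 1] for i in idx.split()]))
    return result


plain = "für ein Musikinstrument wie eine elektrische Gitarre ,"
annotated = ("NULL ({ }) for ({ 1 }) a ({ 2 }) musical ({ }) instrument ({ }) "
             ", ({ }) such ({ }) as ({ 4 }) an ({ 5 }) electric ({ 6 }) "
             "guitar ({ 3 7 }) ; ({ 8 })")
for token, linked in parse_a3(plain, annotated):
    print(token, "->", linked)
```

In this direction "guitar" is linked to both "Musikinstrument" and "Gitarre",
while "musical" and "instrument" are linked to nothing.
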
> >>
> >> -----Original Message-----
> >> From: Matthias Huck [mailto:[email protected]]
> >> Sent: Wednesday, 26 November 2014 17:54
> >> To: Raj Dabre
> >> Cc: Vera Aleksic, Linguatec GmbH; moses-support
> >> Subject: Re: [Moses-support] Unknown single words that are part of phrases
> >>
> >> Hi,
> >>
> >> Supposedly your phrase table does not contain an entry "Gitarre |||
> >> guitar" because this word pair is always unaligned in your training data.
> >> You could try to improve your word alignment quality.
> >>
> >> Alternatively, you could implement a procedure in the manner of the
> >> "forced single word heuristic" as described in:
> >> D. Stein, D. Vilar, S. Peitz, M. Freitag, M. Huck, and H. Ney. A Guide to
> >> Jane, an Open Source Hierarchical Translation Toolkit. The Prague Bulletin
> >> of Mathematical Linguistics, number 95, pages 5-18, Prague, Czech
> >> Republic, April 2011.
> >> http://ufal.mff.cuni.cz/pbml/95/art-stein-vilar-ney-jane.pdf
> >> (see Fig. 1c).
> >>
> >> But the latter would rather be a workaround.
> >>
> >> Cheers,
> >> Matthias
> >>
> >>
> >> On Thu, 2014-11-27 at 01:18 +0900, Raj Dabre wrote:
> >>> Hello,
> >>>
> >>>
> >>> If I am not wrong, this is most likely due to the grow(-diag) method
> >>> applied to the word-aligned data (both directions) before phrase
> >>> extraction.
> >>>
> >>> Furthermore, one-word translations should exist (though not always);
> >>> search for them.
> >>>
> >>>
> >>>
> >>> Regards.
> >>>
> >>>
> >>> On Thu, Nov 27, 2014 at 12:53 AM, Vera Aleksic, Linguatec GmbH
> >>> <[email protected]> wrote:
> >>> Hi,
> >>>
> >>> I have observed many times that some words do not exist as
> >>> single word translations in the phrase table, although they exist in the
> >>> training corpus and in multiword phrases.
> >>> An example:
> >>> German-English translation for "Gitarre" is unknown, i.e. there
> >>> is no single word entry for "Gitarre" in the phrase table, although some
> >>> other phrases containing this word exist (see below).
> >>> How is this possible?
> >>> Thanks and best regards,
> >>> Vera
> >>>
> >>>
> >>> Gitarre , ||| guitar ; ||| 1 0.0284465 1 0.0654272 2.718 ||| ||| 1 1
> >>> Gitarre darstellt , unter Beanspruchung ||| guitar using ||| 0.25 2.7351e-11 1 0.0625119 2.718 ||| ||| 4 1
> >>> Gitarre darstellt , unter ||| guitar using ||| 0.25 1.18917e-05 1 0.0625119 2.718 ||| ||| 4 1
> >>> Gitarre darstellt , ||| guitar using ||| 0.25 0.00569228 1 0.0625119 2.718 ||| ||| 4 1
> >>> Gitarre darstellt ||| guitar using ||| 0.25 0.0400028 1 0.0625119 2.718 ||| ||| 4 1
> >>> Kopfplatte einer Gitarre darstellt , ||| head of a guitar using ||| 0.5 4.23407e-08 1 0.00471281 2.718 ||| ||| 2 1
> >>> Kopfplatte einer Gitarre darstellt ||| head of a guitar using ||| 0.5 2.97552e-07 1 0.00471281 2.718 ||| ||| 2 1
> >>> eine elektrische Gitarre , ||| an electric guitar ; ||| 1 0.00107982 1 0.00163632 2.718 ||| ||| 1 1
> >>> einer Gitarre darstellt , unter ||| of a guitar using ||| 0.333333 6.4754e-07 1 0.00471281 2.718 ||| ||| 3 1
> >>> einer Gitarre darstellt , ||| of a guitar using ||| 0.333333 0.000309961 1 0.00471281 2.718 ||| ||| 3 1
> >>> einer Gitarre darstellt ||| of a guitar using ||| 0.333333 0.00217827 1 0.00471281 2.718 ||| ||| 3 1
> >>> elektrische Gitarre , ||| electric guitar ; ||| 1 0.005661 1 0.0142097 2.718 ||| ||| 1 1
> >>> wie eine elektrische Gitarre , ||| as an electric guitar ; ||| 1 0.000177339 1 0.000809485 2.718 ||| ||| 1 1
> >>>
> >>> _______________________________________________
> >>> Moses-support mailing list
> >>> [email protected]
> >>> http://mailman.mit.edu/mailman/listinfo/moses-support
> >>>
> >>>
> >>>
> >>> --
> >>> Raj Dabre.
> >>> Research Student,
> >>>
> >>> Graduate School of Informatics,
> >>> Kyoto University.
> >>> CSE MTech, IITB., 2011-2014
> >>>
> >>>
> >>
> >>
> >> --
> >> The University of Edinburgh is a charitable body, registered in Scotland,
> >> with registration number SC005336.
> >>
> >>
> >
> >
>
>