Hi,
I have one more question:
In the lex.e2f file there is a translation Gitarre->guitar:
Gitarre guitar 0.4000000
Gitarre using 0.0000284
Gitarre ; 0.0000017
Why has it not become part of the phrase table?
Thanks again!
Vera
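A lexical table entry and a phrase table entry come from different steps, so one can exist without the other. The following is a minimal sketch, assuming the lexical table is estimated by relative frequency over individual alignment links, in a simplified form. The function name `lexical_table` and the toy sentence pairs are hypothetical and only illustrate the idea; the conditioning direction of lex.e2f is not addressed here.

```python
from collections import Counter

def lexical_table(sentence_pairs):
    """Relative-frequency estimate w(tgt | src) from word alignment links.

    sentence_pairs: iterable of (src_tokens, tgt_tokens, links), where
    links is a set of 0-based (src_index, tgt_index) pairs.
    """
    pair_counts = Counter()
    src_counts = Counter()
    for src, tgt, links in sentence_pairs:
        for i, j in links:
            pair_counts[(src[i], tgt[j])] += 1
            src_counts[src[i]] += 1
    return {(s, t): c / src_counts[s] for (s, t), c in pair_counts.items()}

# Hypothetical toy data, not taken from the actual training corpus.
toy = [
    # "guitar" is linked to both "Musikinstrument" and "Gitarre".
    (["ein", "Musikinstrument", "Gitarre"], ["a", "guitar"],
     {(0, 0), (1, 1), (2, 1)}),
    # "Gitarre" is linked to "using" only.
    (["Gitarre", "darstellt"], ["guitar", "using"], {(0, 1)}),
]

table = lexical_table(toy)
print(table[("Gitarre", "guitar")])  # 0.5
```

Each link is counted on its own, so "Gitarre guitar" receives probability mass even in a sentence where "guitar" is also linked to "Musikinstrument". Phrase extraction is stricter: it only keeps pairs whose words have no links leaving the pair, which a single-word pair "Gitarre ||| guitar" would violate in that sentence.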
-----Original Message-----
From: Vera Aleksic, Linguatec GmbH
Sent: Thursday, 27 November 2014 09:42
To: 'Matthias Huck'; Raj Dabre
Subject: RE: [Moses-support] Unknown single words that are part of phrases
Hi,
Thank you for your answers.
@Raj, single-word translations for these words do not exist; I have searched
for them. If the grow-diag method is the likely cause, are there any better
alternatives?
@Matthias, you are right, the pair Gitarre-guitar is always unaligned, but I do
not really understand why. Why is "guitar" in the example below aligned to both
"Musikinstrument" and "Gitarre", and not to "Gitarre" only? Would decompounding
"Musikinstrument" into "Musik + Instrument" help? How else could I improve the
word alignment quality?
Thanks!
Best,
Vera
für ein Musikinstrument wie eine elektrische Gitarre ,
NULL ({ }) for ({ 1 }) a ({ 2 }) musical ({ }) instrument ({ }) , ({ }) such ({ }) as ({ 4 }) an ({ 5 }) electric ({ 6 }) guitar ({ 3 7 }) ; ({ 8 })
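To make the effect of this alignment concrete, here is a small sketch that parses the alignment line above and applies the usual phrase-extraction consistency criterion to the pair Gitarre/guitar. The helper names `parse_giza_line` and `is_consistent` are made up for illustration. Note two assumptions: the line above is a single-direction alignment, whereas phrase extraction runs on the symmetrized alignment, which is not shown in this thread; and the `dropped` variant below is a hypothetical symmetrized alignment in which "guitar" ends up with no links at all.

```python
import re

def parse_giza_line(line):
    """Parse 'tok ({ i j })' items into (token, [1-based positions]) pairs."""
    items = re.findall(r"(\S+) \(\{([\d ]*)\}\)", line)
    return [(tok, [int(p) for p in pos.split()]) for tok, pos in items]

def is_consistent(links, g_span, e_span):
    """Phrase-extraction consistency check on inclusive 1-based spans.

    A pair is consistent if at least one link lies inside it and no link
    connects a word inside the pair to a word outside it.
    """
    (g1, g2), (e1, e2) = g_span, e_span
    has_inner_link = False
    for g, e in links:
        g_in, e_in = g1 <= g <= g2, e1 <= e <= e2
        if g_in != e_in:
            return False
        has_inner_link = has_inner_link or g_in
    return has_inner_link

aligned = ("NULL ({ }) for ({ 1 }) a ({ 2 }) musical ({ }) instrument ({ }) "
           ", ({ }) such ({ }) as ({ 4 }) an ({ 5 }) electric ({ 6 }) "
           "guitar ({ 3 7 }) ; ({ 8 })")

entries = parse_giza_line(aligned)[1:]  # drop the NULL entry
# (german_pos, english_pos) links, both 1-based
links = {(g, e) for e, (_, g_positions) in enumerate(entries, start=1)
         for g in g_positions}

GITARRE, GUITAR = 7, 10
# Links as shown: "guitar" is also tied to "Musikinstrument" (position 3).
as_shown = is_consistent(links, (GITARRE, GITARRE), (GUITAR, GUITAR))

# Hypothetical symmetrized alignment in which "guitar" ends up unaligned.
dropped = {(g, e) for g, e in links if e != GUITAR}
single = is_consistent(dropped, (GITARRE, GITARRE), (GUITAR, GUITAR))
with_comma = is_consistent(dropped, (7, 8), (10, 11))  # "Gitarre ," / "guitar ;"

print(as_shown, single, with_comma)  # False False True
```

In both variants the single-word pair fails: with the links as shown, "guitar" has a link that leaves the pair, and in the hypothetical variant the pair contains no link at all. The longer pair "Gitarre , ||| guitar ;" passes in the hypothetical variant because the link between "," and ";" anchors it, which matches the kind of entries found in the phrase table.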
-----Original Message-----
From: Matthias Huck [mailto:[email protected]]
Sent: Wednesday, 26 November 2014 17:54
To: Raj Dabre
Cc: Vera Aleksic, Linguatec GmbH; moses-support
Subject: Re: [Moses-support] Unknown single words that are part of phrases
Hi,
Presumably your phrase table does not contain an entry "Gitarre ||| guitar"
because this word pair is always unaligned in your training data. You could try
to improve your word alignment quality.
Alternatively, you could implement a procedure in the manner of the "forced
single word heuristic" as described in:
D. Stein, D. Vilar, S. Peitz, M. Freitag, M. Huck, and H. Ney. A Guide to Jane,
an Open Source Hierarchical Translation Toolkit. The Prague Bulletin of
Mathematical Linguistics, number 95, pages 5-18, Prague, Czech Republic, April
2011.
http://ufal.mff.cuni.cz/pbml/95/art-stein-vilar-ney-jane.pdf
(see Fig. 1c).
But the latter would rather be a workaround.
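A rough sketch of such a workaround follows. It rests on an assumption about the heuristic that has not been checked against the Jane implementation: that it adds a single-word pair for every alignment link, even when the pair would fail the normal consistency check. The function name `forced_single_word_pairs` is hypothetical, and the links are the 0-based version of the alignment Vera posted in this thread.

```python
def forced_single_word_pairs(src, tgt, links):
    """Return a single-word pair for every alignment link.

    Assumed reading of the heuristic: no consistency check is applied,
    so a word linked to several words yields several pairs.
    links: set of 0-based (src_index, tgt_index) pairs.
    """
    return {(src[i], tgt[j]) for i, j in links}

german = "für ein Musikinstrument wie eine elektrische Gitarre ,".split()
english = "for a musical instrument , such as an electric guitar ;".split()
# 0-based links taken from the single-direction alignment in the thread.
links = {(0, 0), (1, 1), (3, 6), (4, 7), (5, 8), (2, 9), (6, 9), (7, 10)}

pairs = forced_single_word_pairs(german, english, links)
print(("Gitarre", "guitar") in pairs)  # True
```

Under this reading the pair "Gitarre ||| guitar" would be added, but so would "Musikinstrument ||| guitar", and a word with no links at all would still get no entry. That is one reason to treat it as a workaround and to improve the alignment first.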
Cheers,
Matthias
On Thu, 2014-11-27 at 01:18 +0900, Raj Dabre wrote:
> Hello,
>
>
> If I am not mistaken, this is most likely due to the grow(-diag) heuristic
> applied to the word-aligned data (both directions) before phrase extraction.
>
> Furthermore, single-word translations should usually exist (though not
> always), so search the phrase table for them.
>
>
>
> Regards.
>
>
> On Thu, Nov 27, 2014 at 12:53 AM, Vera Aleksic, Linguatec GmbH
> <[email protected]> wrote:
> Hi,
>
> I have observed many times that some words do not exist as single-word
> translations in the phrase table, although they occur in the training
> corpus and in multiword phrases.
> An example:
> The German-English translation for "Gitarre" is unknown, i.e. there is no
> single-word entry for "Gitarre" in the phrase table, although some other
> phrases containing this word exist (see below).
> How is this possible?
> Thanks and best regards,
> Vera
>
>
> Gitarre , ||| guitar ; ||| 1 0.0284465 1 0.0654272 2.718 ||| ||| 1 1
> Gitarre darstellt , unter Beanspruchung ||| guitar using ||| 0.25 2.7351e-11 1 0.0625119 2.718 ||| ||| 4 1
> Gitarre darstellt , unter ||| guitar using ||| 0.25 1.18917e-05 1 0.0625119 2.718 ||| ||| 4 1
> Gitarre darstellt , ||| guitar using ||| 0.25 0.00569228 1 0.0625119 2.718 ||| ||| 4 1
> Gitarre darstellt ||| guitar using ||| 0.25 0.0400028 1 0.0625119 2.718 ||| ||| 4 1
> Kopfplatte einer Gitarre darstellt , ||| head of a guitar using ||| 0.5 4.23407e-08 1 0.00471281 2.718 ||| ||| 2 1
> Kopfplatte einer Gitarre darstellt ||| head of a guitar using ||| 0.5 2.97552e-07 1 0.00471281 2.718 ||| ||| 2 1
> eine elektrische Gitarre , ||| an electric guitar ; ||| 1 0.00107982 1 0.00163632 2.718 ||| ||| 1 1
> einer Gitarre darstellt , unter ||| of a guitar using ||| 0.333333 6.4754e-07 1 0.00471281 2.718 ||| ||| 3 1
> einer Gitarre darstellt , ||| of a guitar using ||| 0.333333 0.000309961 1 0.00471281 2.718 ||| ||| 3 1
> einer Gitarre darstellt ||| of a guitar using ||| 0.333333 0.00217827 1 0.00471281 2.718 ||| ||| 3 1
> elektrische Gitarre , ||| electric guitar ; ||| 1 0.005661 1 0.0142097 2.718 ||| ||| 1 1
> wie eine elektrische Gitarre , ||| as an electric guitar ; ||| 1 0.000177339 1 0.000809485 2.718 ||| ||| 1 1
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
>
> --
> Raj Dabre.
> Research Student,
>
> Graduate School of Informatics,
> Kyoto University.
> CSE MTech, IITB., 2011-2014
>
>
--
The University of Edinburgh is a charitable body, registered in Scotland, with
registration number SC005336.