Hi Vera

I think the situation you describe could happen even without unaligned 
words. Suppose that you have a two-word sentence on each side, and the 
alignment points are (0,0), (0,1) and (1,0). I think this is possible 
with the usual symmetrisation algorithm. Then you would extract the 
phrase pair made up of the two 2-word phrases, but no phrase pairs 
containing 1-word phrases, since every candidate pair with a 1-word 
phrase has an alignment point leading outside it. (See below for an 
example.)
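
To make the consistency argument concrete, here is a minimal sketch of consistency-based phrase extraction. It is not the Moses extract code: the function name and the (source index, target index) representation are assumptions made for illustration, and the usual extension of phrases over unaligned boundary words is left out, since the example has no unaligned words.

```python
def extract_phrase_pairs(src_words, tgt_words, alignment, max_len=5):
    """Return the (source phrase, target phrase) pairs consistent with
    `alignment`, a set of (source index, target index) links.

    Hypothetical helper for illustration only; unaligned-word
    extension is deliberately omitted.
    """
    pairs = set()
    for s_start in range(len(src_words)):
        for s_end in range(s_start, min(s_start + max_len, len(src_words))):
            # Target positions linked to any word in the source span.
            linked = [t for s, t in alignment if s_start <= s <= s_end]
            if not linked:
                continue
            t_start, t_end = min(linked), max(linked)
            if t_end - t_start + 1 > max_len:
                continue
            # Inconsistent if a target word in the span is linked to a
            # source word outside the source span.
            crosses = any(
                t_start <= t <= t_end and not (s_start <= s <= s_end)
                for s, t in alignment
            )
            if crosses:
                continue
            pairs.add((" ".join(src_words[s_start:s_end + 1]),
                       " ".join(tgt_words[t_start:t_end + 1])))
    return pairs


src, tgt = "A B".split(), "a b".split()
crossed = {(0, 0), (1, 0), (0, 1)}   # the alignment from the example
diagonal = {(0, 0), (1, 1)}          # a plain one-to-one alignment

print(sorted(extract_phrase_pairs(src, tgt, crossed)))
print(sorted(extract_phrase_pairs(src, tgt, diagonal)))
```

With the crossed alignment only the pair ("A B", "a b") survives, while the diagonal alignment also yields the two single-word pairs.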

You still get lexical weights for the translation of word-0 to word-0 
though, since there is an alignment point there.

cheers - Barry


[hyperion]bhaddow: cat c.en
a b
[hyperion]bhaddow: cat c.fr
A B
[hyperion]bhaddow: cat c.align
0-0 1-0 0-1
[hyperion]bhaddow: ~/moses.new/bin/extract c.en c.fr c.align e 5
PhraseExtract v1.4, written by Philipp Koehn
phrase extraction from an aligned parallel corpus
[hyperion]bhaddow: cat e
A B ||| a b ||| 0-0 1-0 0-1
[hyperion]bhaddow: ~/moses.new/scripts/training/get-lexical.perl c.en c.fr c.align c
(c.en,c.fr,c)
FILE: c.fr
FILE: c.en
FILE: c.align
!
Saved: c.f2e and c.e2f
[hyperion]bhaddow: cat c.e2f
a A 0.5000000
a B 1.0000000
b A 0.5000000
[hyperion]bhaddow: cat c.f2e
A a 0.5000000
B a 0.5000000
A b 1.0000000
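
The values in c.e2f and c.f2e above are consistent with relative frequencies counted over alignment links. The following is a minimal sketch under that assumption. It is not the get-lexical.perl code: the function name is hypothetical, it handles a single sentence pair, and it ignores the NULL handling needed for unaligned words.

```python
from collections import Counter


def lexical_tables(src_words, tgt_words, alignment):
    """Relative-frequency lexical probabilities from alignment links.

    `alignment` holds (source index, target index) links.
    Hypothetical helper; no NULL alignment for unaligned words.
    """
    pair_count = Counter()
    src_count = Counter()
    tgt_count = Counter()
    for s, t in alignment:
        sw, tw = src_words[s], tgt_words[t]
        pair_count[(sw, tw)] += 1
        src_count[sw] += 1   # counted once per link, not per token
        tgt_count[tw] += 1
    # p(target word | source word), keyed as (target, source)
    tgt_given_src = {(tw, sw): n / src_count[sw]
                     for (sw, tw), n in pair_count.items()}
    # p(source word | target word), keyed as (source, target)
    src_given_tgt = {(sw, tw): n / tgt_count[tw]
                     for (sw, tw), n in pair_count.items()}
    return tgt_given_src, src_given_tgt


# Source side "A B" (c.fr), target side "a b" (c.en), links 0-0 1-0 0-1.
e2f, f2e = lexical_tables("A B".split(), "a b".split(),
                          [(0, 0), (1, 0), (0, 1)])
for (first, second), prob in sorted(e2f.items()):
    print(first, second, f"{prob:.7f}")
```

Under these assumptions the word pair A/a gets probability 0.5 in both directions, even though no single-word phrase pair was extracted for it.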


On 27/11/14 16:15, Matthias Huck wrote:
> Hi Vera,
>
> It's odd that the lexical translation model contains such an entry if
> the pair is always unaligned. Maybe you used a different word alignment
> when you extracted the lexicon model?
>
> You should manually have a look at your word alignment in order to check
> whether it has reasonable quality. There's a visualization tool called
> "Picaro" in Moses:
>
> $ moses/contrib/picaro/picaro.py -a1 model/aligned.1.grow-diag-final-and -f model/aligned.1.0.de -e model/aligned.1.0.en
>
> In order to find out whether the symmetrization heuristic is an issue
> for you, you can compare the standard and inverse GIZA alignments with
> the symmetrized alignment.
>
> Ways to experiment with word alignment quality are for instance:
>
> - Choosing a different symmetrization heuristic
> - Modifying the GIZA settings, e.g. by training with a different number
> of EM iterations or a different sequence of IBM/HMM models
> - Using some other method for training word alignments, e.g. a
> discriminative word aligner
>
> Also, if the amount of parallel training data is small, you shouldn't be
> surprised if you are not able to train reliable models.
>
> Cheers,
> Matthias
>
>
> On Thu, 2014-11-27 at 14:45 +0100, Vera Aleksic, Linguatec GmbH wrote:
>> Hi,
>>
>> I have one more question:
>> In the lex.e2f file there is a translation Gitarre->guitar:
>>
>>      Gitarre guitar 0.4000000
>>      Gitarre using 0.0000284
>>      Gitarre ; 0.0000017
>>
>> Why has it not become part of the phrase table?
>>
>> Thanks again!
>> Vera
>>
>> -----Original Message-----
>> From: Vera Aleksic, Linguatec GmbH
>> Sent: Thursday, 27 November 2014 09:42
>> To: 'Matthias Huck'; Raj Dabre
>> Subject: RE: [Moses-support] Unknown single words that are part of phrases
>>
>> Hi,
>> Thank you for your answers.
>> @Raj, one-word translations do not exist; I have searched for them. If the 
>> grow-diag method is the likely cause of such phenomena, are there any better 
>> alternatives?
>> @Matthias, you are right, the pair Gitarre-guitar is always unaligned, but I 
>> do not really understand why. Why is "guitar" in the example below aligned 
>> to "Musikinstrument Gitarre", and not to "Gitarre" only? I assume 
>> decomposing "Musik + Instrument" would help? How else could I improve the 
>> word alignment quality?
>> Thanks!
>> Best,
>> Vera
>>
>> für ein Musikinstrument wie eine elektrische Gitarre ,
>> NULL ({ }) for ({ 1 }) a ({ 2 }) musical ({ }) instrument ({ }) , ({ }) such ({ }) as ({ 4 }) an ({ 5 }) electric ({ 6 }) guitar ({ 3 7 }) ; ({ 8 })
>>
>> -----Original Message-----
>> From: Matthias Huck [mailto:[email protected]]
>> Sent: Wednesday, 26 November 2014 17:54
>> To: Raj Dabre
>> Cc: Vera Aleksic, Linguatec GmbH; moses-support
>> Subject: Re: [Moses-support] Unknown single words that are part of phrases
>>
>> Hi,
>>
>> Presumably your phrase table does not contain an entry "Gitarre ||| guitar" 
>> because this word pair is always unaligned in your training data. You could 
>> try to improve your word alignment quality.
>>
>> Alternatively, you could implement a procedure in the manner of the "forced 
>> single word heuristic" as described in:
>> D. Stein, D. Vilar, S. Peitz, M. Freitag, M. Huck, and H. Ney. A Guide to 
>> Jane, an Open Source Hierarchical Translation Toolkit. The Prague Bulletin 
>> of Mathematical Linguistics, number 95, pages 5-18, Prague, Czech Republic, 
>> April 2011.
>> http://ufal.mff.cuni.cz/pbml/95/art-stein-vilar-ney-jane.pdf
>> (see Fig. 1c).
>>
>> But the latter would rather be a workaround.
>>
>> Cheers,
>> Matthias
>>
>>
>> On Thu, 2014-11-27 at 01:18 +0900, Raj Dabre wrote:
>>> Hello,
>>>
>>>
>>> If I am not wrong, this is most likely due to the grow(-diag) method 
>>> applied to the word-aligned data (both directions) before phrase extraction.
>>>
>>> Furthermore, one-word translations should exist (though not always); 
>>> search for them.
>>>
>>>
>>>
>>> Regards.
>>>
>>>
>>> On Thu, Nov 27, 2014 at 12:53 AM, Vera Aleksic, Linguatec GmbH 
>>> <[email protected]> wrote:
>>>          Hi,
>>>          
>>>          I have observed many times that some words do not exist as
>>>          single-word translations in the phrase table, although they exist
>>>          in the training corpus and in multiword phrases.
>>>          An example:
>>>          The German-English translation for "Gitarre" is unknown, i.e. there
>>>          is no single-word entry for "Gitarre" in the phrase table, although
>>>          some other phrases containing this word exist (see below).
>>>          How is this possible?
>>>          Thanks and best regards,
>>>          Vera
>>>          
>>>          
>>>          Gitarre , ||| guitar ; ||| 1 0.0284465 1 0.0654272 2.718 ||| ||| 1 1
>>>          Gitarre darstellt , unter Beanspruchung ||| guitar using ||| 0.25 2.7351e-11 1 0.0625119 2.718 ||| ||| 4 1
>>>          Gitarre darstellt , unter ||| guitar using ||| 0.25 1.18917e-05 1 0.0625119 2.718 ||| ||| 4 1
>>>          Gitarre darstellt , ||| guitar using ||| 0.25 0.00569228 1 0.0625119 2.718 ||| ||| 4 1
>>>          Gitarre darstellt ||| guitar using ||| 0.25 0.0400028 1 0.0625119 2.718 ||| ||| 4 1
>>>          Kopfplatte einer Gitarre darstellt , ||| head of a guitar using ||| 0.5 4.23407e-08 1 0.00471281 2.718 ||| ||| 2 1
>>>          Kopfplatte einer Gitarre darstellt ||| head of a guitar using ||| 0.5 2.97552e-07 1 0.00471281 2.718 ||| ||| 2 1
>>>          eine elektrische Gitarre , ||| an electric guitar ; ||| 1 0.00107982 1 0.00163632 2.718 ||| ||| 1 1
>>>          einer Gitarre darstellt , unter ||| of a guitar using ||| 0.333333 6.4754e-07 1 0.00471281 2.718 ||| ||| 3 1
>>>          einer Gitarre darstellt , ||| of a guitar using ||| 0.333333 0.000309961 1 0.00471281 2.718 ||| ||| 3 1
>>>          einer Gitarre darstellt ||| of a guitar using ||| 0.333333 0.00217827 1 0.00471281 2.718 ||| ||| 3 1
>>>          elektrische Gitarre , ||| electric guitar ; ||| 1 0.005661 1 0.0142097 2.718 ||| ||| 1 1
>>>          wie eine elektrische Gitarre , ||| as an electric guitar ; ||| 1 0.000177339 1 0.000809485 2.718 ||| ||| 1 1
>>>          
>>>          _______________________________________________
>>>          Moses-support mailing list
>>>          [email protected]
>>>          http://mailman.mit.edu/mailman/listinfo/moses-support
>>>
>>>
>>>
>>> --
>>> Raj Dabre.
>>> Research Student,
>>>
>>> Graduate School of Informatics,
>>> Kyoto University.
>>> CSE MTech, IITB., 2011-2014
>>>
>>>
>>
>>
>> --
>> The University of Edinburgh is a charitable body, registered in Scotland, 
>> with registration number SC005336.
>>
>>
>
>
