Re: [Moses-support] Translating words with apostrophies

Vincent Nguyen Sun, 03 Apr 2016 08:12:40 -0700


Apostrophes are tricky to handle properly.
The tokenizer is language-sensitive (see the -l option).
In French:
l'été => l&apos; été [with a space between ; and é]
In English:
today's story => today &apos;s story
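
As a rough illustration of the two cases above, here is a minimal Python sketch. It is not the Moses tokenizer code: the function name split_apostrophe and the letter pattern are assumptions made for this example, and it only handles an apostrophe sitting between two letters.

```python
import re

# Assumption for this sketch: "letter" means any Unicode word character
# that is neither a digit nor an underscore.
LETTER = r"[^\W\d_]"


def split_apostrophe(text: str, lang: str) -> str:
    """Split on a letter-apostrophe-letter sequence, depending on language."""
    if lang == "en":
        # English: the apostrophe goes with what follows (today's -> today 's)
        text = re.sub(rf"({LETTER})'({LETTER})", r"\1 '\2", text)
    elif lang in ("fr", "it"):
        # French/Italian: the apostrophe stays with what precedes (l'été -> l' été)
        text = re.sub(rf"({LETTER})'({LETTER})", r"\1' \2", text)
    # Escape the apostrophe the way it appears in the examples above.
    return text.replace("'", "&apos;")


print(split_apostrophe("today's story", "en"))
print(split_apostrophe("l'été", "fr"))
```

The point of the sketch is only that the same character is split on a different side depending on the language, so the text you send to the decoder must be tokenized with the same -l setting as the training data.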

BUT

The issue is that corpora sometimes contain misplaced spaces before or after the apostrophe,

so you may get &apos; as an individual token.

The other issue is that corpora contain various kinds of apostrophes, encoded as different UTF-8 sequences.

You may use the normalize-punctuation.perl script to correct these.
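
To make the second issue concrete, here is a small Python sketch that maps a few common apostrophe look-alikes to the ASCII apostrophe before tokenization. The list of code points and the name normalize_apostrophes are assumptions chosen for this example; it does not describe what normalize-punctuation.perl does internally.

```python
# Assumed, non-exhaustive set of characters that often stand in for an apostrophe.
APOSTROPHE_VARIANTS = {
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u02bc": "'",  # MODIFIER LETTER APOSTROPHE
    "\u00b4": "'",  # ACUTE ACCENT
    "\u0060": "'",  # GRAVE ACCENT (backtick)
}

_TABLE = str.maketrans(APOSTROPHE_VARIANTS)


def normalize_apostrophes(text: str) -> str:
    """Replace apostrophe look-alikes with the plain ASCII apostrophe."""
    return text.translate(_TABLE)


print(normalize_apostrophes("l\u2019été"))
print(normalize_apostrophes("today\u00b4s story"))
```

Whatever normalization is used, it should be applied identically to the training corpus and to the text sent for translation.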




On 03/04/2016 11:42, Shani Shalgi wrote:

Hi,
I'm new to Moses and have several questions, but I'll start by asking what happens to apostrophes. What I see is that the tokenizer transforms words with apostrophes like this:
it's --> it ' s
c'est une belle journée --> c ' est une belle journée
How does this work in translating the meaning of the word it's (English) / c'est (French)?
I'm using the baseline model which I trained according to the tutorial.
I assume I need to send text tokenized in the same way to be translated (it's --> it ' s); otherwise any word that has an apostrophe is not translated.
I assumed it ' s (or c ' est) would be considered a phrase; however, I notice that only the word est is translated (and this is just one example: in l'équipe only the word equipe is translated, and so forth...).
Am I doing something wrong or misunderstanding something? Or do I need to change the tokenizer to accept ' in the middle of words?
Thanks in advance,
Shani


_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
