Hi,

I do not think that the detokenizer would cause conversion of ' to ".
You can check the raw output of the decoder, and see how it is
changed by the detokenizer.
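
A minimal sketch of that check, in Python. It assumes one sentence per line and counts two things: apostrophes that stand alone as a token (raw ' or the escaped &apos; form) and double quotes (raw " or &quot;). The names inspect_line, STANDALONE_APOS and DOUBLE_QUOTE are made up for this illustration and are not part of Moses.

```python
import re

# An apostrophe, raw or in the escaped &apos; form, with whitespace
# (or a line boundary) on both sides, i.e. a standalone token.
STANDALONE_APOS = re.compile(r"(?:^|\s)(?:'|&apos;)(?=\s|$)")

# A double quote, raw or in the escaped &quot; form.
DOUBLE_QUOTE = re.compile(r'"|&quot;')


def inspect_line(line):
    """Count standalone apostrophes and double quotes in one line."""
    return {
        "standalone_apostrophes": len(STANDALONE_APOS.findall(line)),
        "double_quotes": len(DOUBLE_QUOTE.findall(line)),
    }
```

Running this over the raw decoder output and then over the detokenized output should show where the double quote first appears. If it is already in the raw output, the detokenizer is probably not the source.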

-phi

On Wed, Mar 9, 2016 at 11:44 AM, Vincent Nguyen <vngu...@neuf.fr> wrote:

> Hi,
>
> I got the following situation:
>
> This group age
> is sometimes translated as:
> ce groupe d'âge (correct)
> ce groupe d" âge (incorrect)
> ce groupe d "âge (incorrect)
>
> I am wondering if this is more a detokenizer issue or a corpus issue, or
> both.
>
> Technically in French, there shouldn't be any space before or after the
> apostrophe.
> In the Europarl Corpus, as well as in the News2014 one, there are some
> instances with a space before or after.
>
> So I have the feeling that the decoder gets a &apos; with surrounding
> spaces, which leads the detokenizer to transform it into "
>
> Has anyone seen a similar issue?
>
> thanks.
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>