I found the following code in moses/TranslationOptionCollection.cpp:
    isDigit = s.find_first_of("0123456789");
    if (isDigit == 1)
      isDigit = 1;
    else
      isDigit = 0;
But nearly the same code segment appears in moses/ChartParser.cpp, with a different condition:
    isDigit = s.find_first_of("0123456789");
    if (isDigit == string::npos)
      isDigit = 0;
    else
      isDigit = 1;
I guess the intent is to treat a token that contains a digit as a normal word,
not an unknown word. However, the digit '0' is a character in Latin Mongolian,
so words that contain a digit will be treated as known words. It is
strange that during MERT, unknown words do not appear in the n-best file, but
do appear in the final dev result file.
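
To make the difference between the two conditions concrete, here is a minimal sketch. It assumes plain std::string (the snippets above do not show the type of s), and both function names are hypothetical; neither exists in Moses. find_first_of returns the index of the first matching character, or string::npos when nothing matches, so comparing the result with 1 only succeeds when the first digit happens to be the second character of the token.

```cpp
#include <string>

// Hypothetical helpers for illustration only; they are not part of Moses.

// Mirrors the condition quoted from ChartParser.cpp: any result other than
// std::string::npos means the token contains at least one digit.
bool ContainsDigitNposCheck(const std::string &s) {
  return s.find_first_of("0123456789") != std::string::npos;
}

// Mirrors the condition quoted from TranslationOptionCollection.cpp: the
// returned index is compared with 1, so this is true only when the first
// digit in the token sits at index 1.
bool ContainsDigitIndexOneCheck(const std::string &s) {
  return s.find_first_of("0123456789") == 1;
}
```

With these definitions, a token such as "a0b" satisfies both checks, while "0ab" and "ab7" satisfy only the npos-based one. If the two files are meant to behave the same way, that would explain why digit-bearing tokens are handled differently by the two decoders.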
On 25 July 2013, at 2:44, Hieu Hoang <[email protected]> wrote:
> I think you asked this question before. I checked and was pretty sure it works.
>
> How exactly are you running Moses? Can you send me your config files and any
> other info that you think might be useful to debug this issue.
>
> On 23 July 2013 07:46, Li Xiang <[email protected]> wrote:
> At the MERT stage, I turn on the switch "-drop-unknown" for the decoder
> moses_chart. But some OOV words still appear in the output translation. I
> carefully checked the source training data, and I do not find the OOV words there.
>
> The source language is Latin Mongolian. Its character set additionally
> includes "0 % _ -".
>
> Does the switch have no effect during MERT?
>
> --
> Xiang Li
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
> --
> Hieu Hoang
> Research Associate
> University of Edinburgh
> http://www.hoang.co.uk/hieu