I think he's reasonably asking that -drop-unknown should drop unknown
words even if they contain digits.  Maybe this means another
command-line option.

Also, anybody else notice that this code has no effect?

if (isDigit == 1)
    isDigit = 1;
else
    isDigit == 0;
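
Both branches are no-ops: the first assigns 1 to a variable that is already 1, and the second is a comparison whose result is thrown away. For what it's worth, here is a minimal sketch of what I would guess the check was meant to do. ContainsDigit, ShouldDropUnknown and the keepDigits parameter are hypothetical names of mine, not anything in Moses; keepDigits stands in for the possible extra command-line option.

```cpp
#include <string>

// Hypothetical helper (not Moses code): true if the token contains
// at least one ASCII digit.
bool ContainsDigit(const std::string &s) {
  // find_first_of returns std::string::npos when nothing matches,
  // so compare against npos rather than against 0 or 1.
  return s.find_first_of("0123456789") != std::string::npos;
}

// Hypothetical helper: decide whether an unknown word is dropped.
// dropUnknown mirrors -drop-unknown; keepDigits is an assumed extra
// switch that preserves unknown tokens containing digits.
bool ShouldDropUnknown(const std::string &s, bool dropUnknown,
                       bool keepDigits) {
  if (!dropUnknown) return false;                      // dropping disabled
  if (keepDigits && ContainsDigit(s)) return false;    // keep e.g. numbers
  return true;                                         // drop everything else
}
```

With keepDigits off, a Latin Mongolian token that contains '0' would be dropped like any other unknown word, which is what I understand the request to be.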

On 07/25/13 08:52, Hieu Hoang wrote:
> ah, this would be a problem for you.
> 
> I don't know Latin Mongolian so I don't know how to solve it. If you
> have any suggestions or code, please let me know.
> 
> If you can share the data, that would be great. This would let other
> people find out about this language pair.
> 
> On 25 July 2013 01:40, Xiang Li <[email protected]
> <mailto:[email protected]>> wrote:
> 
>     I found the following code in moses/TranslationOptionCollection.cpp:
> 
>     isDigit = s.find_first_of("0123456789");
>     if (isDigit == 1)
>         isDigit = 1;
>     else
>         isDigit == 0;
> 
>     But nearly the same code segment appears in moses/ChartParser.cpp:
> 
>     isDigit = s.find_first_of("0123456789");
>     if (isDigit == string::npos)
>         isDigit = 0;
>     else
>         isDigit == 1;
> 
> 
>     I guess that it is meant to treat a token which contains a digit as a
>     normal word, not an unknown word. However, the digit ‘0’ is a
>     character in Latin Mongolian, so words that contain a digit
>     will be treated as known words. It is strange that during MERT,
>     unknown words do not appear in the n-best file, but do appear in the
>     final dev result file.
> 
> 
>     On 25 July 2013, at 2:44, Hieu Hoang <[email protected]
>     <mailto:[email protected]>> wrote:
> 
>>     I think you asked this question before. I checked and was pretty
>>     sure it works.
>>
>>     How exactly are you running Moses? Can you send me your config
>>     files and any other info that you think might be useful to debug
>>     this issue?
>>
>>     On 23 July 2013 07:46, Li Xiang <[email protected]
>>     <mailto:[email protected]>> wrote:
>>
>>         At the MERT stage, I turn on the switch "-drop-unknown" for the
>>         decoder moses_chart, but some OOV words still appear in the output
>>         translation. I carefully checked the source training data, but I
>>         did not find the OOV words there.
>>
>>         The source language is Latin Mongolian. Its character set
>>         additionally includes "0 % _ -".
>>
>>         Does the switch have no effect during MERT?
>>
>>         -- 
>>         Xiang Li
>>
>>         _______________________________________________
>>         Moses-support mailing list
>>         [email protected] <mailto:[email protected]>
>>         http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>>
>>
>>
>>     -- 
>>     Hieu Hoang
>>     Research Associate
>>     University of Edinburgh
>>     http://www.hoang.co.uk/hieu
> 
> 
> 
> 
> -- 
> Hieu Hoang
> Research Associate
> University of Edinburgh
> http://www.hoang.co.uk/hieu