I think he's reasonably asking that -drop-unknown should drop unknown
words even if they contain digits. Maybe this means another
command-line option.
Also, anybody else notice that this code has no effect?
    if (isDigit == 1)
      isDigit = 1;
    else
      isDigit == 0;
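
To spell out why that is a no-op: the first branch assigns 1 to something that is already 1, and the else branch is a comparison whose result is thrown away, so isDigit keeps whatever find_first_of returned. Here is a minimal sketch of what I assume the intended check is. The helper names containsDigit and firstDigitAtIndexOne are mine for illustration, not anything in Moses.

```cpp
#include <string>

// Hypothetical helper (not part of Moses): true if the token contains
// at least one ASCII digit. find_first_of returns the index of the first
// match, or std::string::npos when there is none, so the result has to
// be compared against npos rather than against 0 or 1.
bool containsDigit(const std::string &token)
{
  return token.find_first_of("0123456789") != std::string::npos;
}

// Hypothetical helper mirroring the "isDigit == 1" test quoted above:
// comparing the returned position with 1 is true only when the first
// digit happens to sit at index 1 of the token.
bool firstDigitAtIndexOne(const std::string &token)
{
  return token.find_first_of("0123456789") == 1;
}
```

So for a token like "a0b" both helpers agree, but for "0ab" or "ab0" only containsDigit reports the digit.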
On 07/25/13 08:52, Hieu Hoang wrote:
> ah, this would be a problem for you.
>
> I don't know Latin Mongolian so I don't know how to solve it. If you
> have any suggestions or code, please let me know.
>
> If you can share the data, that would be great. This would let other
> people find out about this language pair.
>
> On 25 July 2013 01:40, Xiang Li <[email protected]
> <mailto:[email protected]>> wrote:
>
> I found the following code in moses/TranslationOptionCollection.cpp:
>
> isDigit = s.find_first_of("0123456789");
> if (isDigit == 1)
> isDigit = 1;
> else
> isDigit == 0;
>
> But nearly the same code segment appears in moses/ChartParser.cpp:
>
> isDigit = s.find_first_of("0123456789");
> if (isDigit == string::npos)
> isDigit = 0;
> else
> isDigit == 1;
>
>
> I guess the intent is to treat a token that contains a digit as a
> normal word, not an unknown word. However, the digit '0' is a
> character in Latin Mongolian, so words that contain a digit
> will be treated as known words. It is strange that during MERT
> unknown words do not appear in the n-best file, but do appear in the
> final dev result file.
>
>
> On 25 July 2013, at 2:44, Hieu Hoang <[email protected]
> <mailto:[email protected]>> wrote:
>
>> I think you asked this question before. I checked and was pretty
>> sure it works.
>>
>> How exactly are you running Moses? Can you send me your config
>> files and any other info that you think might be useful to debug
>> this issue.
>>
>> On 23 July 2013 07:46, Li Xiang <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>> At the MERT stage, I turn on the "-drop-unknown" switch for the
>> decoder moses_chart, but some OOV words still appear in the output
>> translation. I carefully checked the source training data, but I
>> did not find the OOV words there.
>>
>> The source language is Latin Mongolian. Its character set
>> additionally includes "0 % _ -".
>>
>> Does the switch have no effect during MERT?
>>
>> --
>> Xiang Li
>>
>> _______________________________________________
>> Moses-support mailing list
>> [email protected] <mailto:[email protected]>
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>>
>>
>>
>> --
>> Hieu Hoang
>> Research Associate
>> University of Edinburgh
>> http://www.hoang.co.uk/hieu