Hi

I think what the OP wants is to be able to redefine the exceptions to 
the 'drop unknown' strategy. At the moment they are hardcoded to be 
0123456789. This seems quite reasonable, but what would be even better 
is a way to plug in your own OOV handler, in case you want to add in 
some custom rules.
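A pluggable handler could look roughly like the sketch below. None of these names exist in Moses: `OovHandler`, `CharExceptionOovHandler` and `FunctionOovHandler` are hypothetical. The sketch only assumes that the decoder would ask one object, for each unknown source word, whether to drop it. The default exception set reproduces the current hardcoded 0123456789. An empty set gives the behaviour Kenneth asks for: every unknown word is dropped, digits or not.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical interface (not part of Moses): decides, for each unknown
// source word, whether -drop-unknown should remove it from the output.
class OovHandler {
 public:
  virtual ~OovHandler() = default;
  virtual bool ShouldDrop(const std::string& word) const = 0;
};

// Mirrors the current behaviour, but with a configurable exception set:
// an unknown word is kept if it contains any character from `exceptions`.
// With an empty set, find_first_of never matches, so every unknown word
// is dropped.
class CharExceptionOovHandler : public OovHandler {
 public:
  explicit CharExceptionOovHandler(std::string exceptions = "0123456789")
      : exceptions_(std::move(exceptions)) {}

  bool ShouldDrop(const std::string& word) const override {
    return word.find_first_of(exceptions_) == std::string::npos;
  }

 private:
  std::string exceptions_;
};

// Plugs in an arbitrary custom rule, e.g. a per-language predicate.
class FunctionOovHandler : public OovHandler {
 public:
  explicit FunctionOovHandler(std::function<bool(const std::string&)> rule)
      : rule_(std::move(rule)) {}

  bool ShouldDrop(const std::string& word) const override {
    return rule_(word);
  }

 private:
  std::function<bool(const std::string&)> rule_;
};
```

For Latin Mongolian, for example, `CharExceptionOovHandler("")` would stop tokens containing '0' from being kept. `FunctionOovHandler` would let a language-specific rule be plugged in without touching the decoder itself.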

This has been sitting at the top of the project list in this page 
http://www.statmt.org/moses/?n=Moses.GetInvolved, but no takers yet...

cheers - Barry

ps - yes, bizarre mixture of = and == in both of those code snippets. 
And why the different logic between pb and chart?
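Both snippets look like they are trying to compute a plain "does this token contain a digit?" flag. As written, `isDigit == 0;` and `isDigit == 1;` are comparisons whose results are thrown away. The `isDigit = 1;` branch in the phrase-based version assigns the value `isDigit` already holds. That version's `isDigit == 1` test is also only true when the first digit sits at index 1. Below is a minimal shared helper, assuming a bool is what the callers actually want. The name `ContainsAnyOf` is made up.

```cpp
#include <string>

// Hypothetical helper (name made up, not in Moses): true if `token` contains
// at least one character from `chars`. Comparing against std::string::npos,
// rather than against 1, is what makes this a "contains" test; returning a
// bool avoids leaving a raw string position in the flag.
inline bool ContainsAnyOf(const std::string& token,
                          const std::string& chars = "0123456789") {
  return token.find_first_of(chars) != std::string::npos;
}
```

Using one helper in both TranslationOptionCollection.cpp and ChartParser.cpp would also remove the pb/chart discrepancy. Passing a different character set, or an empty one, would cover the -drop-unknown case discussed here.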

On 25/07/13 09:12, Kenneth Heafield wrote:
> I think he's reasonably asking that -drop-unknown should drop unknown
> words even if they contain digits.  Maybe this means another
> command-line option.
>
> Also, did anybody else notice that this code has no effect?
>
> if (isDigit == 1)
>      isDigit = 1;
> else
>      isDigit == 0;
>
> On 07/25/13 08:52, Hieu Hoang wrote:
>> ah, this would be a problem for you.
>>
>> I don't know Latin Mongolian so I don't know how to solve it. If you
>> have any suggestions or code, please let me know.
>>
>> If you can share the data, that would be great. This would let other
>> people find out about this language pair.
>>
>> On 25 July 2013 01:40, Xiang Li <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>>      I found the following code in moses/TranslationOptionCollection.cpp:
>>
>>      isDigit = s.find_first_of("0123456789");
>>      if (isDigit == 1)
>>          isDigit = 1;
>>      else
>>          isDigit == 0;
>>
>>      But nearly the same code segment appears in moses/ChartParser.cpp:
>>
>>      isDigit = s.find_first_of("0123456789");
>>      if (isDigit == string::npos)
>>          isDigit = 0;
>>      else
>>          isDigit == 1;
>>
>>
>>      I guess the purpose is to treat a token which contains a digit as a
>>      normal word rather than an unknown word. However, the digit '0' is a
>>      character in Latin Mongolian, so words which contain a digit
>>      will be treated as known words. It is also strange that at the MERT
>>      stage, unknown words do not appear in the n-best file, but do appear
>>      in the final dev result file.
>>
>>
>>      On 25 July 2013, at 2:44, Hieu Hoang <[email protected]
>>      <mailto:[email protected]>> wrote:
>>
>>>      I think you asked this question before. I checked and am pretty
>>>      sure it works.
>>>
>>>      How exactly are you running Moses? Can you send me your config
>>>      files and any other info that you think might be useful to debug
>>>      this issue?
>>>
>>>      On 23 July 2013 07:46, Li Xiang <[email protected]
>>>      <mailto:[email protected]>> wrote:
>>>
>>>          At the MERT stage, I turned on the "-drop-unknown" switch for the
>>>          moses_chart decoder, but some OOV words still appear in the output
>>>          translation. I carefully checked the source training data, but I
>>>          did not find these OOV words in it.
>>>
>>>          The source language is Latin Mongolian. Its character set
>>>          additionally includes "0 % _ -".
>>>
>>>          Does the switch have no effect during MERT?
>>>
>>>          --
>>>          Xiang Li
>>>
>>>          _______________________________________________
>>>          Moses-support mailing list
>>>          [email protected] <mailto:[email protected]>
>>>          http://mailman.mit.edu/mailman/listinfo/moses-support
>>>
>>>
>>>
>>>
>>>      --
>>>      Hieu Hoang
>>>      Research Associate
>>>      University of Edinburgh
>>>      http://www.hoang.co.uk/hieu
>>
>>
>>
>> -- 
>> Hieu Hoang
>> Research Associate
>> University of Edinburgh
>> http://www.hoang.co.uk/hieu