Well, that is not exactly my question. I know Moses translates one "line" at a
time, meaning a string ending with a line feed.

My question is rather: if the string contains a PERIOD (tokenized as
such) that separates the line into 2 "sentences", how does Moses behave?

From what I have observed, I have the feeling that we really need to
"sentence-tokenize" (split the text into one sentence per line) first,
before word-tokenizing.
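
A minimal sketch of that pre-splitting step in Python, to show the idea of
putting each sentence on its own line before the text reaches the tokenizer.
The function names split_sentences and to_decoder_input are hypothetical and
not part of Moses, and the boundary rule (sentence-final punctuation, then
whitespace, then an uppercase letter, digit, or quote) is a naive assumption
that will wrongly split after abbreviations such as "Mr.".

```python
import re

# Naive boundary rule (an assumption, not what Moses does internally):
# whitespace that follows ., ! or ? and precedes an uppercase letter,
# a digit, or an opening quote.
_BOUNDARY = re.compile(r'(?<=[.!?])\s+(?=[A-Z0-9"\'])')


def split_sentences(text):
    """Split raw text into a list of sentences (hypothetical helper)."""
    return [s.strip() for s in _BOUNDARY.split(text) if s.strip()]


def to_decoder_input(text):
    """Return the text with one sentence per line, each ending in a line feed."""
    return "".join(sentence + "\n" for sentence in split_sentences(text))


if __name__ == "__main__":
    raw = "This is the first sentence. This is the second one."
    # Each sentence lands on its own line, ready for the
    # tokenizer + truecaser + moses + detruecase + detokenizer pipe.
    print(to_decoder_input(raw), end="")
```

The output of to_decoder_input would then be piped through the usual chain, so
that each line handed to the decoder holds a single sentence.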



On 04/12/2015 13:52, John D Burger wrote:
> I think you're asking if Moses translates one sentence at a time. The answer 
> is yes.
>
> - John Burger
>    MITRE
>
>> On Dec 4, 2015, at 04:43, Vincent Nguyen <[email protected]> wrote:
>>
>> Actually I don't know if this is a decoder question or such.
>>
>> Here is my issue
>>
>> Let's say I have a text string with 2 sentences, with a period ending
>> the first sentence, but no CR+LF, just a space before the second sentence.
>>
>> When I pass the full string to the pipe :
>> tokenizer + truecaser + moses + detruecase + detokenizer
>> the output is only one sentence, the period at the end of the first
>> sentence has been eliminated, the sentence is nonsense (well not good at
>> all)
>>
>> If I insert a CRLF just after the period of the first sentence and send
>> the whole thing to the pipe, the output is correct.
>>
>> Am I missing something ?
>>
>> Should we only send string to moses segment by segment ?
>>
>> thanks,
>> Vincent
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support