Indeed, you should split sentences into separate lines.  Here's the script:

https://github.com/moses-smt/mosesdecoder/blob/master/scripts/ems/support/split-sentences.perl

Note that the script assumes you have placed <P> tags in the text to
force sentence boundaries.  It will not assume that existing linebreaks
indicate sentence boundaries.  If you don't put <P> tags in, it will
read the entire corpus into RAM then try to break it, which will
typically run out of memory.
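
A minimal sketch of adding the markers, in Python. It assumes that every
non-empty line of your input is a paragraph ending on a sentence boundary,
and that a line containing only <P> is what the script treats as a forced
boundary, as described above. The function name mark_paragraphs and the
sample text are made up for illustration; they are not part of Moses.

```python
def mark_paragraphs(lines, marker="<P>"):
    """Yield each non-empty input line followed by a marker line.

    Assumption: every non-empty input line is a paragraph that ends on a
    sentence boundary, so forcing a break after it is safe.
    """
    for line in lines:
        text = line.strip()
        if not text:
            # Skip blank lines; the marker already separates the blocks.
            continue
        yield text
        yield marker


# Small in-memory demo; replace `sample` with sys.stdin for a real corpus.
sample = [
    "First paragraph. It has two sentences.\n",
    "\n",
    "Second paragraph.\n",
]
for out in mark_paragraphs(sample):
    print(out)
```

The marked-up text can then be piped into split-sentences.perl, so the
script only has to hold one paragraph at a time. You may need to filter the
marker lines out of the result before tokenizing.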

Kenneth

On 12/04/2015 01:18 PM, Vincent Nguyen wrote:
> 
> Well, that's not exactly my question. I know Moses translates one "line" at a
> time, meaning a string ending with a line feed.
>
> My question is rather: if the line contains a period (tokenized as
> such) that separates it into two "sentences", how does Moses behave?
>
> Given what I observed, I have the feeling that we really need to
> "sentence-tokenize" before word-tokenizing.
> 
> 
> 
> On 04/12/2015 13:52, John D Burger wrote:
>> I think you're asking if Moses translates one sentence at a time. The answer 
>> is yes.
>>
>> - John Burger
>>    MITRE
>>
>>> On Dec 4, 2015, at 04:43, Vincent Nguyen <[email protected]> wrote:
>>>
>>> Actually, I don't know whether this is a decoder question or not.
>>>
>>> Here is my issue.
>>>
>>> Let's say I have a text string with two sentences, with a period ending
>>> the first sentence, but no CR+LF, just a space before the second sentence.
>>>
>>> When I pass the full string through the pipeline:
>>> tokenizer + truecaser + moses + detruecaser + detokenizer
>>> the output is only one sentence: the period at the end of the first
>>> sentence has been dropped, and the result is nonsense (or at least not
>>> good at all).
>>>
>>> If I insert a CR+LF just after the period of the first sentence and send
>>> the whole thing through the pipeline, the output is correct.
>>>
>>> Am I missing something?
>>>
>>> Should we only send strings to Moses segment by segment?
>>>
>>> thanks,
>>> Vincent
>>> _______________________________________________
>>> Moses-support mailing list
>>> [email protected]
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
