Indeed, you should split sentences into separate lines. Here's the script:
https://github.com/moses-smt/mosesdecoder/blob/master/scripts/ems/support/split-sentences.perl

Note that the script assumes you have placed <P> tags in the text to
force sentence boundaries. It will not assume that existing line breaks
indicate sentence boundaries. If you don't put <P> tags in, it will
read the entire corpus into RAM and then try to break it, which will
typically run out of memory.

Kenneth

On 12/04/2015 01:18 PM, Vincent Nguyen wrote:
>
> well not exactly my question. I know Moses translate one "line" at a
> time, meaning a string ending with a line feed.
>
> My question is more, if the string contains a PERIOD (tokenized as
> such), separating the line in 2 "sentences" then how does it behave ?
>
> given my observation I have the feeling that we really need to
> "sentence-tokenize" first before word-tokenizing.
>
>
>
> On 04/12/2015 13:52, John D Burger wrote:
>> I think you're asking if Moses translates one sentence at a time. The answer
>> is yes.
>>
>> - John Burger
>> MITRE
>>
>>> On Dec 4, 2015, at 04:43, Vincent Nguyen <[email protected]> wrote:
>>>
>>> Actually I don't know if this is a decoder question or such.
>>>
>>> Here is my issue
>>>
>>> Let's say I have a text string with 2 sentences, with a period ending
>>> the first sentence, but no CR+LF, just a space before the second sentence.
>>>
>>> When I pass the full string to the pipe :
>>> tokenizer + truecaser + moses + detruecase + detokenizer
>>> the output is only one sentence, the period at the end of the first
>>> sentence has been eliminated, the sentence is nonsense (well not good at
>>> all)
>>>
>>> If I insert a CRLF just after the period of the first sentence and send
>>> the whole thing to the pipe, the output is correct.
>>>
>>> Am I missing something ?
>>>
>>> Should we only send string to moses segment by segment ?
>>>
>>> thanks,
>>> Vincent
>>> _______________________________________________
>>> Moses-support mailing list
>>> [email protected]
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
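
As a rough illustration of the "sentence-split first, then tokenize" step discussed in this thread, here is a minimal Python sketch. It is not a port of split-sentences.perl and does not reproduce its rules. The function name split_sentences and the boundary regex are assumptions made for illustration only. The sketch treats <P> as a hard boundary, as described above, and otherwise splits after ".", "!" or "?" when the next word starts with an uppercase letter, a digit, or a quote. It will wrongly split after abbreviations such as "Dr." followed by a name.

```python
import re

# Naive boundary (an assumption, not the Moses rule set): sentence-final
# punctuation, then whitespace, then an uppercase letter, digit, or quote.
_BOUNDARY = re.compile(r'(?<=[.!?])\s+(?=[A-Z0-9"\'])')


def split_sentences(text):
    """Return the sentences found in text, one string per sentence."""
    sentences = []
    # <P> markers are treated as hard paragraph boundaries.
    for paragraph in re.split(r'\s*<P>\s*', text):
        # Collapse all whitespace, including existing line breaks,
        # so that line breaks alone never count as sentence boundaries.
        paragraph = " ".join(paragraph.split())
        if not paragraph:
            continue
        sentences.extend(_BOUNDARY.split(paragraph))
    return sentences


# Hypothetical example input: two sentences separated only by a space.
example = "This is the first sentence. This is the second one."
for sentence in split_sentences(example):
    # Each sentence ends up on its own line, ready to be piped to the
    # tokenizer and the rest of the pipeline.
    print(sentence)
```

Each printed line can then be sent through tokenizer + truecaser + moses + detruecase + detokenizer as a separate segment, which matches the behaviour Vincent observed when he inserted the line break by hand.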
