Well, not exactly my question. I know Moses translates one "line" at a time, meaning a string ending with a line feed.
My question is more this: if the string contains a period (tokenized as such) that separates the line into 2 "sentences", how does Moses behave? From what I observe, I have the feeling that we really need to "sentence-tokenize" first, before word-tokenizing.

On 04/12/2015 13:52, John D Burger wrote:
> I think you're asking if Moses translates one sentence at a time. The answer
> is yes.
>
> - John Burger
>   MITRE
>
>> On Dec 4, 2015, at 04:43, Vincent Nguyen <[email protected]> wrote:
>>
>> Actually I don't know if this is a decoder question or such.
>>
>> Here is my issue
>>
>> Let's say I have a text string with 2 sentences, with a period ending
>> the first sentence, but no CR+LF, just a space before the second sentence.
>>
>> When I pass the full string to the pipe :
>> tokenizer + truecaser + moses + detruecase + detokenizer
>> the output is only one sentence, the period at the end of the first
>> sentence has been eliminated, the sentence is nonsense (well not good at
>> all)
>>
>> If I insert a CRLF just after the period of the first sentence and send
>> the whole thing to the pipe, the output is correct.
>>
>> Am I missing something ?
>>
>> Should we only send string to moses segment by segment ?
>>
>> thanks,
>> Vincent
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
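
To illustrate the "sentence-tokenize first" idea raised in this thread, here is a minimal sketch of a pre-processing step that puts one sentence per line before the text goes into the tokenizer. The function names (split_sentences, to_moses_input) and the boundary rule are hypothetical and only for illustration; they are not part of Moses. The rule is deliberately naive: it will wrongly split after abbreviations such as "Mr." and will miss sentences that start with a lowercase letter. The Moses distribution also includes a sentence splitter script (split-sentences.perl), which is probably the better choice for real data.

```python
import re

# Assumed, naive boundary rule: '.', '!' or '?' followed by whitespace
# and then an uppercase ASCII letter or a digit.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9])")


def split_sentences(text):
    """Return the sentences found in text, one per list item."""
    return [s.strip() for s in _BOUNDARY.split(text) if s.strip()]


def to_moses_input(text):
    """Join the sentences with line feeds so each becomes its own input line."""
    return "\n".join(split_sentences(text)) + "\n"


if __name__ == "__main__":
    # Two sentences separated only by a space, as in the original report.
    print(to_moses_input("This is one. This is two."), end="")
```

The output of to_moses_input would then be fed to the rest of the pipe (tokenizer, truecaser, moses, and so on), so that each sentence reaches the decoder on its own line.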
