Hi Vincent,

as far as Moses is concerned, the end of a sentence is marked by the end-of-line marker of the respective OS (Windows: CRLF; Linux and modern macOS: LF; classic Mac OS: CR). A period is treated as a plain old token. The purpose of the sentence splitter that Kenneth mentioned is to tell Moses where the "sentence" boundaries are, by putting each sentence on its own line before the text is tokenized.
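
To illustrate the idea of splitting before tokenizing, here is a minimal Python sketch. It is not the splitter Kenneth mentioned and not part of Moses: the names `split_sentences` and `to_moses_input` are made up for this example, and the regex is a naive assumption (break after `.`, `!` or `?` when followed by whitespace and an uppercase letter, digit or quote) that will mis-split abbreviations such as "Dr. Smith".

```python
import re

# Naive boundary (an assumption for illustration): sentence-final punctuation,
# then whitespace, then something that looks like the start of a sentence.
_BOUNDARY = re.compile(r'(?<=[.!?])\s+(?=[A-Z0-9"\'])')


def split_sentences(text):
    """Return the sentences found in text, one per list item."""
    return [s.strip() for s in _BOUNDARY.split(text) if s.strip()]


def to_moses_input(text):
    """Put each sentence on its own LF-terminated line for the decoder."""
    return "".join(sentence + "\n" for sentence in split_sentences(text))


if __name__ == "__main__":
    raw = "This is the first sentence. Here is the second one."
    # Each output line would then go through tokenizer, truecaser and decoder.
    print(to_moses_input(raw), end="")
```

The output of `to_moses_input` would be fed to the rest of the pipe in place of the raw string, so that the decoder sees two lines instead of one line with a period in the middle.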
The language model has a concept of sentences beginning and ending and usually doesn't like periods anywhere except at the end of a sentence, so it'll down-vote translation hypotheses containing isolated periods.

- Uli

On Fri, Dec 4, 2015 at 1:18 PM, Vincent Nguyen <[email protected]> wrote:
>
> Well, not exactly my question. I know Moses translates one "line" at a
> time, meaning a string ending with a line feed.
>
> My question is more: if the string contains a PERIOD (tokenized as
> such), separating the line into 2 "sentences", then how does it behave?
>
> Given my observation, I have the feeling that we really need to
> "sentence-tokenize" first, before word-tokenizing.
>
> On 04/12/2015 13:52, John D Burger wrote:
> > I think you're asking if Moses translates one sentence at a time. The
> > answer is yes.
> >
> > - John Burger
> > MITRE
> >
> >> On Dec 4, 2015, at 04:43, Vincent Nguyen <[email protected]> wrote:
> >>
> >> Actually I don't know if this is a decoder question or such.
> >>
> >> Here is my issue:
> >>
> >> Let's say I have a text string with 2 sentences, with a period ending
> >> the first sentence, but no CR+LF, just a space before the second
> >> sentence.
> >>
> >> When I pass the full string to the pipe:
> >> tokenizer + truecaser + moses + detruecase + detokenizer
> >> the output is only one sentence, the period at the end of the first
> >> sentence has been eliminated, and the sentence is nonsense (well, not
> >> good at all).
> >>
> >> If I insert a CRLF just after the period of the first sentence and send
> >> the whole thing to the pipe, the output is correct.
> >>
> >> Am I missing something?
> >>
> >> Should we only send strings to Moses segment by segment?
> >>
> >> thanks,
> >> Vincent

--
Ulrich Germann
Senior Researcher
School of Informatics
University of Edinburgh
_______________________________________________ Moses-support mailing list [email protected] http://mailman.mit.edu/mailman/listinfo/moses-support
