Hi Vincent,

As far as Moses is concerned, the end of a sentence is marked by whatever
the end-of-line marker is on the respective OS (Windows: CRLF, Linux and
OS X: LF, classic Mac OS: CR). A period is treated as a plain old token.
The purpose of the sentence splitter that Kenneth mentioned is to tell Moses
where the "sentence" boundaries are, by putting each sentence on its own line.
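
To illustrate the idea of splitting before tokenizing, here is a minimal
Python sketch. It is not the splitter that ships with Moses (that one is a
Perl script with per-language abbreviation lists); the boundary rule and the
function names split_sentences and to_moses_input are assumptions made up
for this example. The rule is deliberately naive and will wrongly split
after abbreviations such as "Mr.".

```python
import re

# Naive boundary rule (an assumption for illustration only): sentence-final
# punctuation, then whitespace, then an uppercase letter or a digit.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9])")


def split_sentences(text):
    """Return the sentences found in `text`, one string per sentence."""
    return [part.strip() for part in _BOUNDARY.split(text) if part.strip()]


def to_moses_input(text):
    """Put each sentence on its own LF-terminated line."""
    return "".join(sentence + "\n" for sentence in split_sentences(text))


if __name__ == "__main__":
    raw = "This is the first sentence. Here is the second one."
    # Each sentence ends up on its own line, ready for the tokenizer.
    print(to_moses_input(raw), end="")
```

The output of to_moses_input would then go into the usual
tokenizer + truecaser + moses chain, so that the decoder sees two input
lines instead of one.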

The language model has a concept of sentence beginnings and endings and
usually doesn't like periods anywhere except at the end of a sentence, so
it'll penalize translation hypotheses containing isolated periods.
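
A rough way to see why, sketched in Python: n-gram language models
conventionally wrap every line in the boundary symbols <s> and </s>. The
helper lm_bigrams below is hypothetical and only lists the bigrams a
bigram model would have to score; it does not call any real LM toolkit.

```python
def lm_bigrams(line):
    """List the bigrams a bigram LM would score for one tokenized line."""
    tokens = ["<s>"] + line.split() + ["</s>"]
    return list(zip(tokens, tokens[1:]))


if __name__ == "__main__":
    # Two sentences on one line: the first period is followed by a word.
    print(lm_bigrams("it works . we tested it ."))
    # One sentence per line: every period is followed by </s>.
    print(lm_bigrams("it works ."))
    print(lm_bigrams("we tested it ."))
```

If the model was trained on one sentence per line, it has mostly seen
periods followed by </s>, so a bigram like (".", "we") is likely to get a
low probability.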

- Uli

On Fri, Dec 4, 2015 at 1:18 PM, Vincent Nguyen <[email protected]> wrote:

>
> Well, that is not exactly my question. I know Moses translates one "line"
> at a time, meaning a string ending with a line feed.
>
> My question is rather: if the string contains a PERIOD (tokenized as
> such) separating the line into 2 "sentences", how does it behave?
>
> given my observation I have the feeling that we really need to
> "sentence-tokenize" first before word-tokenizing.
>
>
>
> On 04/12/2015 13:52, John D Burger wrote:
> > I think you're asking if Moses translates one sentence at a time. The
> > answer is yes.
> >
> > - John Burger
> >    MITRE
> >
> >> On Dec 4, 2015, at 04:43, Vincent Nguyen <[email protected]> wrote:
> >>
> >> Actually I don't know if this is a decoder question or such.
> >>
> >> Here is my issue
> >>
> >> Let's say I have a text string with 2 sentences, with a period ending
> >> the first sentence, but no CR+LF, just a space before the second
> >> sentence.
> >>
> >> When I pass the full string through the pipeline:
> >> tokenizer + truecaser + moses + detruecase + detokenizer
> >> the output is only one sentence: the period at the end of the first
> >> sentence has been eliminated, and the result is nonsense (well, not
> >> good at all).
> >>
> >> If I insert a CRLF just after the period of the first sentence and send
> >> the whole thing to the pipe, the output is correct.
> >>
> >> Am I missing something?
> >>
> >> Should we only send strings to Moses segment by segment?
> >>
> >> thanks,
> >> Vincent
> >> _______________________________________________
> >> Moses-support mailing list
> >> [email protected]
> >> http://mailman.mit.edu/mailman/listinfo/moses-support


-- 
Ulrich Germann
Senior Researcher
School of Informatics
University of Edinburgh
