Hi,
if a provided source text has no final line feed ("\n"), the
(de)tokenizer scripts add one.
This happens at line 166 of detokenizer.perl and line 141 of tokenizer.perl:
$text .= "\n" unless $text =~ /\n$/;
I imagine this was meant for pretty-printing on a bash console (you
don't want the prompt to start on the same line as the output), but it
adds data to the original data, which makes these scripts unreliable.
Pretty-printing is convenient for local tests, but not for actual
usage.
It's not a very serious bug, and it can easily be worked around, so I
haven't directly proposed a patch, because I wondered whether people out
there really want it like this. Do we? (I personally find this kind
of thing annoying, even if it is only a detail.)
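
To make the workaround concrete, here is a minimal sketch of what I have
in mind. The helper process_preserving_final_newline is hypothetical and
not part of either script, and add_final_newline is only a stand-in for
the current behaviour, not the real (de)tokenizer code:

```perl
use strict;
use warnings;

# Stand-in for the current behaviour (assumption: this mirrors only the
# quoted line, not the rest of tokenizer.perl / detokenizer.perl).
sub add_final_newline {
    my ($text) = @_;
    $text .= "\n" unless $text =~ /\n$/;
    return $text;
}

# Hypothetical helper: run a processing callback on $text, then strip
# one final "\n" from the result if the input did not end with one.
sub process_preserving_final_newline {
    my ($text, $process) = @_;
    my $had_newline = ($text =~ /\n\z/) ? 1 : 0;
    my $out = $process->($text);
    $out =~ s/\n\z// unless $had_newline;
    return $out;
}

# Input without a final line feed comes back without one.
my $result = process_preserving_final_newline("a b", \&add_final_newline);
print "unchanged\n" if $result eq "a b";
```

The same effect could probably be had with a command-line switch that
keeps today's behaviour as the default, but that is a design choice for
the maintainers.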
Thanks!
Jehan