I'd prefer that FilePiece output a faithful representation of the file. If
you need to clean your data, I think that should go into the cleaning or
normalization scripts.

Hieu Hoang
Researcher
New York University, Abu Dhabi
http://www.hoang.co.uk/hieu

On 18 May 2015 at 09:05, Jeroen Vermeulen <[email protected]> wrote:

> Hi all,
>
> I'm trying to fix a problem on Windows where lexical-reordering-score
> breaks because of Windows-style line endings — "\r\n" instead of "\n".
> These inevitably get in here and there when users produce files on Windows.
>
> A simple solution I've been testing successfully is this: in
> FilePiece::ReadLine() (and its sibling ReadLineEOF()), if the last
> character before the \n is a \r (carriage return), then don't include
> that character in the line that is returned.  And of course there's a
> parameter to disable this behaviour if desired.
>
> This looks relatively safe to me, to the extent that calling ReadLine()
> implies that what you're reading is a text file.  It's not something
> you'd want to do with a binary file.
>
> However in principle there could be situations where you have a carriage
> return at the end of a line in your file (on a non-Windows system), and
> you want to keep it.
>
> Can anyone think of such a situation?  Any objections against merging my
> patch?
>
>
> Jeroen
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>