I prefer that FilePiece output a faithful representation of the file. If you need to clean your data, I think that should go into the cleaning or normalization scripts.

Hieu Hoang
Researcher
New York University, Abu Dhabi
http://www.hoang.co.uk/hieu

On 18 May 2015 at 09:05, Jeroen Vermeulen <[email protected]> wrote:

> Hi all,
>
> I'm trying to fix a problem on Windows where lexical-reordering-score
> breaks because of Windows-style line endings -- "\r\n" instead of "\n".
> These inevitably get in here and there when users produce files on Windows.
>
> A simple solution I've been testing successfully is this: in
> FilePiece::ReadLine() (and its sibling ReadLineEOF()), if the last
> character before the \n is a \r (carriage return), then don't include
> that character in the line that is returned. And of course there's a
> parameter to disable this behaviour if desired.
>
> This looks relatively safe to me, to the extent that calling ReadLine()
> implies that what you're reading is a text file. It's not something
> you'd want to do with a binary file.
>
> However in principle there could be situations where you have a carriage
> return at the end of a line in your file (on a non-Windows system), and
> you want to keep it.
>
> Can anyone think of such a situation? Any objections against merging my
> patch?
>
>
> Jeroen
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
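
For reference, the behaviour Jeroen describes in the quoted message (drop a "\r" that sits directly before the "\n", unless a parameter turns this off) could be sketched roughly as below. This is only an illustration, not his patch: StripTrailingCR and its strip_cr parameter are made-up names that are not part of FilePiece, and the sketch assumes C++17 and that the "\n" has already been removed from the line.

```cpp
#include <string_view>

// Hypothetical helper, NOT part of FilePiece or of the proposed patch.
// Takes a line whose terminating '\n' has already been removed and drops
// at most one trailing '\r', so "foo\r" (from a "\r\n" ending) becomes "foo".
// Passing strip_cr = false returns the line untouched, which mirrors the
// "parameter to disable this behaviour" mentioned in the quoted message.
std::string_view StripTrailingCR(std::string_view line, bool strip_cr = true) {
  if (strip_cr && !line.empty() && line.back() == '\r') {
    line.remove_suffix(1);  // shrinks the view only; the underlying bytes are unchanged
  }
  return line;
}
```

The same helper would work equally well inside a separate cleaning or normalization step, since it only narrows a view and never modifies the data it points at.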
