There are non-traditional uses like ReadLine('\0') to read
null-delimited tokens. But I'd support Jeroen here: the default
ReadLine() with no argument should swallow \r.

In any case, if you're going to change code there, can you do it
upstream in github.com/kpu/kenlm? I just gave you commit access.

Also, how would you feel if I changed it to be FakeIFStream with
operator>> extraction, at least for integer/float types?

Kenneth

On 05/18/2015 03:41 AM, Jeroen Vermeulen wrote:
> On 18/05/15 14:02, Hieu Hoang wrote:
>> I prefer FilePiece outputs a faithful representation of the file. If
>> you need to clean your data, I think it should go into the cleaning or
>> normalization scripts
>
> That could go into a lot more places and end up being more brittle though.
>
> Would it help if I made the default "do not strip carriage returns", and
> made lexical-reordering-score request the conversion explicitly?
>
> Bear in mind here that every time we fopen() a file without the "b" mode
> flag, we're really saying we want the same conversion if the runtime
> feels the need, as it would on Windows. When we call ReadLine(), at
> least it knows we really want the file interpreted as text.
>
>
> Jeroen
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
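
To make the two behaviours in this thread concrete, here is a minimal sketch of a delimiter-aware line reader with an optional carriage-return strip. It is an illustration only: `ReadToken`, its `strip_cr` flag, and the in-memory `std::string` buffer are hypothetical stand-ins, not kenlm's FilePiece interface.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, NOT kenlm's FilePiece API.
// Returns the bytes from `pos` up to (not including) the next `delim`
// and advances `pos` past that delimiter. If no delimiter remains, the
// rest of the buffer is returned and `pos` is set to the buffer's end.
// When strip_cr is true, a single trailing '\r' is dropped, so a
// Windows-style "\r\n" ending reads the same as a plain "\n".
std::string ReadToken(const std::string &data, std::size_t &pos,
                      char delim = '\n', bool strip_cr = true) {
  std::size_t end = data.find(delim, pos);
  if (end == std::string::npos) end = data.size();
  std::string token = data.substr(pos, end - pos);
  // Step over the delimiter only if one was actually found.
  pos = (end < data.size()) ? end + 1 : end;
  if (strip_cr && !token.empty() && token.back() == '\r') token.pop_back();
  return token;
}
```

Under these assumptions, calling `ReadToken(data, pos)` treats the input as text and swallows the `\r`, while a caller reading null-delimited tokens can pass `'\0'` and `false` to get the bytes back unchanged. Flipping the default of `strip_cr` would give the opt-in variant Jeroen proposes.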
