On Sunday, 29 December 2013 at 18:45:36 UTC, Dmitry Olshansky wrote:
I've come to the conclusion that the only sane line-ending behavior is to do what the Unicode standard says and detect the following pattern as a line separator:

\r\n | \r | \f | \v | \n | \u0085 | \u2028 | \u2029

This includes never breaking a line in the middle of a \r\n sequence.
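
For illustration, the pattern above can be sketched as a regular expression. This is a minimal sketch in Python, not the Phobos implementation; the names `LINE_SEP` and `split_unicode_lines` are made up for this example. The one detail that matters is listing `\r\n` first, so that a CRLF pair is consumed as a single separator.

```python
import re

# Alternation taken from the pattern above. "\r\n" is listed first so a CRLF
# pair matches as one separator and is never split in the middle.
LINE_SEP = re.compile("\r\n|\r|\f|\v|\n|\u0085|\u2028|\u2029")

def split_unicode_lines(text):
    """Split decoded text on the separators above (hypothetical helper)."""
    lines = LINE_SEP.split(text)
    # A trailing separator terminates the last line; it does not start
    # an extra empty one.
    if lines and lines[-1] == "":
        lines.pop()
    return lines
```

Note that this version works on already-decoded text, so it assumes the input was valid in whatever encoding it came from.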

I don't think something as basic as a line-splitting function should do UTF decoding unless the user asks for it explicitly. Getting UTF-8 decoding errors in splitLines when working with ASCII files has caused me enough frustration to stop using that function altogether (unless I *KNOW* the text is valid UTF-8). I have yet to encounter a need to split by anything other than \n and \r\n.
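
To make the point concrete, here is a minimal sketch of splitting on \n and \r\n at the byte level, with no decoding step at all. It is written in Python purely for illustration; `split_lines_bytes` is a hypothetical name, not a Phobos function. Because both separators are plain ASCII bytes, invalid UTF-8 elsewhere in the input cannot cause an error; it simply passes through untouched.

```python
def split_lines_bytes(data):
    """Split raw bytes on LF and CRLF without decoding (hypothetical helper)."""
    parts = data.split(b"\n")
    # Whatever follows the final LF; empty when the data ends with a newline.
    last = parts.pop()
    # Every remaining part was followed by LF, so a trailing CR on it
    # belongs to a CRLF pair and is dropped.
    lines = [p[:-1] if p.endswith(b"\r") else p for p in parts]
    if last:
        # An unterminated final line is kept as-is, including any lone CR.
        lines.append(last)
    return lines
```

A lone \r that is not followed by \n is deliberately left inside the line, since only \n and \r\n are treated as separators here.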
