Java meets this requirement, but only just barely.

RL1.6 Line Boundaries

    To meet this requirement, if an implementation provides for line-boundary testing, it shall recognize not only CRLF, LF, CR, but also NEL (U+0085), PS (U+2029) and LS (U+2028).

I say "barely" because immediately below that, tr18 reads:

    Formfeed (U+000C) also normally indicates an end-of-line. For more information, see Chapter 3 of [Unicode]. [...] A newline sequence is defined to be any of the following:

    \u000A | \u000B | \u000C | \u000D | \u0085 | \u2028 | \u2029 | \u000D\u000A

The code in j.u.r.Pattern does *not* take "\f" U+000C FORM FEED or "\v" U+000B VERTICAL TAB into account. Both of those are included in the newline sequence definition given immediately above.

Below that is this strong recommendation, which Java also neglects:

    It is strongly recommended that there be a regular expression meta-character, such as "\R", for matching all line ending characters and sequences listed above (e.g. in #1). It would thus be shorthand for:

    ( \u000D\u000A | [\u000A\u000B\u000C\u000D\u0085\u2028\u2029] )

Perl has supported \R for some years now, and I have implemented the strongly recommended \R metacharacter in my Java regex rewriting library using that definition. This is much cleaner than having to deal with the entire UNIX_LINES thing, which is probably why they strongly recommended it.

--tom
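To make the \R shorthand quoted above concrete, here is a minimal Java sketch that spells out the tr18 expansion by hand. It is not the code from the rewriting library mentioned above; the names `LineBreakSketch`, `TR18_LINEBREAK`, `splitLines` and `countLineStarts` are hypothetical and exist only for this illustration. It also shows the gap described above: Pattern's own line-boundary logic does not treat FORM FEED as a line break.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineBreakSketch {
    // Hypothetical constant: the tr18 expansion of the recommended shorthand.
    // CRLF is listed first so that the pair is consumed as a single line break
    // rather than as a CR followed by a separate LF.
    static final String TR18_LINEBREAK =
        "(?:\\u000D\\u000A|[\\u000A\\u000B\\u000C\\u000D\\u0085\\u2028\\u2029])";

    private static final Pattern LINEBREAK = Pattern.compile(TR18_LINEBREAK);
    private static final Pattern LINE_START = Pattern.compile("^", Pattern.MULTILINE);

    // Split text on every tr18 newline sequence, keeping trailing empty pieces.
    static String[] splitLines(String text) {
        return LINEBREAK.split(text, -1);
    }

    // Count the positions where Pattern's built-in MULTILINE caret matches,
    // i.e. the line starts that j.u.r.Pattern itself recognizes.
    static int countLineStarts(String text) {
        Matcher m = LINE_START.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // CRLF, form feed, vertical tab, NEL and LS as separators.
        String sample = "a\r\nb\fc\u000Bd\u0085e\u2028f";
        System.out.println(splitLines(sample).length);  // 6
        System.out.println(countLineStarts("a\nb"));    // 2
        System.out.println(countLineStarts("a\fb"));    // 1
    }
}
```

Under these assumptions, the explicit pattern splits the sample into six pieces, while the built-in MULTILINE caret sees only one line in "a\fb" because form feed is not among the line terminators Pattern recognizes.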