On Mar 13, 2018, at 6:47 AM, Jim Laskey <james.las...@oracle.com> wrote:
>
> …
> A. Line support.
>
> public Stream<String> lines()
>
Suggest factoring this as:

  public Stream<String> splits(String regex) { }
  public Stream<String> lines() { return splits(`\n|\r\n?`); }

The reason is that "splits" is useful with several other patterns.
For raw strings, splits(`\n`) is a more efficient way to get the same
result (because they normalize CR NL? to NL). There's also a nifty
Unicode-oriented pattern splits(`\R`) which matches a larger set of
line terminations. And of course splits(":") or splits(`\s`) will be
old friends. A new friend might be paragraph splitting splits(`\n\n`).

Splitting is old, as Remi points out, but the new thing is supplying
the stream-style fluent notation starting from a (potentially) large
string constant.

> B. Additions to basic trim methods. In addition to margin methods trimIndent
> and trimMarkers described below in Section C, it would be worth introducing
> trimLeft and trimRight to augment the longstanding trim method. A key
> question is how trimLeft and trimRight should detect whitespace, because
> different definitions of whitespace exist in the library.
> ...
> That sets up several kinds of whitespace; trim's whitespace (TWS), Character
> whitespace (CWS) and the union of the two (UWS). TWS is a fast test. CWS is a
> slow test. UWS is fast for Latin1 and slow-ish for UTF-16.

For the record, even though we are not talking performance much,
CWS is not significantly slower than UWS. You can use a 64-bit int
constant for a bitmask and check for an arbitrary subset of the first
64 ASCII code points in one or two machine instructions.

> We are recommending that trimLeft and trimRight use UWS, leave trim alone to
> avoid breaking the world and then possibly introduce trimWhitespace that uses
> UWS.

Putting aside the performance question, I have to ask if compatibility
with TWS is at all important. (Don't know the answer, suspect not.)

> …
> C. Margin management.
> With introduction of multi-line Raw String Literals,
> developers will have to deal with the extraneous spacing introduced by
> indenting and formatting string bodies.
>
> Note that for all the methods in this group, if the first line is empty then
> it is removed and if the last is empty then it is removed. This removal
> provides a means for developers that use delimiters on separate lines to
> bracket string bodies. Also note, that all line separators are replaced with
> \n.

(As a bonus, margin management gives a story for escaping leading and
trailing backticks. If your string is a single line, surround it with
pipe characters `|asdf|`. If your string is multiple lines, surround it
with blank lines, which is easy to do. Either pipes or newlines will
protect backticks from merging into quotes.)

There's a sort of beauty contest going on here between indents and
markers. I often prefer markers, but I see how indents will often win
the contest. I'll pre-emptively disagree with anyone who observes that
we only need one of the two.

> public String trimMarkers(String leftMarker, String rightMarker)

I like this function and anticipate using it. (I use similar things in
shell script here-files.) Thanks for including end-of-line markers in
the mix. This allows lines with significant *trailing* whitespace to
protect that whitespace as well as *leading* whitespace.

Suggestion: Give users a gentle nudge toward the pipe character by
making it a default argument so trimMarkers() => trimMarkers("|","|").

Suggestion: Allow the markers to be regular expressions.
(So `\|` would be the default.)

>
> D. Escape management. Since Raw String Literals do not interpret Unicode
> escapes (\unnnn) or escape sequences (\n, \b, etc), we need to provide a
> scheme for developers who just want multi-line strings but still have escape
> sequences interpreted.

This all looks good.

Thanks,
-- John
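
For concreteness, here is a minimal Java sketch of two of the points
above: lines() factored through a general splits(regex), and a
whitespace test done with a 64-bit mask. It is an illustration under
stated assumptions, not the proposed implementation. The class
SplitsSketch, the static two-argument splits/lines helpers, WS_MASK and
isMaskedWhitespace are all hypothetical names; the proposal is for
instance methods on String. The regexes are written as ordinary escaped
string literals, since raw string literals are only proposed. The set of
code points in the mask is an arbitrary example subset, not a definition
of TWS, CWS or UWS. The only library call relied on is the existing
java.util.regex.Pattern.splitAsStream.

```java
import java.util.regex.Pattern;
import java.util.stream.Stream;

final class SplitsSketch {
    private SplitsSketch() {}

    // Hypothetical static stand-in for the suggested String.splits(regex).
    static Stream<String> splits(String text, String regex) {
        return Pattern.compile(regex).splitAsStream(text);
    }

    // lines() expressed through splits(): breaks on \n, \r\n, or a lone \r.
    static Stream<String> lines(String text) {
        return splits(text, "\\n|\\r\\n?");
    }

    // One bit per code point below 64. The subset chosen here (tab, LF, VT,
    // FF, CR, space) is an assumption made only for illustration.
    private static final long WS_MASK =
          (1L << '\t') | (1L << '\n') | (1L << 0x0B)
        | (1L << '\f') | (1L << '\r') | (1L << ' ');

    // Range check first (long shifts use only the low 6 bits of the count),
    // then a single shift-and-mask to test membership.
    static boolean isMaskedWhitespace(int codePoint) {
        return codePoint >= 0 && codePoint < 64
            && ((WS_MASK >>> codePoint) & 1L) != 0;
    }
}
```

With these helpers, SplitsSketch.lines("a\r\nb\rc\nd") yields the four
elements a, b, c, d, and SplitsSketch.splits(text, "\\n\\n") gives the
paragraph splitting mentioned above.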