On Mar 13, 2018, at 6:47 AM, Jim Laskey <[email protected]> wrote:
>
> …
> A. Line support.
>
> public Stream<String> lines()
>
Suggest factoring this as:
public Stream<String> splits(String regex) { }
public Stream<String> lines() { return splits(`\n|\r\n?`); }
The reason is that "splits" is useful with several other patterns.
For raw strings, splits(`\n`) is a more efficient way to get the same
result (because they normalize CR NL? to NL). There's also a
nifty unicode-oriented pattern splits(`\R`) which matches a larger
set of line terminations. And of course splits(":") or splits(`\s`) will
be old friends. A new friend might be paragraph splitting splits(`\n\n`).
Splitting is old, as Remi points out, but the new thing is supplying the
stream-style fluent notation starting from a (potentially) large string
constant.
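
To make the factoring concrete, here is a rough sketch using static helpers on top of the existing Pattern.splitAsStream. The class name Splits and the static method shapes are my own stand-ins (the proposal is for instance methods on String), and ordinary string literals are used, so the backslashes are doubled.

```java
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class Splits {
    // Hypothetical static stand-in for the proposed String.splits(regex).
    static Stream<String> splits(String s, String regex) {
        return Pattern.compile(regex).splitAsStream(s);
    }

    // lines() expressed in terms of splits(), as suggested above.
    static Stream<String> lines(String s) {
        return splits(s, "\\n|\\r\\n?");
    }

    public static void main(String[] args) {
        // Mixed line terminators: CR NL, bare CR, bare NL.
        String text = "one\r\ntwo\rthree\nfour";
        System.out.println(lines(text).collect(Collectors.toList()));

        // The same helper with other patterns mentioned above.
        System.out.println(splits("a:b:c", ":").collect(Collectors.toList()));
        System.out.println(splits("a\u2028b\r\nc", "\\R").collect(Collectors.toList()));
        System.out.println(splits("p1 l1\np1 l2\n\np2", "\\n\\n").count());
    }
}
```

Note that splitAsStream drops trailing empty strings, so a final line terminator does not produce an extra empty element.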
> B. Additions to basic trim methods. In addition to margin methods trimIndent
> and trimMarkers described below in Section C, it would be worth introducing
> trimLeft and trimRight to augment the longstanding trim method. A key
> question is how trimLeft and trimRight should detect whitespace, because
> different definitions of whitespace exist in the library.
> ...
> That sets up several kinds of whitespace; trim's whitespace (TWS), Character
> whitespace (CWS) and the union of the two (UWS). TWS is a fast test. CWS is a
> slow test. UWS is fast for Latin1 and slow-ish for UTF-16.
For the record, even though we are not talking performance much,
CWS is not significantly slower than UWS. You can use a 64-bit int
constant for a bitmask and check for an arbitrary subset of the first
64 ASCII code points in one or two machine instructions.
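
A minimal sketch of the bitmask idea, in case it helps. The class and method names are made up, and the choice of set (the code points below 64 that Character.isWhitespace accepts) is only an assumption for illustration; any subset of the first 64 code points works the same way.

```java
final class AsciiWhitespaceMask {
    // Bit i is set when code point i belongs to the chosen set.
    // Assumption: the set is Character.isWhitespace restricted to 0..63.
    static final long MASK = buildMask();

    static long buildMask() {
        long m = 0L;
        for (int cp = 0; cp < 64; cp++) {
            if (Character.isWhitespace(cp)) {
                m |= 1L << cp;
            }
        }
        return m;
    }

    // Range check plus shift-and-test; code points 64 and above are
    // reported as "not in the set" and would need the slower path.
    static boolean isLowWhitespace(int cp) {
        return (cp & ~63) == 0 && ((MASK >>> cp) & 1L) != 0;
    }

    public static void main(String[] args) {
        System.out.println(isLowWhitespace(' '));
        System.out.println(isLowWhitespace('\t'));
        System.out.println(isLowWhitespace('A'));
    }
}
```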
> We are recommending that trimLeft and trimRight use UWS, leave trim alone to
> avoid breaking the world and then possibly introduce trimWhitespace that uses
> UWS.
Putting aside the performance question, I have to ask if compatibility
with TWS is at all important. (Don't know the answer, suspect not.)
> …
> C. Margin management. With introduction of multi-line Raw String Literals,
> developers will have to deal with the extraneous spacing introduced by
> indenting and formatting string bodies.
>
> Note that for all the methods in this group, if the first line is empty then
> it is removed and if the last is empty then it is removed. This removal
> provides a means for developers that use delimiters on separate lines to
> bracket string bodies. Also note, that all line separators are replaced with
> \n.
(As a bonus, margin management gives a story for escaping leading and trailing
backticks. If your string is a single line, surround it with pipe characters:
`|asdf|`. If your string is multiple lines, surround it with blank lines, which
is easy to do. Either pipes or newlines will protect backticks from merging
into quotes.)
There's a sort of beauty contest going on here between indents and
markers. I often prefer markers, but I see how indents will often win
the contest. I'll pre-emptively disagree with anyone who observes
that we only need one of the two.
> public String trimMarkers(String leftMarker, String rightMarker)
I like this function and anticipate using it. (I use similar things in
shell script here-files.) Thanks for including end-of-line markers
in the mix. This allows lines with significant *trailing* whitespace
to protect that whitespace as well as *leading* whitespace.
Suggestion: Give users a gentle nudge toward the pipe character by
making it a default argument so trimMarkers() => trimMarkers("|","|").
Suggestion: Allow the markers to be regular expressions.
(So `\|` would be the default.)
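
Here is a sketch of how I read trimMarkers, including the default-argument overload. The semantics are my assumption rather than the spec: on each line, leading whitespace plus the left marker is dropped, and the right marker plus trailing whitespace is dropped; an empty first or last line is removed and line separators become \n, per the note in Section C. The markers are treated as literal text here, not regular expressions, and the class name Markers is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class Markers {
    // Default overload: nudges users toward the pipe character.
    static String trimMarkers(String s) {
        return trimMarkers(s, "|", "|");
    }

    // Assumed semantics; markers are literal strings, not regexes.
    static String trimMarkers(String s, String left, String right) {
        List<String> lines = new ArrayList<>(Arrays.asList(s.split("\\n|\\r\\n?", -1)));
        if (!lines.isEmpty() && lines.get(0).isEmpty()) {
            lines.remove(0);                      // drop empty first line
        }
        if (!lines.isEmpty() && lines.get(lines.size() - 1).isEmpty()) {
            lines.remove(lines.size() - 1);       // drop empty last line
        }
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            out.add(trimLine(line, left, right));
        }
        return String.join("\n", out);
    }

    private static String trimLine(String line, String left, String right) {
        int start = 0;
        int end = line.length();
        int i = 0;
        while (i < end && Character.isWhitespace(line.charAt(i))) {
            i++;
        }
        if (line.startsWith(left, i)) {
            start = i + left.length();            // skip indent and left marker
        }
        int j = end;
        while (j > start && Character.isWhitespace(line.charAt(j - 1))) {
            j--;
        }
        int r = j - right.length();
        if (r >= start && line.startsWith(right, r)) {
            end = r;                              // cut right marker and trailing space
        }
        return line.substring(start, end);
    }

    public static void main(String[] args) {
        String body = "\n    |  hello  |\r\n    |world|\n";
        System.out.println(trimMarkers(body));
    }
}
```

Whitespace between the markers survives, which is the point of having an end-of-line marker at all.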
>
> D. Escape management. Since Raw String Literals do not interpret Unicode
> escapes (\unnnn) or escape sequences (\n, \b, etc), we need to provide a
> scheme for developers who just want multi-line strings but still have escape
> sequences interpreted.
This all looks good.
Thanks,
— John