Re: Raw String Literal Library Support

John Rose Tue, 13 Mar 2018 15:50:45 -0700

On Mar 13, 2018, at 6:47 AM, Jim Laskey <james.las...@oracle.com> wrote:
> 
> …
> A. Line support.
> 
> public Stream<String> lines()
>


Suggest factoring this as:

 public Stream<String> splits(String regex) { }
 public Stream<String> lines() { return splits(`\n|\r\n?`); }

The reason is that "splits" is useful with several other patterns.
For raw strings, splits(`\n`) is a more efficient way to get the same
result (because raw strings normalize CR NL? to NL).  There's also a
nifty Unicode-oriented pattern splits(`\R`) which matches a larger
set of line terminators.  And of course splits(":") or splits(`\s`) will
be old friends.  A new friend might be paragraph splitting splits(`\n\n`).
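
A rough sketch of the factoring, for illustration only.  It assumes
static helpers standing in for the proposed String methods (the class
name Splits and the extra String parameter are hypothetical), and it
uses ordinary string literals with doubled backslashes in place of the
backtick raw literals so that it compiles on a current JDK:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class Splits {
    // Hypothetical stand-in for the proposed String.splits(regex):
    // lazily splits s around matches of the regular expression.
    static Stream<String> splits(String s, String regex) {
        return Pattern.compile(regex).splitAsStream(s);
    }

    // lines() expressed in terms of splits(): NL, CR NL, or a lone CR.
    static Stream<String> lines(String s) {
        return splits(s, "\\n|\\r\\n?");
    }

    public static void main(String[] args) {
        String text = "alpha\r\nbeta\rgamma\ndelta";
        List<String> out = lines(text).collect(Collectors.toList());
        System.out.println(out); // prints [alpha, beta, gamma, delta]
    }
}
```

Pattern.splitAsStream does the splitting here; it drops trailing empty
strings, which may or may not be what a real lines() should do.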

Splitting is old, as Remi points out, but the new thing is supplying the
stream-style fluent notation starting from a (potentially) large string
constant.

> B. Additions to basic trim methods. In addition to margin methods trimIndent 
> and trimMarkers described below in Section C, it would be worth introducing 
> trimLeft and trimRight to augment the longstanding trim method. A key 
> question is how trimLeft and trimRight should detect whitespace, because 
> different definitions of whitespace exist in the library. 
> ...
> That sets up several kinds of whitespace; trim's whitespace (TWS), Character 
> whitespace (CWS) and the union of the two (UWS). TWS is a fast test. CWS is a 
> slow test. UWS is fast for Latin1 and slow-ish for UTF-16. 

For the record, even though we are not talking performance much,
CWS is not significantly slower than UWS.  You can use a 64-bit int
constant for a bitmask and check for an arbitrary subset of the first
64 ASCII code points in one or two machine instructions.
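
A minimal sketch of the bitmask idea, assuming that CWS means
Character.isWhitespace (the class and method names below are
hypothetical).  Code points 0..63 are tested against a 64-bit constant
with a shift and a mask; anything else falls back to the library call:

```java
class AsciiWhitespace {
    // Bits 9..13 (TAB, LF, VT, FF, CR), bits 28..31 (FS, GS, RS, US)
    // and bit 32 (SPACE): the code points below 64 for which
    // Character.isWhitespace returns true.
    static final long MASK = (0x1FL << 9) | (0xFL << 28) | (1L << 32);

    static boolean isWhitespace(int cp) {
        if ((cp & ~63) == 0) {
            // Fast path: cp is in 0..63, so one shift and one mask suffice.
            return ((MASK >>> cp) & 1L) != 0;
        }
        // Slow path: defer to the library for everything else.
        return Character.isWhitespace(cp);
    }

    public static void main(String[] args) {
        boolean agrees = true;
        for (int cp = 0; cp < 64; cp++) {
            agrees &= isWhitespace(cp) == Character.isWhitespace(cp);
        }
        System.out.println("mask agrees with Character.isWhitespace on 0..63: " + agrees);
    }
}
```

A different subset (for example, TWS or UWS) would only change the
constant, not the shape of the test.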

> We are recommending that trimLeft and trimRight use UWS, leave trim alone to 
> avoid breaking the world and then possibly introduce trimWhitespace that uses 
> UWS.

Putting aside the performance question, I have to ask if compatibility
with TWS is at all important.  (Don't know the answer, suspect not.)
> …
> C. Margin management. With introduction of multi-line Raw String Literals, 
> developers will have to deal with the extraneous spacing introduced by 
> indenting and formatting string bodies. 
> 
> Note that for all the methods in this group, if the first line is empty then 
> it is removed and if the last is empty then it is removed. This removal 
> provides a means for developers that use delimiters on separate lines to 
> bracket string bodies. Also note, that all line separators are replaced with 
> \n.

(As a bonus, margin management gives a story for escaping leading and trailing
backticks.  If your string is a single line, surround it with pipe characters:
`|asdf|`.  If your string is multiple lines, surround it with blank lines,
which is easy to do.  Either pipes or newlines will protect backticks from
merging into quotes.)

There's a sort of beauty contest going on here between indents and
markers.  I often prefer markers, but I see how indents will often win
the contest.  I'll pre-emptively disagree with anyone who observes
that we only need one of the two.

> public String trimMarkers(String leftMarker, String rightMarker)

I like this function and anticipate using it.  (I use similar things in
shell script here-files.)  Thanks for including end-of-line markers
in the mix.  This allows lines with significant *trailing* whitespace
to protect that whitespace as well as *leading* whitespace.

Suggestion:  Give users a gentle nudge toward the pipe character by
making it a default argument so trimMarkers() => trimMarkers("|","|").

Suggestion:  Allow the markers to be regular expressions.
(So `\|` would be the default.)
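
To make the default-argument suggestion concrete, here is a rough
sketch.  The semantics are my assumption, not Jim's specification: each
line has its surrounding whitespace stripped (using trim's definition),
then one leading leftMarker and one trailing rightMarker are removed if
present; a blank first or last line is dropped and lines are rejoined
with \n.  The class name Markers and the static form are hypothetical,
and markers are treated as literal strings rather than regular
expressions:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Markers {
    // Hypothetical stand-in for the proposed String.trimMarkers(left, right).
    static String trimMarkers(String s, String left, String right) {
        List<String> lines =
            new ArrayList<>(Arrays.asList(s.split("\\n|\\r\\n?", -1)));
        // Drop a blank first line and a blank last line (assumption:
        // a whitespace-only line counts as empty).
        if (!lines.isEmpty() && lines.get(0).trim().isEmpty()) {
            lines.remove(0);
        }
        if (!lines.isEmpty() && lines.get(lines.size() - 1).trim().isEmpty()) {
            lines.remove(lines.size() - 1);
        }
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String t = line.trim();
            // Whitespace inside the markers is preserved.
            if (t.startsWith(left)) {
                t = t.substring(left.length());
            }
            if (t.endsWith(right)) {
                t = t.substring(0, t.length() - right.length());
            }
            out.add(t);
        }
        return String.join("\n", out);
    }

    // The suggested default: trimMarkers() => trimMarkers("|", "|").
    static String trimMarkers(String s) {
        return trimMarkers(s, "|", "|");
    }

    public static void main(String[] args) {
        String body = "\n    |  hello  |\n    |world|\n";
        System.out.println("[" + trimMarkers(body) + "]");
    }
}
```

Note how the right-hand marker keeps the two trailing spaces after
"hello" alive, which is the point of having end-of-line markers.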

> 
> D. Escape management. Since Raw String Literals do not interpret Unicode 
> escapes (\unnnn) or escape sequences (\n, \b, etc), we need to provide a 
> scheme for developers who just want multi-line strings but still have escape 
> sequences interpreted.

This all looks good.

Thanks,

— John
