Re: Raw String Literal Library Support

Jim Laskey Wed, 21 Mar 2018 07:53:17 -0700

One more change set.

trimIndent -> stripIndent
trimMarkers -> stripMarkers




> On Mar 20, 2018, at 10:35 AM, Jim Laskey <james.las...@oracle.com> wrote:
> 
> Summary.
> 
> A. Line support.
> 
> - Supporting a mix of line terminators `\n|\r\n|\r` is already a 
> well-established pattern in language parsers, in the JDK (e.g. see 
> java.nio.file.FileChannelLinesSpliterator) and in RegEx (e.g. see `\R`). The 
> performance difference between checking for one vs. all three is negligible.
> 
> - Yes, Stream<String> stream = 
> Pattern.compile("\n|\r\n|\r").splitAsStream(string); is very useful 
> (Spliterators rule), but it is cumbersome for what is expected to be a common 
> use case. Only so-so streamy. :-)
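
To make the "cumbersome" form concrete, here is a minimal sketch of the regex-based split wrapped in a helper. The class name LineSplitSketch and the method linesOf are made up for illustration and are not part of the proposal.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LineSplitSketch {
    // Accepts any of the three terminators discussed in this thread.
    private static final Pattern LINE_BREAK = Pattern.compile("\n|\r\n|\r");

    // Hypothetical helper, not a JDK method.
    static Stream<String> linesOf(String text) {
        return LINE_BREAK.splitAsStream(text);
    }

    public static void main(String[] args) {
        List<String> lines =
            linesOf("abc\ndef\r\nghi\rjkl").collect(Collectors.toList());
        System.out.println(lines); // [abc, def, ghi, jkl]
    }
}
```

Every caller has to carry the pattern around (or a helper like this one), which is the overhead a built-in lines() would remove.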
> 
> - BufferedReader.lines() vs. String.lines() is a tricky discussion. It comes 
> down to whether the new line is a terminator or a separator.  In the i/o 
> case, it seems terminator is the right answer. A well formed text file will 
> have a new line at the end of every line.  However, I think you’ll find when 
> people work with multi-line strings they think of new line as a separator. 
> Hence, the common use of split("\n") and "".split("\n").length == 1. 
> Indentation, the position of the closing delimiter and margin trimming make 
> that last line very fluid.
> 
> What clinches the deal is that, for a string whose only line breaks are \n, 
> string.lines().collect(joining("\n")).equals(string). I’ll ensure both 
> versions of lines() have the difference well javadocumented.
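
The separator vs. terminator distinction can be seen with existing APIs. This is only a sketch: asSeparator and asTerminator are hypothetical names, and split("\n", -1) stands in for the separator view described above rather than for the actual lines() implementation.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SeparatorVsTerminator {
    // Separator view: a trailing "\n" yields a final empty line.
    // The -1 limit keeps trailing empty strings.
    static List<String> asSeparator(String text) {
        return Arrays.asList(text.split("\n", -1));
    }

    // Terminator view, which is how BufferedReader.lines() behaves.
    static List<String> asTerminator(String text) {
        return new BufferedReader(new StringReader(text))
            .lines()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "abc\ndef\n";
        System.out.println(asSeparator(text).size());  // 3
        System.out.println(asTerminator(text).size()); // 2
        // Only the separator view round-trips through a join.
        System.out.println(String.join("\n", asSeparator(text)).equals(text)); // true
    }
}
```

The join round trip only holds for the separator view, and only when every line break in the input is a \n.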
> 
> - The current Spliterator implementation makes 
> String.lines().toArray(String[]::new) an order of magnitude faster than 
> split(`\n|\r\n|\r`). That’s why I implemented it for margin management. 
> Faster still if no collection/array is constructed.
> 
> BTW: split(`\R`) is 2x-3x faster than split(`\n|\r\n|\r`). Nice.
> 
> B. Additions to basic trim methods.
> 
> - Revamped to become strip, stripLeading, stripTrailing using 
> Character.isWhitespace(codepoint) as the test (optimized using ch == ' ' || 
> ch == '\t' || Character.isWhitespace(ch)).
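
A rough sketch of what such a strip could look like when written outside of String. It assumes a straightforward code point scan, and StripSketch and its methods are illustrative names only, not the JDK implementation.

```java
public class StripSketch {
    // Fast path for the two most common characters, then the general test.
    static boolean isStripWhitespace(int cp) {
        return cp == ' ' || cp == '\t' || Character.isWhitespace(cp);
    }

    static String stripLeading(String s) {
        int start = 0;
        while (start < s.length()) {
            int cp = s.codePointAt(start);
            if (!isStripWhitespace(cp)) {
                break;
            }
            start += Character.charCount(cp);
        }
        return s.substring(start);
    }

    static String stripTrailing(String s) {
        int end = s.length();
        while (end > 0) {
            int cp = s.codePointBefore(end);
            if (!isStripWhitespace(cp)) {
                break;
            }
            end -= Character.charCount(cp);
        }
        return s.substring(0, end);
    }

    static String strip(String s) {
        return stripTrailing(stripLeading(s));
    }

    public static void main(String[] args) {
        // U+2028 LINE SEPARATOR is whitespace for this test but not for trim().
        System.out.println("[" + strip("  \u2028abc\t ") + "]"); // [abc]
    }
}
```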
> 
> - No strong feeling about it, but String.trim() could be recommended for 
> deprecation.
> 
> C. Margin management.
> 
> - String.trimMarkers() as a default to String.trimMarkers("|", "|") is 
> reasonable.  Will put it in the CSR for broader discussion.
> 
> - Re use of patterns. I think the Stream<String> lines() method will make it 
> easy enough to create custom trim margin lambdas.
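
As an illustration of a custom margin lambda built on lines(), the sketch below assumes a JDK where String.lines() is available (Java 11 or later). The helper mapLines and the "# " margin convention are invented for the example.

```java
import java.util.function.Function;
import java.util.stream.Collectors;

public class CustomMarginSketch {
    // Hypothetical helper: applies a per-line function and rejoins with "\n".
    static String mapLines(String text, Function<String, String> perLine) {
        return text.lines().map(perLine).collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        String text = "  # one\n  # two\r\n  # three";
        // Custom margin rule: drop leading whitespace plus a "# " marker.
        String result = mapLines(text, line -> line.replaceFirst("^\\s*# ", ""));
        System.out.println(result);
    }
}
```

Any pattern-based margin rule can be passed in the same way, which is why a dedicated pattern overload seems unnecessary.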
> 
> D. Escape management.
> 
> - Good
> 
> Cheers,
> 
> — Jim
> 
> 
> 
> 
>> On Mar 13, 2018, at 10:47 AM, Jim Laskey <james.las...@oracle.com> wrote:
>> 
>> With the announcement of JEP 326 Raw String Literals, we would like to open 
>> up a discussion with regards to RSL library support. Below are several 
>> implemented String methods that are believed to be appropriate. Please 
>> comment on those mentioned below including recommending alternate names or 
>> signatures. Additional methods can be considered if warranted, but as 
>> always, the bar for inclusion in String is high.
>> 
>> You should keep a couple things in mind when reviewing these methods.
>> 
>> Methods should be applicable to all strings, not just Raw String Literals.
>> 
>> The number of additional methods should be minimized, not adding every 
>> possible method.
>> 
>> Don't put any emphasis on performance. That is a separate discussion.
>> 
>> Cheers,
>> 
>> -- Jim
>> 
>> A. Line support.
>> 
>> public Stream<String> lines()
>> Returns a stream of substrings extracted from this string partitioned by 
>> line terminators. Internally, the stream is implemented using a 
>> Spliterator that extracts one line at a time. The line terminators recognized 
>> are \n, \r\n and \r. This method provides versatility for the developer 
>> working with multi-line strings.
>>    Example:
>> 
>>       String string = "abc\ndef\nghi";
>>       Stream<String> stream = string.lines();
>>       List<String> list = stream.collect(Collectors.toList());
>> 
>>    Result:
>> 
>>    [abc, def, ghi]
>> 
>> 
>>    Example:
>> 
>>       String string = "abc\ndef\nghi";
>>       String[] array = string.lines().toArray(String[]::new);
>> 
>>    Result:
>> 
>>    [Ljava.lang.String;@33e5ccce // [abc, def, ghi]
>> 
>> 
>>    Example:
>> 
>>       String string = "abc\ndef\r\nghi\rjkl";
>>       String platformString =
>>           string.lines().collect(joining(System.lineSeparator()));
>> 
>>    Result:
>> 
>>    abc
>>    def
>>    ghi
>>    jkl
>> 
>> 
>>    Example:
>> 
>>       String string = " abc  \n   def  \n ghi   ";
>>       String trimmedString =
>>            string.lines().map(s -> s.trim()).collect(joining("\n"));
>> 
>>    Result:
>> 
>>    abc
>>    def
>>    ghi
>> 
>> 
>>    Example:
>> 
>>       String table = `First Name      Surname        Phone
>>                       Al              Albert         555-1111
>>                       Bob             Roberts        555-2222
>>                       Cal             Calvin         555-3333
>>                      `;
>> 
>>       // Extract headers
>>       String firstLine = table.lines().findFirst().orElse("");
>>       List<String> headings = List.of(firstLine.trim().split(`\s{2,}`));
>> 
>>       // Build stream of maps
>>       Stream<Map<String, String>> stream =
>>           table.lines().skip(1)
>>                .map(line -> line.trim())
>>                .filter(line -> !line.isEmpty())
>>                .map(line -> line.split(`\s{2,}`))
>>                .map(columns -> {
>>                    List<String> values = List.of(columns);
>>                    return IntStream.range(0, headings.size()).boxed()
>>                                    .collect(toMap(headings::get,
>>                                                   values::get));
>>                });
>> 
>>       // print all "First Name"
>>       stream.map(row -> row.get("First Name"))
>>             .forEach(name -> System.out.println(name));
>> 
>>    Result:
>> 
>>    Al
>>    Bob
>>    Cal
>> 
>> B. Additions to basic trim methods.
>> 
>> In addition to margin methods trimIndent and trimMarkers described below in 
>> Section C, it would be worth introducing trimLeft and trimRight to augment 
>> the longstanding trim method. A key question is how trimLeft and trimRight 
>> should detect whitespace, because different definitions of whitespace exist 
>> in the library. 
>> 
>> trim itself uses the simple test less than or equal to the space character, 
>> a fast test but not Unicode friendly. 
>> 
>> Character.isWhitespace(codepoint) returns true if codepoint is one of the 
>> following:
>> 
>>  SPACE_SEPARATOR.
>>  LINE_SEPARATOR.
>>  PARAGRAPH_SEPARATOR.
>>  '\t',     U+0009 HORIZONTAL TABULATION.
>>  '\n',     U+000A LINE FEED.
>>  '\u000B', U+000B VERTICAL TABULATION.
>>  '\f',     U+000C FORM FEED.
>>  '\r',     U+000D CARRIAGE RETURN.
>>  '\u001C', U+001C FILE SEPARATOR.
>>  '\u001D', U+001D GROUP SEPARATOR.
>>  '\u001E', U+001E RECORD SEPARATOR.
>>  '\u001F', U+001F UNIT SEPARATOR.
>>  ' ',      U+0020 SPACE.
>> (Note that non-breaking space (\u00A0) is excluded.) 
>> 
>> Character.isSpaceChar(codepoint) returns true if codepoint is one of the 
>> following:
>> 
>>  SPACE_SEPARATOR.
>>  LINE_SEPARATOR.
>>  PARAGRAPH_SEPARATOR.
>>  ' ',      U+0020 SPACE.
>>  '\u00A0', U+00A0 NON-BREAKING SPACE.
>> 
>> That sets up several kinds of whitespace: trim's whitespace (TWS), Character 
>> whitespace (CWS) and the union of the two (UWS). TWS is a fast test. CWS is 
>> a slow test. UWS is fast for Latin1 and slow-ish for UTF-16. 
>> 
>> We are recommending that trimLeft and trimRight use UWS, leave trim alone to 
>> avoid breaking the world and then possibly introduce trimWhitespace that 
>> uses UWS.
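
The three kinds of whitespace named above can be written as predicates. This is a sketch for discussion only: isTws, isCws and isUws are hypothetical names that mirror the abbreviations in this mail.

```java
public class WhitespaceKinds {
    // TWS: what trim() removes, i.e. anything at or below U+0020.
    static boolean isTws(int cp) {
        return cp <= ' ';
    }

    // CWS: Character.isWhitespace.
    static boolean isCws(int cp) {
        return Character.isWhitespace(cp);
    }

    // UWS: the union; the cheap TWS test runs first.
    static boolean isUws(int cp) {
        return isTws(cp) || isCws(cp);
    }

    public static void main(String[] args) {
        // NUL, SPACE, NO-BREAK SPACE, LINE SEPARATOR
        int[] samples = {0x0000, 0x0020, 0x00A0, 0x2028};
        for (int cp : samples) {
            System.out.printf("U+%04X tws=%b cws=%b uws=%b%n",
                cp, isTws(cp), isCws(cp), isUws(cp));
        }
    }
}
```

NUL is TWS only, LINE SEPARATOR is CWS only, and NO-BREAK SPACE falls outside all three, which is where the sets differ in practice.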
>> 
>> public String trim() 
>> Removes characters less than or equal to space from the beginning and end of 
>> the string. No change, except for spec clarification and links to the new 
>> trim methods.
>>   Examples:
>>       "".trim();              // ""
>>       "   ".trim();           // ""
>>       "  abc  ".trim();       // "abc"
>>       "  \u2028abc  ".trim(); // "\u2028abc"
>> public String trimWhitespace() 
>> Removes whitespace from the beginning and end of the string.
>>    Examples:
>> 
>>       "".trimWhitespace();              // ""
>>       "   ".trimWhitespace();           // ""
>>       "  abc  ".trimWhitespace();       // "abc"
>>       "  \u2028abc  ".trimWhitespace(); // "abc"
>> public String trimLeft()
>> Removes whitespace from the beginning of the string.
>>    Examples:
>> 
>>       "".trimLeft();        // ""
>>       "   ".trimLeft();     // ""
>>       "  abc  ".trimLeft(); // "abc  "
>> public String trimRight()
>> Removes whitespace from the end of the string.
>>    Examples:
>> 
>>       "".trimRight();        // ""
>>       "   ".trimRight();     // ""
>>       "  abc  ".trimRight(); // "  abc"
>> 
>> C. Margin management.
>> 
>> With the introduction of multi-line Raw String Literals, developers will have 
>> to deal with the extraneous spacing introduced by indenting and formatting 
>> string bodies. 
>> 
>> Note that for all the methods in this group, if the first line is empty then 
>> it is removed and if the last line is empty then it is removed. This removal 
>> provides a means for developers that use delimiters on separate lines to 
>> bracket string bodies. Also note that all line separators are replaced with 
>> \n.
>> 
>> public String trimIndent()
>> This method determines a representative line in the string body that has a 
>> non-whitespace character closest to the left margin. Once that line has been 
>> determined, the number of leading whitespaces is tallied to produce a 
>> minimal indent amount. Consequently, the result of the method is a string 
>> with the minimal indent amount removed from each line. The first line is 
>> unaffected since it is preceded by the open delimiter. The type of 
>> whitespace used (spaces or tabs) does not affect the result as long as the 
>> developer is consistent with the whitespace used.
>>    Example:
>> 
>>       String x = `
>>                  This is a line
>>                     This is a line
>>                         This is a line
>>                     This is a line
>>                  This is a line
>>                  `.trimIndent();
>> 
>>    Result:
>> 
>>    This is a line
>>        This is a line
>>            This is a line
>>        This is a line
>>    This is a line
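
The indent calculation described above can be sketched with existing APIs. This is a simplification under stated assumptions: a whitespace-only first or last line counts as "empty", every remaining line takes part in the minimum (the special handling of a non-empty first line is left out), and trimIndent here is a hypothetical static helper, not the proposed method itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TrimIndentSketch {
    // Number of leading whitespace characters on a line.
    static int indentOf(String line) {
        int i = 0;
        while (i < line.length() && Character.isWhitespace(line.charAt(i))) {
            i++;
        }
        return i;
    }

    static boolean isBlank(String line) {
        return indentOf(line) == line.length();
    }

    // Hypothetical stand-in for the proposed method; not the JDK code.
    static String trimIndent(String text) {
        // "\\R" matches any line break; the -1 limit keeps trailing empty lines.
        List<String> lines = new ArrayList<>(Arrays.asList(text.split("\\R", -1)));
        // Drop the delimiter-only first and last lines.
        if (!lines.isEmpty() && isBlank(lines.get(0))) {
            lines.remove(0);
        }
        if (!lines.isEmpty() && isBlank(lines.get(lines.size() - 1))) {
            lines.remove(lines.size() - 1);
        }
        // The least-indented non-blank line sets the amount to remove.
        int min = Integer.MAX_VALUE;
        for (String line : lines) {
            if (!isBlank(line)) {
                min = Math.min(min, indentOf(line));
            }
        }
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            out.add(isBlank(line) ? "" : line.substring(min));
        }
        // All line separators are normalized to "\n".
        return String.join("\n", out);
    }

    public static void main(String[] args) {
        System.out.println(trimIndent("\n    a\n      b\n    c\n    "));
    }
}
```

The sketch counts characters, so mixing tabs and spaces gives the same caveat as in the description: results are only sensible when the indentation is consistent.
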
>> public String trimMarkers(String leftMarker, String rightMarker)
>> Each line of the multi-line string is first trimmed. If the trimmed line 
>> contains the leftMarker at the beginning of the string then it is removed. 
>> Finally, if the line contains the rightMarker at the end of line, it is 
>> removed.
>>    Example:
>> 
>>        String x = `|This is a line|
>>                    |This is a line|
>>                    |This is a line|`.trimMarkers("|", "|");
>>    Result:
>> 
>>    This is a line
>>    This is a line
>>    This is a line
>> 
>>    Example:
>> 
>>        String x = `>> This is a line
>>                    >> This is a line
>>                    >> This is a line`.trimMarkers(">> ", "");
>>    Result:
>> 
>>    This is a line
>>    This is a line
>>    This is a line
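
The per-line steps for markers (trim, drop the left marker, drop the right marker) can be sketched as follows. The static helpers are hypothetical, and the removal of empty first and last lines described for this group is omitted to keep the example short.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class TrimMarkersSketch {
    // Trim the line, then remove the markers if they are present.
    static String trimLine(String line, String left, String right) {
        String s = line.trim();
        if (s.startsWith(left)) {
            s = s.substring(left.length());
        }
        if (s.endsWith(right)) {
            s = s.substring(0, s.length() - right.length());
        }
        return s;
    }

    // Hypothetical stand-in for the proposed method; not the JDK code.
    static String trimMarkers(String text, String left, String right) {
        return Arrays.stream(text.split("\\R", -1))
                     .map(line -> trimLine(line, left, right))
                     .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        String text = "|This is a line|\n    |This is a line|";
        System.out.println(trimMarkers(text, "|", "|"));
    }
}
```
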
>> 
>> D. Escape management.
>> 
>> Since Raw String Literals do not interpret Unicode escapes (\unnnn) or escape 
>> sequences (\n, \b, etc.), we need to provide a scheme for developers who just 
>> want multi-line strings but still have escape sequences interpreted.
>> 
>> public String unescape() throws MalformedEscapeException
>> Translates each Unicode escape or escape sequence in the string into the 
>> character represented by the escape. @jls 3.3, 3.10.6
>>    Example:
>> 
>>        `abc\u2022def\nghi`.unescape();
>> 
>>    Result:
>> 
>>    abc•def
>>    ghi
>> public String unescape(EscapeType... escape) throws MalformedEscapeException
>> Selectively translates Unicode escapes or escape sequences based on the 
>> escape type flags provided.
>>      public enum EscapeType {
>>           /** Backslash escape sequences based on section 3.10.6 of the
>>            * <cite>The Java&trade; Language Specification</cite>.
>>            * This includes sequences for backspace, horizontal tab,
>>            * line feed, form feed, carriage return, double quote,
>>            * single quote, backslash and octal escape sequences.
>>            */
>>           BACKSLASH, //
>> 
>>           /** Unicode sequences based on section 3.3 of the
>>            * <cite>The Java&trade; Language Specification</cite>.
>>            * This includes sequences in the form {@code \u005Cunnnn}.
>>            */
>>           UNICODE
>>       }
>> 
>> 
>>    Example:
>> 
>>        `abc\u2022def\nghi`.unescape(EscapeType.BACKSLASH);
>> 
>>    Result:
>> 
>>    abc\u2022def
>>    ghi
>> 
>> 
>>    Example:
>> 
>>        `abc\u2022def\nghi`.unescape(EscapeType.UNICODE);
>> 
>>    Result:
>> 
>>    abc•def\nghi
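
To show how the two escape types differ, here is a hand-rolled sketch of both translations. It is not the proposed implementation: the method names are invented, octal escapes are not handled, and malformed input is passed through instead of raising MalformedEscapeException.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnescapeSketch {
    // A backslash, one or more 'u', then four hex digits.
    private static final Pattern UNICODE =
        Pattern.compile("\\\\u+([0-9a-fA-F]{4})");

    // UNICODE flavour only (simplified reading of JLS 3.3).
    static String unescapeUnicode(String s) {
        Matcher m = UNICODE.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    // BACKSLASH flavour only (JLS 3.10.6 without octal escapes).
    static String unescapeBackslash(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 == s.length()) {
                out.append(c);
                continue;
            }
            char next = s.charAt(++i);
            switch (next) {
                case 'b':  out.append('\b'); break;
                case 't':  out.append('\t'); break;
                case 'n':  out.append('\n'); break;
                case 'f':  out.append('\f'); break;
                case 'r':  out.append('\r'); break;
                case '"':  out.append('"');  break;
                case '\'': out.append('\''); break;
                case '\\': out.append('\\'); break;
                // Anything else, including Unicode escapes, is left as written.
                default:   out.append('\\').append(next);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Same characters as the raw literal in the examples above.
        String raw = "abc\\u2022def\\nghi";
        System.out.println(unescapeBackslash(raw));
        System.out.println(unescapeUnicode(raw));
    }
}
```
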
>> Conversely, there are circumstances where the inverse is required.
>> 
>> public String escape()
>> Translates each quote, backslash, non-graphic character or non-ASCII 
>> character into a Unicode escape or escape sequence. The method is 
>> equivalent to escape(BACKSLASH, UNICODE).
>>    Example:
>> 
>>        `abc•def
>>        ghi`.escape();
>> 
>>    Result:
>> 
>>    abc\u2022def\nghi
>> public String escape(EscapeType... escape)
>> Selectively translates each quote, backslash, non-graphic character or 
>> non-ASCII character into a Unicode escape or escape sequence based on the 
>> escape type flags provided.
>>    Example:
>> 
>>        `abc•def
>>        ghi`.escape(EscapeType.BACKSLASH);
>> 
>>    Result:
>> 
>>    abc•def\nghi
>> 
>> 
>>    Example:
>> 
>>        `abc•def
>>        ghi`.escape(EscapeType.UNICODE);
>> 
>>    Result:
>> 
>>    abc\u2022def
>>    ghi
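
The inverse direction can be sketched the same way. Again this is illustrative only: the method names are invented, and control characters without a dedicated escape sequence are passed through unchanged by escapeBackslash.

```java
public class EscapeSketch {
    // UNICODE flavour: characters above '~' become backslash-u escapes.
    static String escapeUnicode(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x7E) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // BACKSLASH flavour: quotes, backslash and common control characters.
    static String escapeBackslash(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\b': out.append("\\b");  break;
                case '\t': out.append("\\t");  break;
                case '\n': out.append("\\n");  break;
                case '\f': out.append("\\f");  break;
                case '\r': out.append("\\r");  break;
                case '"':  out.append("\\\""); break;
                case '\'': out.append("\\'");  break;
                case '\\': out.append("\\\\"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A bullet character followed by a real line feed.
        String text = "abc\u2022def\nghi";
        System.out.println(escapeBackslash(text));
        System.out.println(escapeUnicode(text));
    }
}
```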
>> 
>
