Re: Raw String Literal Library Support

Volker Simonis Tue, 13 Mar 2018 11:31:28 -0700

On Tue, Mar 13, 2018 at 2:47 PM, Jim Laskey <james.las...@oracle.com> wrote:
> With the announcement of JEP 326 Raw String Literals, we would like to open 
> up a discussion with regards to RSL library support. Below are several 
> implemented String methods that are believed to be appropriate. Please 
> comment on those mentioned below including recommending alternate names or 
> signatures. Additional methods can be considered if warranted, but as always, 
> the bar for inclusion in String is high.
>
> You should keep a couple things in mind when reviewing these methods.
>
> Methods should be applicable to all strings, not just Raw String Literals.
>
> The number of additional methods should be minimized, not adding every 
> possible method.
>
> Don't put any emphasis on performance. That is a separate discussion.
>
> Cheers,
>
> -- Jim
>
> A. Line support.
>
> public Stream<String> lines()
> Returns a stream of substrings extracted from this string partitioned by line 
> terminators. Internally, the stream is implemented using a Spliterator that 
> extracts one line at a time. The line terminators recognized are \n, \r\n and 
> \r. This method provides versatility for the developer working with 
> multi-line strings.


So "lines()" will support any mix of "\n", "\r\n" and "\r" inside a
single string as line terminators?

Will "\n", "\r\n" and "\r" be parsed from left to right with one
character look-ahead? I.e.
\n = 1 newline
\n\r = 2 newlines (i.e. an empty line)
\n\r\n = 2 newlines (i.e. an empty line) because "\r\n" counts as a
single new line
\n\r\n\r = 3 newlines (i.e. two empty lines)

Would it make sense to have a version of "lines(LINE_TERM lt)" which
takes a single, concrete form of line terminator?
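
For concreteness, the left-to-right reading asked about above can be sketched as follows. This is only an illustration of the question, not the proposed implementation; the class and method names are hypothetical.

```java
final class LineTerminatorSketch {

    /**
     * Counts line terminators by scanning left to right with one character
     * of look-ahead, so that "\r\n" is consumed as a single terminator.
     */
    static int countTerminators(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\n') {
                count++;
            } else if (c == '\r') {
                count++;
                // Look ahead: a '\n' directly after '\r' belongs to the same terminator.
                if (i + 1 < s.length() && s.charAt(i + 1) == '\n') {
                    i++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countTerminators("\n"));       // 1
        System.out.println(countTerminators("\n\r"));     // 2
        System.out.println(countTerminators("\n\r\n"));   // 2
        System.out.println(countTerminators("\n\r\n\r")); // 3
    }
}
```

Under this rule the four cases listed above come out as 1, 2, 2 and 3 terminators.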

>      Example:
>
>         String string = "abc\ndef\nghi";
>         Stream<String> stream = string.lines();
>         List<String> list = stream.collect(Collectors.toList());
>
>      Result:
>
>      [abc, def, ghi]
>
>
>      Example:
>
>         String string = "abc\ndef\nghi";
>         String[] array = string.lines().toArray(String[]::new);
>
>      Result:
>
>      [Ljava.lang.String;@33e5ccce // [abc, def, ghi]
>
>
>      Example:
>
>         String string = "abc\ndef\r\nghi\rjkl";
>         String platformString =
>             string.lines().collect(joining(System.lineSeparator()));
>
>      Result:
>
>      abc
>      def
>      ghi
>      jkl
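
For comparison, the existing BufferedReader.lines() already splits on the same three terminators, since readLine() treats \n, \r and \r\n as line ends. A minimal sketch of getting a similar result from it follows; the class and method names are hypothetical, and nothing here is claimed about how the proposed String.lines() is implemented.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

final class ReaderLinesSketch {

    /** Splits a string into lines using the existing BufferedReader.lines(). */
    static List<String> linesOf(String s) {
        // A StringReader holds no external resources, so it is not closed explicitly here.
        return new BufferedReader(new StringReader(s))
                .lines()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Mixed terminators, as in the example above.
        System.out.println(linesOf("abc\ndef\r\nghi\rjkl")); // [abc, def, ghi, jkl]
    }
}
```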
>
>
>      Example:
>
>         String string = " abc  \n   def  \n ghi   ";
>         String trimmedString =
>              string.lines().map(s -> s.trim()).collect(joining("\n"));
>
>      Result:
>
>      abc
>      def
>      ghi
>
>
>      Example:
>
>         String table = `First Name      Surname        Phone
>                         Al              Albert         555-1111
>                         Bob             Roberts        555-2222
>                         Cal             Calvin         555-3333
>                        `;
>
>         // Extract headers
>         String firstLine = table.lines().findFirst().orElse("");
>         List<String> headings = List.of(firstLine.trim().split(`\s{2,}`));
>
>         // Build stream of maps
>         Stream<Map<String, String>> stream =
>             table.lines().skip(1)
>                  .map(line -> line.trim())
>                  .filter(line -> !line.isEmpty())
>                  .map(line -> line.split(`\s{2,}`))
>                  .map(columns -> {
>                      List<String> values = List.of(columns);
>                      return IntStream.range(0, headings.size()).boxed()
>                                      .collect(toMap(headings::get, values::get));
>                  });
>
>         // print all "First Name"
>         stream.map(row -> row.get("First Name"))
>               .forEach(name -> System.out.println(name));
>
>      Result:
>
>      Al
>      Bob
>      Cal
> B. Additions to basic trim methods. In addition to margin methods trimIndent 
> and trimMarkers described below in Section C, it would be worth introducing 
> trimLeft and trimRight to augment the longstanding trim method. A key 
> question is how trimLeft and trimRight should detect whitespace, because 
> different definitions of whitespace exist in the library.
>
> trim itself uses the simple test less than or equal to the space character, a 
> fast test but not Unicode friendly.
>
> Character.isWhitespace(codepoint) returns true if codepoint is one of the 
> following:
>
>    SPACE_SEPARATOR.
>    LINE_SEPARATOR.
>    PARAGRAPH_SEPARATOR.
>    '\t',     U+0009 HORIZONTAL TABULATION.
>    '\n',     U+000A LINE FEED.
>    '\u000B', U+000B VERTICAL TABULATION.
>    '\f',     U+000C FORM FEED.
>    '\r',     U+000D CARRIAGE RETURN.
>    '\u001C', U+001C FILE SEPARATOR.
>    '\u001D', U+001D GROUP SEPARATOR.
>    '\u001E', U+001E RECORD SEPARATOR.
>    '\u001F', U+001F UNIT SEPARATOR.
>    ' ',      U+0020 SPACE.
> (Note that non-breaking space (\u00A0) is excluded.)
>
> Character.isSpaceChar(codepoint) returns true if codepoint is one of the 
> following:
>
>    SPACE_SEPARATOR.
>    LINE_SEPARATOR.
>    PARAGRAPH_SEPARATOR.
>    ' ',      U+0020 SPACE.
>    '\u00A0', U+00A0 NON-BREAKING SPACE.
> That sets up several kinds of whitespace: trim's whitespace (TWS), Character 
> whitespace (CWS) and the union of the two (UWS). TWS is a fast test. CWS is a 
> slow test. UWS is fast for Latin1 and slow-ish for UTF-16.
>
> We are recommending that trimLeft and trimRight use UWS, leave trim alone to 
> avoid breaking the world and then possibly introduce trimWhitespace that uses 
> UWS.
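
The three whitespace definitions above can be written as predicates. This is a minimal sketch with hypothetical class and method names; it only restates the definitions and says nothing about how the tests are optimized in the actual implementation.

```java
final class WhitespaceKinds {

    /** TWS: the test used by trim(), anything less than or equal to space. */
    static boolean isTws(int codePoint) {
        return codePoint <= ' ';
    }

    /** CWS: whitespace as defined by Character.isWhitespace. */
    static boolean isCws(int codePoint) {
        return Character.isWhitespace(codePoint);
    }

    /** UWS: the union of TWS and CWS. */
    static boolean isUws(int codePoint) {
        return isTws(codePoint) || isCws(codePoint);
    }

    public static void main(String[] args) {
        int[] samples = {0x0000, 0x0020, 0x2028, 0x00A0};
        for (int cp : samples) {
            System.out.printf("U+%04X tws=%b cws=%b uws=%b%n",
                    cp, isTws(cp), isCws(cp), isUws(cp));
        }
    }
}
```

U+0000 is TWS only, U+2028 (LINE SEPARATOR) is CWS only, and U+00A0 (non-breaking space) falls in neither set, so it is not UWS either.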
>
> public String trim()
> Removes characters less than or equal to space from the beginning and end of 
> the string. No change except spec clarification and links to the new trim 
> methods.
>     Examples:
>         "".trim();              // ""
>         "   ".trim();           // ""
>         "  abc  ".trim();       // "abc"
>         "  \u2028abc  ".trim(); // "\u2028abc"
> public String trimWhitespace()
> Removes whitespace from the beginning and end of the string.
>      Examples:
>
>         "".trimWhitespace();              // ""
>         "   ".trimWhitespace();           // ""
>         "  abc  ".trimWhitespace();       // "abc"
>         "  \u2028abc  ".trimWhitespace(); // "abc"
> public String trimLeft()
> Removes whitespace from the beginning of the string.
>      Examples:
>
>         "".trimLeft();        // ""
>         "   ".trimLeft();     // ""
>         "  abc  ".trimLeft(); // "abc  "
> public String trimRight()
> Removes whitespace from the end of the string.
>      Examples:
>
>         "".trimRight();        // ""
>         "   ".trimRight();     // ""
>         "  abc  ".trimRight(); // "  abc"
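
A rough sketch of how trimLeft and trimRight might behave with the UWS definition, written as static helpers so it runs on an existing JDK. The names are hypothetical and the code-point loop is an assumption about one straightforward way to do it, not the implementation under review.

```java
final class TrimSketch {

    /** UWS: less than or equal to space, or Character.isWhitespace. */
    static boolean isUws(int codePoint) {
        return codePoint <= ' ' || Character.isWhitespace(codePoint);
    }

    /** Removes UWS code points from the beginning of the string. */
    static String trimLeft(String s) {
        int start = 0;
        while (start < s.length()) {
            int cp = s.codePointAt(start);
            if (!isUws(cp)) {
                break;
            }
            start += Character.charCount(cp);
        }
        return s.substring(start);
    }

    /** Removes UWS code points from the end of the string. */
    static String trimRight(String s) {
        int end = s.length();
        while (end > 0) {
            int cp = s.codePointBefore(end);
            if (!isUws(cp)) {
                break;
            }
            end -= Character.charCount(cp);
        }
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println("[" + trimLeft("  abc  ") + "]");  // [abc  ]
        System.out.println("[" + trimRight("  abc  ") + "]"); // [  abc]
    }
}
```

Composing the two helpers gives the behaviour described for trimWhitespace, including removal of a leading U+2028 that plain trim would keep.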
> C. Margin management. With the introduction of multi-line Raw String Literals, 
> developers will have to deal with the extraneous spacing introduced by 
> indenting and formatting string bodies.
>
> Note that for all the methods in this group, if the first line is empty then 
> it is removed and if the last is empty then it is removed. This removal 
> provides a means for developers that use delimiters on separate lines to 
> bracket string bodies. Also note that all line separators are replaced with 
> \n.
>
> public String trimIndent()
> This method determines a representative line in the string body that has a 
> non-whitespace character closest to the left margin. Once that line has been 
> determined, the number of leading whitespaces is tallied to produce a minimal 
> indent amount. Consequently, the result of the method is a string with the 
> minimal indent amount removed from each line. The first line is unaffected 
> since it is preceded by the open delimiter. The type of whitespace used 
> (spaces or tabs) does not affect the result as long as the developer is 
> consistent with the whitespace used.
>      Example:
>
>         String x = `
>                    This is a line
>                       This is a line
>                           This is a line
>                       This is a line
>                    This is a line
>                    `.trimIndent();
>
>      Result:
>
>      This is a line
>          This is a line
>              This is a line
>          This is a line
>      This is a line
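
One possible reading of the trimIndent description, as a static helper with hypothetical names. Several points are assumptions rather than part of the proposal: "empty" first and last lines are taken to mean whitespace-only, blank lines are ignored when computing the minimal indent, the indent is counted in characters, and the first line gets no special treatment beyond being dropped when blank.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TrimIndentSketch {

    /** Number of leading whitespace characters on a line. */
    static int indentOf(String line) {
        int n = 0;
        while (n < line.length() && Character.isWhitespace(line.charAt(n))) {
            n++;
        }
        return n;
    }

    static boolean isBlank(String line) {
        return indentOf(line) == line.length();
    }

    static String trimIndent(String s) {
        // Split on \r\n, \r or \n; the -1 limit keeps trailing empty lines.
        List<String> lines = new ArrayList<>(Arrays.asList(s.split("\r\n|\r|\n", -1)));
        // Drop a blank first line and a blank last line (assumed meaning of "empty").
        if (!lines.isEmpty() && isBlank(lines.get(0))) {
            lines.remove(0);
        }
        if (!lines.isEmpty() && isBlank(lines.get(lines.size() - 1))) {
            lines.remove(lines.size() - 1);
        }
        // The minimal indent comes from the non-blank line closest to the left margin.
        int min = Integer.MAX_VALUE;
        for (String line : lines) {
            if (!isBlank(line)) {
                min = Math.min(min, indentOf(line));
            }
        }
        // Remove that indent from every line and rejoin with \n.
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (i > 0) {
                out.append('\n');
            }
            out.append(line.substring(Math.min(min, indentOf(line))));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(trimIndent("\n    a\n      b\n    c\n    "));
    }
}
```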
> public String trimMarkers(String leftMarker, String rightMarker)
> Each line of the multi-line string is first trimmed. If the trimmed line 
> contains the leftMarker at the beginning of the string then it is removed. 
> Finally, if the line contains the rightMarker at the end of line, it is 
> removed.
>      Example:
>
>          String x = `|This is a line|
>                      |This is a line|
>                      |This is a line|`.trimMarkers("|", "|");
>      Result:
>
>      This is a line
>      This is a line
>      This is a line
>
>      Example:
>
>          String x = `>> This is a line
>                      >> This is a line
>                      >> This is a line`.trimMarkers(">> ", "");
>      Result:
>
>      This is a line
>      This is a line
>      This is a line
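
The per-line steps described for trimMarkers can be sketched as below. The helper is hypothetical; it uses the existing trim() for the "first trimmed" step and normalizes line separators to \n, but does not implement the removal of empty first and last lines mentioned for this group of methods.

```java
final class TrimMarkersSketch {

    static String trimMarkers(String s, String leftMarker, String rightMarker) {
        // Split on \r\n, \r or \n; the -1 limit keeps trailing empty lines.
        String[] lines = s.split("\r\n|\r|\n", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            // Step 1: trim the line.
            String line = lines[i].trim();
            // Step 2: remove the left marker if the trimmed line starts with it.
            if (line.startsWith(leftMarker)) {
                line = line.substring(leftMarker.length());
            }
            // Step 3: remove the right marker if the line ends with it.
            if (line.endsWith(rightMarker)) {
                line = line.substring(0, line.length() - rightMarker.length());
            }
            if (i > 0) {
                out.append('\n');
            }
            out.append(line);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String boxed = "|This is a line|\n    |This is a line|\n    |This is a line|";
        System.out.println(trimMarkers(boxed, "|", "|"));
    }
}
```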
> D. Escape management. Since Raw String Literals do not interpret Unicode 
> escapes (\unnnn) or escape sequences (\n, \b, etc), we need to provide a 
> scheme for developers who just want multi-line strings but still have escape 
> sequences interpreted.
>
> public String unescape() throws MalformedEscapeException
> Translates each Unicode escape or escape sequence in the string into the 
> character represented by the escape. @jls 3.3, 3.10.6
>      Example:
>
>          `abc\u2022def\nghi`.unescape();
>
>      Result:
>
>      abc•def
>      ghi
> public String unescape(EscapeType... escape) throws MalformedEscapeException
> Selectively translates Unicode escapes or escape sequences based on the 
> escape type flags provided.
>        public enum EscapeType {
>             /** Backslash escape sequences based on section 3.10.6 of the
>              * <cite>The Java&trade; Language Specification</cite>.
>              * This includes sequences for backspace, horizontal tab,
>              * line feed, form feed, carriage return, double quote,
>              * single quote, backslash and octal escape sequences.
>              */
>             BACKSLASH, //
>
>             /** Unicode sequences based on section 3.3 of the
>              * <cite>The Java&trade; Language Specification</cite>.
>              * This includes sequences in the form {@code \u005Cunnnn}.
>              */
>             UNICODE
>         }
>
>
>      Example:
>
>          `abc\u2022def\nghi`.unescape(EscapeType.BACKSLASH);
>
>      Result:
>
>      abc\u2022def
>      ghi
>
>
>      Example:
>
>          `abc\u2022def\nghi`.unescape(EscapeType.UNICODE);
>
>      Result:
>
>      abc•def\nghi
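
A simplified sketch of the selective unescape behaviour shown in the examples above, as a static helper. Everything here is hypothetical: the EscapeType enum is redeclared locally, IllegalArgumentException stands in for the proposed MalformedEscapeException, a dangling backslash is always rejected, and Unicode escapes with repeated 'u' characters are not handled.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.Set;

final class UnescapeSketch {

    enum EscapeType { BACKSLASH, UNICODE }

    /** Translates the selected escape kinds; no flags means both kinds. */
    static String unescape(String s, EscapeType... types) {
        Set<EscapeType> enabled = types.length == 0
                ? EnumSet.allOf(EscapeType.class)
                : EnumSet.copyOf(Arrays.asList(types));
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c != '\\') {
                out.append(c);
                i++;
                continue;
            }
            if (i + 1 == s.length()) {
                throw new IllegalArgumentException("dangling backslash at index " + i);
            }
            char next = s.charAt(i + 1);
            if (next == 'u' && enabled.contains(EscapeType.UNICODE)) {
                out.append((char) hex4(s, i + 2));
                i += 6;
            } else if (next != 'u' && enabled.contains(EscapeType.BACKSLASH)) {
                int named = "btnfr\"'\\".indexOf(next);
                if (named >= 0) {
                    // Same position in the second string holds the translated character.
                    out.append("\b\t\n\f\r\"'\\".charAt(named));
                    i += 2;
                } else if (next >= '0' && next <= '7') {
                    // Octal escape: up to three digits if the first is 0-3, else up to two.
                    int max = next <= '3' ? 3 : 2;
                    int value = 0;
                    int len = 0;
                    while (len < max && i + 1 + len < s.length()) {
                        int d = s.charAt(i + 1 + len) - '0';
                        if (d < 0 || d > 7) {
                            break;
                        }
                        value = value * 8 + d;
                        len++;
                    }
                    out.append((char) value);
                    i += 1 + len;
                } else {
                    throw new IllegalArgumentException("unknown escape at index " + i);
                }
            } else {
                // Escape kind not selected: copy both characters through untouched.
                out.append(c).append(next);
                i += 2;
            }
        }
        return out.toString();
    }

    /** Parses exactly four hex digits starting at the given index. */
    static int hex4(String s, int start) {
        if (start + 4 > s.length()) {
            throw new IllegalArgumentException("truncated Unicode escape");
        }
        int value = 0;
        for (int k = start; k < start + 4; k++) {
            int d = Character.digit(s.charAt(k), 16);
            if (d < 0) {
                throw new IllegalArgumentException("bad hex digit at index " + k);
            }
            value = value * 16 + d;
        }
        return value;
    }
}
```

Because the scan consumes a backslash together with the character after it, a doubled backslash followed by 'u' is never mistaken for a Unicode escape.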
> Conversely, there are circumstances where the inverse is required.
>
> public String escape()
> Translates each quote, backslash, non-graphic character or non-ASCII 
> character into a Unicode escape or escape sequence. The method is equivalent 
> to escape(BACKSLASH, UNICODE).
>      Example:
>
>          `abc•def
>          ghi`.escape();
>
>      Result:
>
>      abc\u2022def\nghi
> public String escape(EscapeType... escape)
> Selectively translates each quote, backslash, non-graphic character or 
> non-ASCII character into a Unicode escape or escape sequence based on the 
> escape type flags provided.
>      Example:
>
>          `abc•def
>          ghi`.escape(EscapeType.BACKSLASH);
>
>      Result:
>
>      abc•def\nghi
>
>
>      Example:
>
>          `abc•def
>          ghi`.escape(EscapeType.UNICODE);
>
>      Result:
>
>      abc\u2022def
>      ghi
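
The escape direction can be sketched the same way. This is a hypothetical helper consistent with the examples above, under stated assumptions: BACKSLASH covers only the quote, backslash and named control characters (backspace, tab, line feed, form feed, carriage return), UNICODE covers only characters above U+007F, and other control characters are passed through unchanged.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.Set;

final class EscapeSketch {

    enum EscapeType { BACKSLASH, UNICODE }

    /** Escapes the selected kinds of characters; no flags means both kinds. */
    static String escape(String s, EscapeType... types) {
        Set<EscapeType> enabled = types.length == 0
                ? EnumSet.allOf(EscapeType.class)
                : EnumSet.copyOf(Arrays.asList(types));
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // Position of c among the characters that have a named escape, or -1.
            int named = "\b\t\n\f\r\"'\\".indexOf(c);
            if (named >= 0 && enabled.contains(EscapeType.BACKSLASH)) {
                out.append('\\').append("btnfr\"'\\".charAt(named));
            } else if (c > 0x7F && enabled.contains(EscapeType.UNICODE)) {
                // Non-ASCII: emit a backslash, 'u' and four hex digits per UTF-16 unit.
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With both flags, a string containing a bullet and a line feed comes out as a single line of ASCII, matching the first escape() example.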
>
