On Tue, Mar 13, 2018 at 2:47 PM, Jim Laskey <james.las...@oracle.com> wrote:
> With the announcement of JEP 326 Raw String Literals, we would like to open
> up a discussion with regards to RSL library support. Below are several
> implemented String methods that are believed to be appropriate. Please
> comment on those mentioned below including recommending alternate names or
> signatures. Additional methods can be considered if warranted, but as always,
> the bar for inclusion in String is high.
>
> You should keep a couple things in mind when reviewing these methods.
>
> Methods should be applicable to all strings, not just Raw String Literals.
>
> The number of additional methods should be minimized, not adding every
> possible method.
>
> Don't put any emphasis on performance. That is a separate discussion.
>
> Cheers,
>
> -- Jim
>
> A. Line support.
>
> public Stream<String> lines()
> Returns a stream of substrings extracted from this string partitioned by line
> terminators. Internally, the stream is implemented using a Spliterator that
> extracts one line at a time. The line terminators recognized are \n, \r\n and
> \r. This method provides versatility for the developer working with
> multi-line strings.
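
A minimal sketch of one possible reading of the terminator rule quoted above:
scan left to right, and treat a \r that is immediately followed by \n as a
single terminator. The names LineSplitSketch and splitLines are made up for
illustration and are not part of the proposal. Not emitting an empty trailing
line after a final terminator is an assumption of this sketch, not something
the quoted text states.

```java
import java.util.ArrayList;
import java.util.List;

final class LineSplitSketch {
    // Hypothetical helper, not the proposed String.lines() implementation.
    static List<String> splitLines(String s) {
        List<String> lines = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\n' || c == '\r') {
                lines.add(s.substring(start, i));
                // One character of look-ahead: "\r\n" counts as one terminator.
                if (c == '\r' && i + 1 < s.length() && s.charAt(i + 1) == '\n') {
                    i++;
                }
                start = i + 1;
            }
        }
        // Assumption: text after the last terminator is a line only if non-empty.
        if (start < s.length()) {
            lines.add(s.substring(start));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Mixed terminators in a single string.
        System.out.println(splitLines("abc\ndef\r\nghi\rjkl"));
        // "\n" followed by "\r\n" gives one empty line between "a" and "b".
        System.out.println(splitLines("a\n\r\nb").size());
    }
}
```

Under this reading, "\n\r\n" yields one empty line and "\n\r\n\r" yields two,
which are the counts listed in the question that follows.
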

So "lines()" will support any mix of "\n", "\r\n" and "\r" inside a single
string as line terminators? Will "\n", "\r\n" and "\r" be parsed from left to
right with one character of look-ahead? I.e.

  \n       = 1 newline
  \n\r     = 2 newlines (i.e. an empty line)
  \n\r\n   = 2 newlines (i.e. an empty line), because "\r\n" counts as a
             single newline
  \n\r\n\r = 3 newlines (i.e. two empty lines)

Would it make sense to have a version of "lines(LINE_TERM lt)" which takes a
single, concrete form of line terminator?

> Example:
>
>     String string = "abc\ndef\nghi";
>     Stream<String> stream = string.lines();
>     List<String> list = stream.collect(Collectors.toList());
>
> Result:
>
>     [abc, def, ghi]
>
>
> Example:
>
>     String string = "abc\ndef\nghi";
>     String[] array = string.lines().toArray(String[]::new);
>
> Result:
>
>     [Ljava.lang.String;@33e5ccce // [abc, def, ghi]
>
>
> Example:
>
>     String string = "abc\ndef\r\nghi\rjkl";
>     String platformString =
>         string.lines().collect(joining(System.lineSeparator()));
>
> Result:
>
>     abc
>     def
>     ghi
>     jkl
>
>
> Example:
>
>     String string = " abc \n def \n ghi ";
>     String trimmedString =
>         string.lines().map(s -> s.trim()).collect(joining("\n"));
>
> Result:
>
>     abc
>     def
>     ghi
>
>
> Example:
>
>     String table = `First Name      Surname     Phone
>                     Al              Albert      555-1111
>                     Bob             Roberts     555-2222
>                     Cal             Calvin      555-3333
>                    `;
>
>     // Extract headers
>     String firstLine = table.lines().findFirst().orElse("");
>     List<String> headings = List.of(firstLine.trim().split(`\s{2,}`));
>
>     // Build stream of maps
>     Stream<Map<String, String>> stream =
>         table.lines().skip(1)
>              .map(line -> line.trim())
>              .filter(line -> !line.isEmpty())
>              .map(line -> line.split(`\s{2,}`))
>              .map(columns -> {
>                  List<String> values = List.of(columns);
>                  return IntStream.range(0, headings.size()).boxed()
>                                  .collect(toMap(headings::get,
>                                                 values::get));
>              });
>
>     // print all "First Name"
>     stream.map(row -> row.get("First Name"))
>           .forEach(name -> System.out.println(name));
>
> Result:
>
>     Al
>     Bob
>     Cal
>
> B. Additions to basic trim methods.
>
> In addition to margin methods trimIndent and trimMarkers described below in
> Section C, it would be worth introducing trimLeft and trimRight to augment
> the longstanding trim method. A key question is how trimLeft and trimRight
> should detect whitespace, because different definitions of whitespace exist
> in the library.
>
> trim itself uses the simple test less than or equal to the space character, a
> fast test but not Unicode friendly.
>
> Character.isWhitespace(codepoint) returns true if codepoint is one of the
> following:
>
>     SPACE_SEPARATOR.
>     LINE_SEPARATOR.
>     PARAGRAPH_SEPARATOR.
>     '\t', U+0009 HORIZONTAL TABULATION.
>     '\n', U+000A LINE FEED.
>     '\u000B', U+000B VERTICAL TABULATION.
>     '\f', U+000C FORM FEED.
>     '\r', U+000D CARRIAGE RETURN.
>     '\u001C', U+001C FILE SEPARATOR.
>     '\u001D', U+001D GROUP SEPARATOR.
>     '\u001E', U+001E RECORD SEPARATOR.
>     '\u001F', U+001F UNIT SEPARATOR.
>     ' ', U+0020 SPACE.
>     (Note that non-breaking space (\u00A0) is excluded.)
>
> Character.isSpaceChar(codepoint) returns true if codepoint is one of the
> following:
>
>     SPACE_SEPARATOR.
>     LINE_SEPARATOR.
>     PARAGRAPH_SEPARATOR.
>     ' ', U+0020 SPACE.
>     '\u00A0', U+00A0 NON-BREAKING SPACE.
>
> That sets up several kinds of whitespace: trim's whitespace (TWS), Character
> whitespace (CWS) and the union of the two (UWS). TWS is a fast test. CWS is a
> slow test. UWS is fast for Latin1 and slow-ish for UTF-16.
>
> We are recommending that trimLeft and trimRight use UWS, leave trim alone to
> avoid breaking the world and then possibly introduce trimWhitespace that uses
> UWS.
>
> public String trim()
> Removes characters less than or equal to space from the beginning and end of
> the string. No change except spec clarification and links to the new trim
> methods.
> Examples:
>
>     "".trim();              // ""
>     " ".trim();             // ""
>     " abc ".trim();         // "abc"
>     " \u2028abc ".trim();   // "\u2028abc"
>
> public String trimWhitespace()
> Removes whitespace from the beginning and end of the string.
>
> Examples:
>
>     "".trimWhitespace();              // ""
>     " ".trimWhitespace();             // ""
>     " abc ".trimWhitespace();         // "abc"
>     " \u2028abc ".trimWhitespace();   // "abc"
>
> public String trimLeft()
> Removes whitespace from the beginning of the string.
>
> Examples:
>
>     "".trimLeft();        // ""
>     " ".trimLeft();       // ""
>     " abc ".trimLeft();   // "abc "
>
> public String trimRight()
> Removes whitespace from the end of the string.
>
> Examples:
>
>     "".trimRight();        // ""
>     " ".trimRight();       // ""
>     " abc ".trimRight();   // " abc"
>
> C. Margin management.
>
> With the introduction of multi-line Raw String Literals, developers will have
> to deal with the extraneous spacing introduced by indenting and formatting
> string bodies.
>
> Note that for all the methods in this group, if the first line is empty then
> it is removed and if the last line is empty then it is removed. This removal
> provides a means for developers that use delimiters on separate lines to
> bracket string bodies. Also note that all line separators are replaced with
> \n.
>
> public String trimIndent()
> This method determines a representative line in the string body that has a
> non-whitespace character closest to the left margin. Once that line has been
> determined, the number of leading whitespace characters is tallied to produce
> a minimal indent amount. Consequently, the result of the method is a string
> with the minimal indent amount removed from each line. The first line is
> unaffected since it is preceded by the open delimiter. The type of whitespace
> used (spaces or tabs) does not affect the result as long as the developer is
> consistent with the whitespace used.
> Example:
>
>     String x = `
>                This is a line
>                This is a line
>                This is a line
>                This is a line
>                This is a line
>                `.trimIndent();
>
> Result:
>
>     This is a line
>     This is a line
>     This is a line
>     This is a line
>     This is a line
>
> public String trimMarkers(String leftMarker, String rightMarker)
> Each line of the multi-line string is first trimmed. If the trimmed line
> contains the leftMarker at the beginning of the string then it is removed.
> Finally, if the line contains the rightMarker at the end of line, it is
> removed.
>
> Example:
>
>     String x = `|This is a line|
>                 |This is a line|
>                 |This is a line|`.trimMarkers("|", "|");
>
> Result:
>
>     This is a line
>     This is a line
>     This is a line
>
>
> Example:
>
>     String x = `>> This is a line
>                 >> This is a line
>                 >> This is a line`.trimMarkers(">> ", "");
>
> Result:
>
>     This is a line
>     This is a line
>     This is a line
>
> D. Escape management.
>
> Since Raw String Literals do not interpret Unicode escapes (\unnnn) or escape
> sequences (\n, \b, etc), we need to provide a scheme for developers who just
> want multi-line strings but still have escape sequences interpreted.
>
> public String unescape() throws MalformedEscapeException
> Translates each Unicode escape or escape sequence in the string into the
> character represented by the escape. @jls 3.3, 3.10.6
>
> Example:
>
>     `abc\u2022def\nghi`.unescape();
>
> Result:
>
>     abc•def
>     ghi
>
> public String unescape(EscapeType... escape) throws MalformedEscapeException
> Selectively translates Unicode escape or escape sequence based on the escape
> type flags provided.
>
> public enum EscapeType {
>     /** Backslash escape sequences based on section 3.10.6 of the
>      * <cite>The Java™ Language Specification</cite>.
>      * This includes sequences for backspace, horizontal tab,
>      * line feed, form feed, carriage return, double quote,
>      * single quote, backslash and octal escape sequences.
>      */
>     BACKSLASH,
>
>     /** Unicode sequences based on section 3.3 of the
>      * <cite>The Java™ Language Specification</cite>.
>      * This includes sequences in the form {@code \u005Cunnnn}.
>      */
>     UNICODE
> }
>
>
> Example:
>
>     `abc\u2022def\nghi`.unescape(EscapeType.BACKSLASH);
>
> Result:
>
>     abc\u2022def
>     ghi
>
>
> Example:
>
>     `abc\u2022def\nghi`.unescape(EscapeType.UNICODE);
>
> Result:
>
>     abc•def\nghi
>
> Conversely, there are circumstances where the inverse is required.
>
> public String escape()
> Translates each quote, backslash, non-graphic character or non-ASCII
> character into a Unicode escape or escape sequence. The method is equivalent
> to escape(BACKSLASH, UNICODE).
>
> Example:
>
>     `abc•def
>     ghi`.escape();
>
> Result:
>
>     abc\u2022def\nghi
>
> public String escape(EscapeType... escape)
> Selectively translates each quote, backslash, non-graphic character or
> non-ASCII character into a Unicode escape or escape sequence based on the
> escape type flags provided.
>
> Example:
>
>     `abc•def
>     ghi`.escape(EscapeType.BACKSLASH);
>
> Result:
>
>     abc•def\nghi
>
>
> Example:
>
>     `abc•def
>     ghi`.escape(EscapeType.UNICODE);
>
> Result:
>
>     abc\u2022def
>     ghi
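
The selective behaviour of unescape(EscapeType...) quoted above can be
sketched in ordinary Java as follows. This is only an illustration under stated
assumptions, not the proposed implementation: UnescapeSketch and its helper
methods are hypothetical names, the nested EscapeType is a local stand-in for
the quoted enum, IllegalArgumentException stands in for
MalformedEscapeException, and octal escapes and repeated 'u' markers are left
out to keep it short.

```java
import java.util.EnumSet;
import java.util.Set;

final class UnescapeSketch {
    // Local stand-in for the EscapeType enum quoted above.
    enum EscapeType { BACKSLASH, UNICODE }

    // Hypothetical helper; with no flags given, both escape types are translated.
    static String unescape(String s, EscapeType... types) {
        Set<EscapeType> enabled = types.length == 0
                ? EnumSet.allOf(EscapeType.class)
                : EnumSet.of(types[0], types);
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 == s.length()) {
                out.append(c);          // ordinary char, or a trailing lone backslash
                i++;
                continue;
            }
            char next = s.charAt(i + 1);
            if (next == 'u' && enabled.contains(EscapeType.UNICODE)) {
                out.append(parseHex4(s, i + 2));
                i += 6;
            } else if (next != 'u' && enabled.contains(EscapeType.BACKSLASH)) {
                out.append(translate(next));
                i += 2;
            } else if (next == '\\') {
                out.append("\\\\");     // keep an escaped backslash pair untranslated
                i += 2;
            } else {
                out.append(c);          // escape type not enabled: copy through
                i++;
            }
        }
        return out.toString();
    }

    // Reads exactly four hex digits starting at index 'from'.
    private static char parseHex4(String s, int from) {
        if (from + 4 > s.length()) {
            throw new IllegalArgumentException("truncated Unicode escape");
        }
        int code = 0;
        for (int k = from; k < from + 4; k++) {
            int d = Character.digit(s.charAt(k), 16);
            if (d < 0) {
                throw new IllegalArgumentException("malformed Unicode escape");
            }
            code = code * 16 + d;
        }
        return (char) code;
    }

    // Maps the character after a backslash to the character it denotes.
    private static char translate(char c) {
        switch (c) {
            case 'b': return '\b';
            case 't': return '\t';
            case 'n': return '\n';
            case 'f': return '\f';
            case 'r': return '\r';
            case '"': return '"';
            case '\'': return '\'';
            case '\\': return '\\';
            default: throw new IllegalArgumentException("unsupported escape: \\" + c);
        }
    }

    public static void main(String[] args) {
        // Same characters as the raw literal used in the quoted examples.
        String raw = "abc\\u2022def\\nghi";
        System.out.println(unescape(raw, EscapeType.UNICODE));
    }
}
```

The escaped-backslash branch is there so that, with only UNICODE enabled, a
doubled backslash followed by 'u' is not mistaken for the start of a Unicode
escape.
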