One more change set:

trimIndent -> stripIndent
trimMarkers -> stripMarkers

> On Mar 20, 2018, at 10:35 AM, Jim Laskey <james.las...@oracle.com> wrote:
>
> Summary.
>
> A. Line support.
>
> - Supporting a mix of line terminators `\n|\r\n|\r` is already a
> well-established pattern in language parsers, in the JDK (e.g. see
> java.nio.file.FileChannelLinesSpliterator) and in regex (e.g. see `\R`).
> The performance difference between checking for one terminator and
> checking for all three is negligible.
>
> - Yes, Stream<String> stream =
> Pattern.compile("\n|\r\n|\r").splitAsStream(string); is very useful
> (Spliterators rule), but it is cumbersome for what we expect to be a
> common use case. Only so-so streamy. :-)
>
> - BufferedReader.lines() vs. String.lines() is a tricky discussion. It comes
> down to whether the newline is a terminator or a separator. In the I/O
> case, terminator seems to be the right answer: a well-formed text file has
> a newline at the end of every line. However, I think you'll find that when
> people work with multi-line strings they think of the newline as a
> separator, hence the common use of split("\n") and
> "".split("\n").length == 1. Indentation, the position of the closing
> delimiter and margin trimming make that last line very fluid.
>
> What clinches the deal is that
> string.lines().collect(joining("\n")).equals(string). I'll make sure the
> javadoc for both versions of lines() documents the difference clearly.
>
> - The current Spliterator implementation makes
> String.lines().toArray(String[]::new) an order of magnitude faster than
> split(`\n|\r\n|\r`). That's why I implemented it for margin management.
> It is faster still if no collection/array is constructed.
>
> BTW: split(`\R`) is 2x-3x faster than split(`\n|\r\n|\r`). Nice.
>
> B. Additions to basic trim methods.
>
> - Revamped to become strip, stripLeading and stripTrailing, using
> Character.isWhitespace(codepoint) as the test (optimized as
> ch == ' ' || ch == '\t' || Character.isWhitespace(ch)).
>
> - No strong feeling about it, but String.trim() could be recommended for
> deprecation.
>
> C. Margin management.
>
> - String.trimMarkers() as a default for String.trimMarkers("|", "|") is
> reasonable. I will put it in the CSR for broader discussion.
>
> - Re use of patterns: I think the Stream<String> lines() method will make it
> easy enough to create custom margin-trimming lambdas.
>
> D. Escape management.
>
> - Good
>
> Cheers,
>
> — Jim
>
>
>> On Mar 13, 2018, at 10:47 AM, Jim Laskey <james.las...@oracle.com> wrote:
>>
>> With the announcement of JEP 326 Raw String Literals, we would like to open
>> up a discussion with regard to RSL library support. Below are several
>> implemented String methods that we believe are appropriate. Please comment
>> on them, including recommending alternate names or signatures. Additional
>> methods can be considered if warranted, but as always, the bar for
>> inclusion in String is high.
>>
>> You should keep a couple of things in mind when reviewing these methods.
>>
>> Methods should be applicable to all strings, not just Raw String Literals.
>>
>> The number of additional methods should be minimized, not adding every
>> possible method.
>>
>> Don't put any emphasis on performance. That is a separate discussion.
>>
>> Cheers,
>>
>> -- Jim
>>
>> A. Line support.
>>
>> public Stream<String> lines()
>> Returns a stream of substrings extracted from this string partitioned by
>> line terminators. Internally, the stream is implemented using a
>> Spliterator that extracts one line at a time. The line terminators
>> recognized are \n, \r\n and \r. This method provides versatility for the
>> developer working with multi-line strings.
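The lines() description above, read together with the separator semantics argued for in the Mar 20 reply, suggests a Spliterator that hands out one line per step and treats \n, \r\n and \r alike. A minimal sketch under those assumptions follows; LineSketch and its static lines(String) method are hypothetical stand-ins for the proposed String.lines(), not the JDK implementation.

```java
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical stand-in for the proposed String.lines(); not the JDK code.
class LineSketch {

    static Stream<String> lines(String s) {
        return StreamSupport.stream(new LineSpliterator(s), false);
    }

    // Hands out one line per tryAdvance call. \n, \r\n and \r are treated
    // as separators, so "a\n" yields "a" followed by a final "".
    private static final class LineSpliterator
            extends Spliterators.AbstractSpliterator<String> {
        private final String s;
        private int index;     // start of the next unread line
        private boolean done;  // set once the last segment has been emitted

        LineSpliterator(String s) {
            super(Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL);
            this.s = s;
        }

        @Override
        public boolean tryAdvance(Consumer<? super String> action) {
            if (done) {
                return false;
            }
            int len = s.length();
            for (int i = index; i < len; i++) {
                char ch = s.charAt(i);
                if (ch == '\n' || ch == '\r') {
                    action.accept(s.substring(index, i));
                    // Treat \r\n as a single separator.
                    boolean crlf = ch == '\r' && i + 1 < len && s.charAt(i + 1) == '\n';
                    index = i + (crlf ? 2 : 1);
                    return true;
                }
            }
            // No separator left: emit the final (possibly empty) segment.
            action.accept(s.substring(index));
            done = true;
            return true;
        }
    }

    public static void main(String[] args) {
        String mixed = "abc\ndef\r\nghi\rjkl";
        System.out.println(lines(mixed).collect(Collectors.toList())); // [abc, def, ghi, jkl]

        // Separator semantics give the joining("\n") round trip for \n-only strings.
        String s = "abc\ndef\n";
        System.out.println(lines(s).collect(Collectors.joining("\n")).equals(s)); // true
    }
}
```

Because terminators are treated as separators in this sketch, an empty string yields one empty line, consistent with "".split("\n").length == 1.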
>>
>> Example:
>>
>>     String string = "abc\ndef\nghi";
>>     Stream<String> stream = string.lines();
>>     List<String> list = stream.collect(Collectors.toList());
>>
>> Result:
>>
>>     [abc, def, ghi]
>>
>>
>> Example:
>>
>>     String string = "abc\ndef\nghi";
>>     String[] array = string.lines().toArray(String[]::new);
>>
>> Result:
>>
>>     [Ljava.lang.String;@33e5ccce // [abc, def, ghi]
>>
>>
>> Example:
>>
>>     String string = "abc\ndef\r\nghi\rjkl";
>>     String platformString =
>>         string.lines().collect(joining(System.lineSeparator()));
>>
>> Result:
>>
>>     abc
>>     def
>>     ghi
>>     jkl
>>
>>
>> Example:
>>
>>     String string = " abc \n def \n ghi ";
>>     String trimmedString =
>>         string.lines().map(s -> s.trim()).collect(joining("\n"));
>>
>> Result:
>>
>>     abc
>>     def
>>     ghi
>>
>>
>> Example:
>>
>>     String table = `First Name  Surname  Phone
>>                     Al          Albert   555-1111
>>                     Bob         Roberts  555-2222
>>                     Cal         Calvin   555-3333
>>                    `;
>>
>>     // Extract headers
>>     String firstLine = table.lines().findFirst().orElse("");
>>     List<String> headings = List.of(firstLine.trim().split(`\s{2,}`));
>>
>>     // Build stream of maps
>>     Stream<Map<String, String>> stream =
>>         table.lines().skip(1)
>>              .map(line -> line.trim())
>>              .filter(line -> !line.isEmpty())
>>              .map(line -> line.split(`\s{2,}`))
>>              .map(columns -> {
>>                  List<String> values = List.of(columns);
>>                  return IntStream.range(0, headings.size()).boxed()
>>                                  .collect(toMap(headings::get, values::get));
>>              });
>>
>>     // Print all "First Name" values
>>     stream.map(row -> row.get("First Name"))
>>           .forEach(name -> System.out.println(name));
>>
>> Result:
>>
>>     Al
>>     Bob
>>     Cal
>>
>> B. Additions to basic trim methods.
>>
>> In addition to the margin methods trimIndent and trimMarkers described below
>> in Section C, it would be worth introducing trimLeft and trimRight to augment
>> the longstanding trim method. A key question is how trimLeft and trimRight
>> should detect whitespace, because different definitions of whitespace exist
>> in the library.
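To make the whitespace question concrete, the sketch below evaluates trim's test (TWS), Character.isWhitespace (CWS), their union (UWS) and Character.isSpaceChar on a few sample code points. The class and method names are illustrative only; the sample set is an arbitrary choice.

```java
// Illustrative comparison of the whitespace tests discussed in this section.
class WhitespaceTests {

    // TWS: the test used by String.trim().
    static boolean tws(int cp) {
        return cp <= ' ';
    }

    // CWS: Character whitespace.
    static boolean cws(int cp) {
        return Character.isWhitespace(cp);
    }

    // UWS: union of TWS and CWS.
    static boolean uws(int cp) {
        return tws(cp) || cws(cp);
    }

    public static void main(String[] args) {
        // SPACE, TAB, NUL, NO-BREAK SPACE, LINE SEPARATOR, IDEOGRAPHIC SPACE
        int[] samples = { 0x0020, 0x0009, 0x0000, 0x00A0, 0x2028, 0x3000 };
        for (int cp : samples) {
            System.out.printf("U+%04X  TWS=%-5b CWS=%-5b UWS=%-5b isSpaceChar=%b%n",
                    cp, tws(cp), cws(cp), uws(cp), Character.isSpaceChar(cp));
        }
    }
}
```

On these samples NUL is caught only by TWS, U+2028 and U+3000 only by CWS, and U+00A0 by neither (only isSpaceChar accepts it), which is why UWS is the broadest of the three tests.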
>>
>> trim itself uses the simple test "less than or equal to the space
>> character", which is fast but not Unicode friendly.
>>
>> Character.isWhitespace(codepoint) returns true if codepoint is one of the
>> following:
>>
>>     SPACE_SEPARATOR.
>>     LINE_SEPARATOR.
>>     PARAGRAPH_SEPARATOR.
>>     '\t',     U+0009 HORIZONTAL TABULATION.
>>     '\n',     U+000A LINE FEED.
>>     '\u000B', U+000B VERTICAL TABULATION.
>>     '\f',     U+000C FORM FEED.
>>     '\r',     U+000D CARRIAGE RETURN.
>>     '\u001C', U+001C FILE SEPARATOR.
>>     '\u001D', U+001D GROUP SEPARATOR.
>>     '\u001E', U+001E RECORD SEPARATOR.
>>     '\u001F', U+001F UNIT SEPARATOR.
>>     ' ',      U+0020 SPACE.
>>
>> (Note: non-breaking spaces, such as \u00A0, are excluded.)
>>
>> Character.isSpaceChar(codepoint) returns true if codepoint is one of the
>> following:
>>
>>     SPACE_SEPARATOR.
>>     LINE_SEPARATOR.
>>     PARAGRAPH_SEPARATOR.
>>     ' ',      U+0020 SPACE.
>>     '\u00A0', U+00A0 NON-BREAKING SPACE.
>>
>> That sets up three kinds of whitespace: trim's whitespace (TWS), Character
>> whitespace (CWS) and the union of the two (UWS). TWS is a fast test. CWS is
>> a slow test. UWS is fast for Latin1 and slow-ish for UTF-16.
>>
>> We are recommending that trimLeft and trimRight use UWS, that trim be left
>> alone to avoid breaking the world, and that a trimWhitespace method using
>> UWS possibly be introduced.
>>
>> public String trim()
>> Removes characters less than or equal to space from the beginning and end
>> of the string. No change except spec clarification and links to the new
>> trim methods.
>>
>> Examples:
>>
>>     "".trim();            // ""
>>     " ".trim();           // ""
>>     " abc ".trim();       // "abc"
>>     " \u2028abc ".trim(); // "\u2028abc"
>>
>> public String trimWhitespace()
>> Removes whitespace from the beginning and end of the string.
>>
>> Examples:
>>
>>     "".trimWhitespace();            // ""
>>     " ".trimWhitespace();           // ""
>>     " abc ".trimWhitespace();       // "abc"
>>     " \u2028abc ".trimWhitespace(); // "abc"
>>
>> public String trimLeft()
>> Removes whitespace from the beginning of the string.
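A minimal sketch of trimLeft, trimRight and trimWhitespace as static helpers on a hypothetical TrimSketch class. The whitespace test is a parameter, so the same code covers UWS (recommended here) and the CWS-based strip/stripLeading/stripTrailing revision from the Mar 20 reply; walking the string by code point rather than by char is a choice of this sketch, not something the proposal specifies.

```java
import java.util.function.IntPredicate;

// Hypothetical helpers; not the JDK implementation.
class TrimSketch {

    // UWS: trim()'s test (<= ' ') combined with Character.isWhitespace.
    static final IntPredicate UWS = cp -> cp <= ' ' || Character.isWhitespace(cp);

    // CWS with the fast path quoted in the Mar 20 reply.
    static final IntPredicate CWS =
            cp -> cp == ' ' || cp == '\t' || Character.isWhitespace(cp);

    // Drops leading code points that satisfy ws.
    static String trimLeft(String s, IntPredicate ws) {
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (!ws.test(cp)) {
                break;
            }
            i += Character.charCount(cp);
        }
        return s.substring(i);
    }

    // Drops trailing code points that satisfy ws.
    static String trimRight(String s, IntPredicate ws) {
        int j = s.length();
        while (j > 0) {
            int cp = s.codePointBefore(j);
            if (!ws.test(cp)) {
                break;
            }
            j -= Character.charCount(cp);
        }
        return s.substring(0, j);
    }

    static String trimWhitespace(String s, IntPredicate ws) {
        return trimLeft(trimRight(s, ws), ws);
    }

    public static void main(String[] args) {
        System.out.println("[" + trimLeft(" abc ", UWS) + "]");   // [abc ]
        System.out.println("[" + trimRight(" abc ", UWS) + "]");  // [ abc]
        // U+2028 survives trim() but not trimWhitespace().
        System.out.println(" \u2028abc ".trim().length());        // 4
        System.out.println(trimWhitespace(" \u2028abc ", UWS));   // abc
    }
}
```

The two predicates differ on control characters such as U+0000, which only TWS (and therefore UWS) treats as whitespace.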
>>
>> Examples:
>>
>>     "".trimLeft();      // ""
>>     " ".trimLeft();     // ""
>>     " abc ".trimLeft(); // "abc "
>>
>> public String trimRight()
>> Removes whitespace from the end of the string.
>>
>> Examples:
>>
>>     "".trimRight();      // ""
>>     " ".trimRight();     // ""
>>     " abc ".trimRight(); // " abc"
>>
>> C. Margin management.
>>
>> With the introduction of multi-line Raw String Literals, developers will
>> have to deal with the extraneous spacing introduced by indenting and
>> formatting string bodies.
>>
>> Note that for all the methods in this group, if the first line is empty it
>> is removed, and if the last line is empty it is removed. This removal lets
>> developers who put the delimiters on separate lines bracket string bodies.
>> Also note that all line separators are replaced with \n.
>>
>> public String trimIndent()
>> This method determines a representative line in the string body that has a
>> non-whitespace character closest to the left margin. Once that line has
>> been determined, its leading whitespace characters are counted to produce
>> the minimal indent amount. The result of the method is the string with the
>> minimal indent amount removed from each line. The first line is unaffected
>> since it is preceded by the open delimiter. The type of whitespace used
>> (spaces or tabs) does not affect the result as long as the developer uses it
>> consistently.
>>
>> Example:
>>
>>     String x = `
>>         This is a line
>>         This is a line
>>         This is a line
>>         This is a line
>>         This is a line
>>     `.trimIndent();
>>
>> Result:
>>
>>     This is a line
>>     This is a line
>>     This is a line
>>     This is a line
>>     This is a line
>>
>> public String trimMarkers(String leftMarker, String rightMarker)
>> Each line of the multi-line string is first trimmed. If the trimmed line
>> begins with leftMarker, then leftMarker is removed. Finally, if the line
>> ends with rightMarker, then rightMarker is removed.
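The margin rules described in this section (drop an empty first and last line, normalize separators to \n, then remove either the minimal indent or the markers) can be sketched as below. MarginSketch is hypothetical; treating a whitespace-only first or last line as "empty" and letting blank lines not set the indent are assumptions of the sketch, not statements from the proposal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of trimIndent/trimMarkers; not the JDK implementation.
class MarginSketch {

    // Splits on \r\n, \r or \n, keeping empty segments.
    private static List<String> split(String s) {
        return new ArrayList<>(Arrays.asList(s.split("\r\n|\r|\n", -1)));
    }

    private static boolean isBlank(String line) {
        return line.trim().isEmpty();
    }

    // Assumption: a whitespace-only first or last line counts as "empty".
    private static void dropBlankEnds(List<String> lines) {
        if (!lines.isEmpty() && isBlank(lines.get(0))) {
            lines.remove(0);
        }
        if (!lines.isEmpty() && isBlank(lines.get(lines.size() - 1))) {
            lines.remove(lines.size() - 1);
        }
    }

    private static int indentOf(String line) {
        int n = 0;
        while (n < line.length() && Character.isWhitespace(line.charAt(n))) {
            n++;
        }
        return n;
    }

    static String trimIndent(String s) {
        List<String> lines = split(s);
        // A non-blank first line follows the open delimiter and is left alone.
        int first = isBlank(lines.get(0)) ? 0 : 1;
        dropBlankEnds(lines);
        int min = Integer.MAX_VALUE;
        for (int i = first; i < lines.size(); i++) {
            if (!isBlank(lines.get(i))) {  // assumption: blank lines do not set the indent
                min = Math.min(min, indentOf(lines.get(i)));
            }
        }
        if (min == Integer.MAX_VALUE) {
            min = 0;
        }
        for (int i = first; i < lines.size(); i++) {
            String line = lines.get(i);
            lines.set(i, line.substring(Math.min(min, indentOf(line))));
        }
        return String.join("\n", lines);
    }

    static String trimMarkers(String s, String leftMarker, String rightMarker) {
        List<String> lines = split(s);
        dropBlankEnds(lines);
        return lines.stream()
                .map(String::trim)
                .map(l -> l.startsWith(leftMarker) ? l.substring(leftMarker.length()) : l)
                .map(l -> l.endsWith(rightMarker)
                        ? l.substring(0, l.length() - rightMarker.length()) : l)
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        String body = "\n    This is a line\n      This is a line\n    This is a line\n    ";
        System.out.println(trimIndent(body));
        System.out.println(trimMarkers("|This is a line|\n   |This is a line|", "|", "|"));
        System.out.println(trimMarkers(">> This is a line\r\n   >> This is a line", ">> ", ""));
    }
}
```

Because the indent is counted in characters, tabs and spaces are interchangeable in this sketch only when each body uses one kind consistently.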
>>
>> Example:
>>
>>     String x = `|This is a line|
>>                 |This is a line|
>>                 |This is a line|`.trimMarkers("|", "|");
>>
>> Result:
>>
>>     This is a line
>>     This is a line
>>     This is a line
>>
>> Example:
>>
>>     String x = `>> This is a line
>>                 >> This is a line
>>                 >> This is a line`.trimMarkers(">> ", "");
>>
>> Result:
>>
>>     This is a line
>>     This is a line
>>     This is a line
>>
>> D. Escape management.
>>
>> Since Raw String Literals do not interpret Unicode escapes (\unnnn) or
>> escape sequences (\n, \b, etc.), we need to provide a scheme for developers
>> who just want multi-line strings but still want escape sequences
>> interpreted.
>>
>> public String unescape() throws MalformedEscapeException
>> Translates each Unicode escape or escape sequence in the string into the
>> character represented by the escape. @jls 3.3, 3.10.6
>>
>> Example:
>>
>>     `abc\u2022def\nghi`.unescape();
>>
>> Result:
>>
>>     abc•def
>>     ghi
>>
>> public String unescape(EscapeType... escape) throws MalformedEscapeException
>> Selectively translates Unicode escapes or escape sequences based on the
>> escape type flags provided.
>>
>>     public enum EscapeType {
>>         /** Backslash escape sequences based on section 3.10.6 of the
>>          * <cite>The Java™ Language Specification</cite>.
>>          * This includes sequences for backspace, horizontal tab,
>>          * line feed, form feed, carriage return, double quote,
>>          * single quote, backslash and octal escape sequences.
>>          */
>>         BACKSLASH,
>>
>>         /** Unicode sequences based on section 3.3 of the
>>          * <cite>The Java™ Language Specification</cite>.
>>          * This includes sequences in the form {@code \u005Cunnnn}.
>>          */
>>         UNICODE
>>     }
>>
>> Example:
>>
>>     `abc\u2022def\nghi`.unescape(EscapeType.BACKSLASH);
>>
>> Result:
>>
>>     abc\u2022def
>>     ghi
>>
>>
>> Example:
>>
>>     `abc\u2022def\nghi`.unescape(EscapeType.UNICODE);
>>
>> Result:
>>
>>     abc•def\nghi
>>
>> Conversely, there are circumstances where the inverse is required.
>>
>> public String escape()
>> Translates each quote, backslash, non-graphic character or non-ASCII
>> character into a Unicode escape or escape sequence. The method is
>> equivalent to escape(BACKSLASH, UNICODE).
>>
>> Example:
>>
>>     `abc•def
>>     ghi`.escape();
>>
>> Result:
>>
>>     abc\u2022def\nghi
>>
>> public String escape(EscapeType... escape)
>> Selectively translates each quote, backslash, non-graphic character or
>> non-ASCII character into a Unicode escape or escape sequence based on the
>> escape type flags provided.
>>
>> Example:
>>
>>     `abc•def
>>     ghi`.escape(EscapeType.BACKSLASH);
>>
>> Result:
>>
>>     abc•def\nghi
>>
>>
>> Example:
>>
>>     `abc•def
>>     ghi`.escape(EscapeType.UNICODE);
>>
>> Result:
>>
>>     abc\u2022def
>>     ghi
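A sketch of the selective unescape(EscapeType...) described above, written as a static helper on a hypothetical UnescapeSketch class. It runs the Unicode pass before the backslash pass, mirroring the order in which the JLS translates Unicode escapes (3.3) before string-literal escape sequences. The nested EscapeType enum mirrors the proposal's enum, IllegalArgumentException stands in for the proposed MalformedEscapeException, and escape() is not covered.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of the proposed unescape(EscapeType...); not JDK code.
class UnescapeSketch {

    // Mirrors the enum in the proposal.
    enum EscapeType { BACKSLASH, UNICODE }

    // No flags means both kinds, like the no-argument unescape().
    static String unescape(String s, EscapeType... types) {
        Set<EscapeType> set = types.length == 0
                ? EnumSet.allOf(EscapeType.class) : EnumSet.of(types[0], types);
        if (set.contains(EscapeType.UNICODE)) s = unicodePass(s);     // JLS order: 3.3 first
        if (set.contains(EscapeType.BACKSLASH)) s = backslashPass(s); // then escape sequences
        return s;
    }

    // JLS 3.3: a backslash, one or more 'u's and four hex digits. The backslash
    // is eligible only if an even number of backslashes immediately precede it.
    private static String unicodePass(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int run = 0; // contiguous raw backslashes just before index i
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && run % 2 == 0 && i + 1 < s.length() && s.charAt(i + 1) == 'u') {
                int j = i + 1;
                while (j < s.length() && s.charAt(j) == 'u') j++;
                String hex = s.substring(j, Math.min(j + 4, s.length()));
                if (!hex.matches("[0-9a-fA-F]{4}")) {
                    throw new IllegalArgumentException("malformed Unicode escape at " + i);
                }
                out.append((char) Integer.parseInt(hex, 16));
                run = 0; // a translated character never counts as a preceding backslash
                i = j + 4;
            } else {
                out.append(c);
                run = (c == '\\') ? run + 1 : 0;
                i++;
            }
        }
        return out.toString();
    }

    // Backslash escape sequences. A backslash followed by 'u' is not one of them,
    // so it is copied unchanged; that is how BACKSLASH alone leaves it intact.
    private static String backslashPass(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i++);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            if (i >= s.length()) throw new IllegalArgumentException("trailing backslash");
            char e = s.charAt(i++);
            switch (e) {
                case 'b': out.append('\b'); break;
                case 't': out.append('\t'); break;
                case 'n': out.append('\n'); break;
                case 'f': out.append('\f'); break;
                case 'r': out.append('\r'); break;
                case '"': out.append('"'); break;
                case '\'': out.append('\''); break;
                case '\\': out.append('\\'); break;
                case 'u': out.append('\\').append('u'); break;
                default:
                    if (e < '0' || e > '7') {
                        throw new IllegalArgumentException("invalid escape at " + (i - 2));
                    }
                    // Octal: up to three digits if the first is 0-3, else up to two.
                    int v = e - '0';
                    int max = (e <= '3') ? 3 : 2;
                    for (int k = 1; k < max && i < s.length()
                            && s.charAt(i) >= '0' && s.charAt(i) <= '7'; k++) {
                        v = v * 8 + (s.charAt(i++) - '0');
                    }
                    out.append((char) v);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "abc\\u2022def\\nghi"; // backslashes are literal characters here
        System.out.println(unescape(raw));
        System.out.println(unescape(raw, EscapeType.BACKSLASH));
        System.out.println(unescape(raw, EscapeType.UNICODE));
    }
}
```

Running the Unicode pass first means that the six characters backslash, u, 0, 0, 5, C followed by n end up as a newline when both flags are set, which is how javac treats the same characters inside an ordinary string literal.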