RFR: JDK-8236048: Cleanup use of Utils.normalizeNewlines

Jonathan Gibbons Fri, 29 Jul 2022 17:44:10 -0700

Please review a medium-size cleanup related to handling newlines in 
`Content`/`HtmlTree`/`Text[Builder]` trees.
This is the 3rd and last in the recent series of cleanup tasks in the code to 
generate HTML.


Until recently, the `Text`/`TextBuilder` objects were "ready-for-writing", 
meaning that characters like `<`, `&`, etc were encoded as entities, and 
newlines were represented by the platform line separator. This caused 
difficulties when counting the size of generated text, since these characters 
had to be detected and accounted for.  A recent change addressed entities, but 
stayed clear of the newline issue, which is now addressed here.

A related performance problem is that the method `Utils.normalizeNewlines` 
always created new strings, even when it was not necessary.

With this change, newlines in `Text`, `TextBuilder` and other subtypes of 
`Content` containing text are always represented by `\n`, and are converted to 
the platform line separator as needed. 

I note the following assumptions:

* the input may contain any flavor of newlines in the user-provided 
documentation comments
* the output should aways use the platform line separator (`\n` on Linux and 
macOS, `\r\n` on Windows etc.)

Together, these imply that some amount of the input will need to be scanned to 
normalize newlines at some point in the process. While it would be nice to not 
have to normalize newlines, it's a case or "pay now, or pay later". 

Notable parts of this change:

* `Utils.normalizeNewlines` is moved to be a static method in `Text` and 
rewritten to not create new strings unless needed. This allows a number of 
questionable imports of `toolkit.utils.DocletConstants` to be removed from the 
`html.markup` package.  While it would be possible to call this method 
automatically on all strings passed to `Text`, `TextBuilder` etc, in many cases 
it is not necessary: put another way, it only needs to be called on strings 
that have come from the user, either in documentation comments or in 
command-line options.
* `Content.write`, and the implementations thereof, are given a new parameter 
which gives the newline sequence to use. This is set to `\n` for `toString` and 
the platform separator when writing to a file.
* `DocletConstants.NL` goes away and is superseded by:
   *  `Text.NL`: always `\n`, to be used when adding text to a `Content` tree 
(this replaces most uses of `DocletConstants.NL`)
   * `DocFile.PLATFORM_LINE_SEPARATOR`: the platform line separator, to be used 
when writing to files
* The various `charCount` methods no longer have to worry about use of the 
platform line separator 
* Assertions are used to verify that there are no `\r` characters added into 
`Text`/`TextBuilder`/`RawHtml`/`Content`

Other cleanup:
* `DocletConstants.DEFAULT_TAB_STOP_LENGTH` is moved to `BaseOptions` which is 
the only class that uses it. This leaves just two remaining constants in 
`DocletConstants` which should probably be moved elsewhere as well, and the 
`DocletConstants` class deleted. That is a step too far for the work here.
* Minor IDE-suggestions, for lambdas, enhanced switch, adding braces, etc

One test was affected by the `DocletConstants.NL` change, but apart from that, 
this is a pure-cleanup change, with no (intentional) externally visible effect.

-------------

Commit messages:
 - JDK-8236048: Cleanup use of Utils.normalizeNewlines

Changes: https://git.openjdk.org/jdk/pull/9691/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=9691&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8236048
  Stats: 247 lines in 26 files changed: 78 ins; 76 del; 93 mod
  Patch: https://git.openjdk.org/jdk/pull/9691.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/9691/head:pull/9691

PR: https://git.openjdk.org/jdk/pull/9691

RFR: JDK-8236048: Cleanup use of Utils.normalizeNewlines

Reply via email to