Re: RFR: 8346118: Improve whitespace normalization in preformatted text [v3]

Hannes Wallnöfer Thu, 06 Mar 2025 09:06:23 -0800

> Please review an enhancement to make `DocCommentParser` normalize whitespace 
> inside `<pre>` elements. The normalization is conceptually simple and
> intended to be minimally invasive. Before parsing, `DocCommentParser` checks 
> whether the text is a traditional doc comment and whether every line starts 
> with a space character, which is commonly the case in traditional doc 
> comments. If so, a single leading space is removed in block content (top 
> level text and `{@code}`/`{@literal}` tags) when parsing within HTML `<pre>` 
> tags.
> 
> This fixes the incidental one-space indentation in the vast majority of JDK 
> code samples using `<pre>` alone or in combination with `<code>` or 
> `{@code}`. In fact, I only found one code sample in JDK code that isn't 
> solved by this change, for which I included a fix in this PR (it's in 
> `String.startsWith(String, int)`, where I replaced the 10 char indentation 
> and trailing line with a `<blockquote>`). 
> 
> The many added `boolean inBlockContent` arguments passed around in 
> `DocCommentParser` make sure the removal is not applied to multiline 
> inline content. That is maybe a bit fussy, considering there is not a lot of 
> multiline inline content in `<pre>` tags and such content usually would not 
> be affected by the removal of a non-essential space character, but I wanted 
> to keep the change minimal. A few javadoc tests had to be adapted; most of 
> the testing is done in `test/langtools/tools/javac/doctree`. 
> 
> If the exact amount of leading whitespace in `<pre>` tags is important to any 
> javadoc user, the old output can be restored by increasing the indentation by 
> 1. There will be a release note for this, of course. 
> 
> Unfortunately, there is another whitespace problem that can't be solved as 
> easily, and that is a leading blank line caused by `<pre><code>\n` open tags. 
> Browsers will [ignore a newline immediately following a `<pre>` tag][1], but 
> not if there is a `<code>` tag in between. There are hundreds of occurrences 
> of this in JDK code, including variants with space characters mixed in. The 
> fix in javadoc proper would be too complex, so I decided to solve it with 3 
> lines of JavaScript and a regex to reverse the order of `<code>\n` at the 
> beginning of `<pre>` tags while removing any intermediary space. The script's 
> operation is not noticeable to the reader, and it solves the problem.
> 
> [1]: https://html.spec.whatwg.org/#the-pre-element:the-pre-element

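To make the check described in the quoted text concrete, here is a minimal sketch of the "remove one leading space if every line has one" rule. It is not the `DocCommentParser` code from the PR. The class and method names are hypothetical, and skipping blank lines during the check is an assumption of this sketch rather than something the description states.

```java
import java.util.List;
import java.util.stream.Collectors;

public class PreIndentSketch {

    // Hypothetical helper: true if every non-blank line begins with a space.
    // Ignoring blank lines here is an assumption of this sketch.
    static boolean everyLineStartsWithSpace(List<String> lines) {
        return lines.stream()
                .filter(line -> !line.isEmpty())
                .allMatch(line -> line.startsWith(" "));
    }

    // Removes exactly one leading space per line, but only when the check
    // above passes; otherwise the lines are returned unchanged.
    static List<String> stripOneLeadingSpace(List<String> lines) {
        if (!everyLineStartsWithSpace(lines)) {
            return lines;
        }
        return lines.stream()
                .map(line -> line.startsWith(" ") ? line.substring(1) : line)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Lines as they might appear inside <pre> in a traditional doc comment.
        List<String> pre = List.of(" int x = 1;", "   x++;", "", " return x;");
        stripOneLeadingSpace(pre).forEach(System.out::println);
    }
}
```

Only a single space is removed, so any deliberate extra indentation inside the sample is preserved relative to the other lines.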

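The `<pre><code>\n` reordering mentioned in the quoted text can be illustrated the same way. The description says it was done with a few lines of JavaScript; the sketch below is a Java transcription of the idea only, with a hypothetical class name and a regex that is an assumption, not the pattern from the PR.

```java
import java.util.regex.Pattern;

public class PreCodeNewlineSketch {

    // Matches <pre ...>, optional spaces or tabs, <code ...>, optional spaces
    // or tabs, and a line break. This pattern is an assumption of the sketch.
    private static final Pattern PRE_CODE_NEWLINE = Pattern.compile(
            "(<pre(?:\\s[^>]*)?>)[ \\t]*(<code(?:\\s[^>]*)?>)[ \\t]*\\r?\\n",
            Pattern.CASE_INSENSITIVE);

    // Moves the line break in front of <code> so that it directly follows
    // <pre>, and drops the spaces that sat between the two tags.
    static String moveNewlineBeforeCode(String html) {
        return PRE_CODE_NEWLINE.matcher(html).replaceAll("$1\n$2");
    }

    public static void main(String[] args) {
        String html = "<pre> <code> \nint x = 1;\n</code></pre>";
        System.out.println(moveNewlineBeforeCode(html));
    }
}
```

After the rewrite the newline immediately follows the `<pre>` start tag, which is the position where browsers ignore it.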
Hannes Wallnöfer has updated the pull request incrementally with one additional 
commit since the last revision:

  Switch to post-processing approach that fixes both problems
  
  This implements a visitor-based post-processing method in
  DocCommentParser that is able to fix both kinds of whitespace
  problems common in traditional doc comments. Tests have to be
  adapted to the additional normalization step. A few more tests
  should be added on the javadoc side.

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23868/files
  - new: https://git.openjdk.org/jdk/pull/23868/files/d2128b3a..8a9036d6

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23868&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23868&range=01-02

  Stats: 304 lines in 4 files changed: 200 ins; 63 del; 41 mod
  Patch: https://git.openjdk.org/jdk/pull/23868.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23868/head:pull/23868

PR: https://git.openjdk.org/jdk/pull/23868
