On Mon, 3 Mar 2025 16:41:18 GMT, Hannes Wallnöfer <hann...@openjdk.org> wrote:
> Please review an enhancement to make `DocCommentParser` normalize whitespace > inside `<pre>` elements. The normalization is conceptually simple and and > intended to be minimally invasive. Before parsing, `DocCommentParser` checks > whether the text is a traditional doc comment and whether every line starts > with a space character, which is commonly the case in traditional doc > comments. If so, a single leading space is removed in block content (top > level text and `{@code}`/`{@literal}` tags) when parsing within HTML `<pre>` > tags. > > This fixes the incidental one-space indentation in the vast majority of JDK > code samples using `<pre>` alone or in combination with `<code>` or > `{@code}`. In fact, I only found one code sample in JDK code that isn't > solved by this change, for which I included a fix in this PR (it's in > `String.startsWith(String, int)`, where I replaced the 10 char indentation > and trailing line with a `<blockquote>`). > > The many added `boolean inBlockContent` arguments pased around in > `DocCommentParser` are to make sure the removal is not applied to multiline > inline content, which is maybe a bit fussy considering there is not a lot of > multiline inline content in `<pre>` tags and it usually would not mind about > removal of a non-essential space character, but I wanted to keep the change > minimal. There are few javadoc tests that had to be adapted, most of the > testing is done in `test/langtools/tools/javac/doctree`. > > If the exact number of leading whitespace in `<pre>` tags is important to any > javadoc user the old output can be restored by increasing the indentation by > 1. There will be a release note for this of course. > > Unfortunately, there is another whitespace problem that can't be solved as > easily, and that is a leading blank line caused by `<pre><code>\n` open tags. > Browsers will [ignore a newline immediately following a `<pre>` tag][1], but > not if there is a `<code>` tag in between. There are hundreds of occurrences > of this in JDK code, including variants with space characters mixed in. The > fix in javadoc proper would be too complex, so I decided to solve it with 3 > lines of JavaScript and a regex to reverse the order of `<code>\n` at the > beginning of `<pre>` tags while removing any intermediary space. Script > operation is indiscernible and it solves the problem. > > [1]: https://html.spec.whatwg.org/#the-pre-element:the-pre-element Thanks for reviewing this, Jon. I agree there are two issues that maybe shold be handled separately. Regarding the first issue, stripping leading space in `DocCommentParser`, I did in fact start out processing the whole doc comment. I was pretty far into fixing tests when I realized that this approach breaks our ability to get a diagnostic position from a string extracted from the comment. The source postion mapping is generated in `JavadocScanner`, and it would be quite complex to keep track of whitespace removed after the fact and correctly translate to a source position. The test that made me realize this problem was [TestDocTreeDiags.java], which tests reporting positions inside strings extracted from the comment (added in [JDK-8268420]) , but with wholesale whitespace removal also start positions are affected. I should have mentioned this in the PR description, but it was already long enough as it was. [TestDocTreeDiags.java]: https://github.com/openjdk/jdk/blob/master/test/langtools/jdk/javadoc/doclet/testDocTreeDiags/TestDocTreeDiags.java [JDK-8268420]: https://bugs.openjdk.org/browse/JDK-8268420 This is why I reverted to the proposed minimalistic solution, limiting space removal to text and `{@code}`/`{@literal}` tags within `<pre>` elements. These are all doctrees that are passed though without further processing, so it is highly unlikely that anybody would ever want to report a source position inside of them. The begin postions will be correct since we're parsing the unchanged doc comment, but positions inside the string after the first line break would report a wrong position. This is why I filed [a CSR to amend the spec for above mentioned method][CSR] with a note about this. [CSR]: https://bugs.openjdk.org/browse/JDK-8350428 Regarding the second issue, I agree the JavaScript solution is not ideal. But it's a small price to pay in order to get readable and good looking documentation. We have close to 1000 instances of `<pre>{@code}` in JDK source code alone, and they will never get fixed and continue to haunt our otherwise great documentation. The same problem affects most Java libraries out there. You see I'm quite passionate about this, but I agree to move it to a separate issue if you think that's the best way to handle this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23868#issuecomment-2702033467