Re: RFR: 8346118: Improve whitespace normalization in preformatted text [v2]

Hannes Wallnöfer Wed, 05 Mar 2025 21:32:50 -0800

On Thu, 6 Mar 2025 05:18:26 GMT, Hannes Wallnöfer <hann...@openjdk.org> wrote:


>> Please review an enhancement to make `DocCommentParser` normalize whitespace 
>> inside `<pre>` elements. The normalization is conceptually simple and
>> intended to be minimally invasive. Before parsing, `DocCommentParser` checks 
>> whether the text is a traditional doc comment and whether every line starts 
>> with a space character, which is commonly the case in traditional doc 
>> comments. If so, a single leading space is removed in block content (top 
>> level text and `{@code}`/`{@literal}` tags) when parsing within HTML `<pre>` 
>> tags.
>> 
>> This fixes the incidental one-space indentation in the vast majority of JDK 
>> code samples using `<pre>` alone or in combination with `<code>` or 
>> `{@code}`. In fact, I only found one code sample in JDK code that isn't 
>> solved by this change, for which I included a fix in this PR (it's in 
>> `String.startsWith(String, int)`, where I replaced the 10 char indentation 
>> and trailing line with a `<blockquote>`). 
>> 
>> The many added `boolean inBlockContent` arguments passed around in 
>> `DocCommentParser` make sure the removal is not applied to multiline 
>> inline content. That is maybe a bit fussy, considering there is not a lot 
>> of multiline inline content in `<pre>` tags and it usually would not be 
>> affected by the removal of a non-essential space character, but I wanted 
>> to keep the change minimal. There are a few javadoc tests that had to be 
>> adapted; most of the testing is done in 
>> `test/langtools/tools/javac/doctree`. 
>> 
>> If the exact amount of leading whitespace in `<pre>` tags is important to 
>> any javadoc user the old output can be restored by increasing the 
>> indentation by 1. There will be a release note for this of course. 
>> 
>> Unfortunately, there is another whitespace problem that can't be solved as 
>> easily, and that is a leading blank line caused by `<pre><code>\n` open 
>> tags. Browsers will [ignore a newline immediately following a `<pre>` 
>> tag][1], but not if there is a `<code>` tag in between. There are hundreds 
>> of occurrences of this in JDK code, including variants with space characters 
>> mixed in. The fix in javadoc proper would be too complex, so I decided to 
>> solve it with 3 lines of JavaScript and a regex to reverse the order of 
>> `<code>\n` at the beginning of `<pre>` tags while removing any intermediary 
>> space. Script operation is indiscernible and it solves the problem.
>> 
>> [1]: https://html.spec.whatwg.org/#the-pre-element:the-pre-element
>
> Hannes Wallnöfer has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Remove script for normalizing leading space in pre/code based on review 
> feedback

I removed the script to strip leading whitespace based on your feedback. 

Regarding the complexity this patch adds to `DocCommentParser` (which I 
dislike as you do): I think we could reconsider the exclusion of multiline inline 
content. The test examples I came up with are very contrived (multiline `title` 
attribute and `{@index}` tag inside `<pre>`, both of which would not mind 
removal of redundant space after line breaks). I initially tried using a 
`{@snippet}` tag inside `<pre>` which is also completely unrealistic, but 
`{@snippet}` is not even affected as it does its own parsing.

Without that exclusion, the patch is mostly just two simple methods, and all 
the added complexity to `content` and other existing methods is gone.
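
To make the idea concrete, here is a rough standalone sketch of the normalization without the block/inline distinction. It is not the code in the PR: the class and method names are made up for illustration, it works on plain strings with a regex rather than inside `DocCommentParser`, it only recognizes a bare `<pre>` tag without attributes, and skipping empty lines in the "every line starts with a space" check is an assumption of mine.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the DocCommentParser implementation.
final class PreIndentSketch {
    // A bare <pre>...</pre> element; DOTALL lets the body span several lines.
    private static final Pattern PRE =
            Pattern.compile("(<pre>)(.*?)(</pre>)",
                    Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // True if every non-empty line of the comment text starts with a space.
    // Ignoring empty lines is an assumption made for this sketch.
    static boolean everyLineStartsWithSpace(String text) {
        return text.lines()
                .filter(line -> !line.isEmpty())
                .allMatch(line -> line.startsWith(" "));
    }

    // Removes a single leading space from each line inside <pre> elements,
    // but only if the whole comment uses the one-space indentation.
    static String normalize(String text) {
        if (!everyLineStartsWithSpace(text)) {
            return text;
        }
        Matcher m = PRE.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // Strip exactly one space after each line break in the body.
            String body = m.group(2).replace("\n ", "\n");
            m.appendReplacement(out,
                    Matcher.quoteReplacement(m.group(1) + body + m.group(3)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String comment = " Example:\n <pre>\n int x = 1;\n   int y = 2;\n </pre>";
        System.out.println(normalize(comment));
    }
}
```

Only one space is removed per line, so any deeper indentation inside `<pre>` keeps its relative shape, and a comment that adds one extra space of indentation gets the old output back.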

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23868#issuecomment-2702853308
