> On 14 May 2020, at 20:46, Jonathan Gibbons <[email protected]>
> wrote:
>
> OK, but I'm still wondering why it is better to use a block tag for
> `@summary` compared to an inline tag
Nested markup.
> , like this:
>
> /**
> * {@summary First sentence and the summary of this doc comment.}
> *
> * Second sentence. Third sentence. As you can see, there are no other
> * block tags in that doc comment.
> *
> *
> */
> public void f()
>
> In other words, the inline version does what we want; why bother to
> introduce a variant form that is likely open to (accidental?) misuse?
>
> If we start going down the road of a style checker in doclint, or maybe even
> without that, we could impose a rule that the `{@summary}` tag must be the
> first non-whitespace content.
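Below is a rough sketch of such a rule. It assumes the check runs on the raw comment text (the text between `/**` and `*/`, leading `*` included) rather than on a parsed `DocCommentTree`. `SummaryFirstCheck` and `isSummaryFirst` are hypothetical names, not part of doclint:

```java
import java.util.regex.Pattern;

/** Hypothetical style check: a summary tag, if present, must come first. */
class SummaryFirstCheck {

    // Leading whitespace, an optional '*' and one optional space are comment
    // decoration on each line, not content.
    private static final Pattern DECORATION = Pattern.compile("(?m)^[ \\t]*\\*?[ \\t]?");

    /**
     * Returns true if the raw comment body has no summary tag at all, or if
     * the tag is the first non-whitespace content once decoration is removed.
     */
    static boolean isSummaryFirst(String rawCommentBody) {
        String text = DECORATION.matcher(rawCommentBody).replaceAll("").strip();
        int index = text.indexOf("{@summary");
        return index <= 0; // -1: no tag at all, 0: the tag comes first
    }

    public static void main(String[] args) {
        String good = "\n * {@summary First sentence.}\n *\n * Second sentence.\n ";
        String bad = "\n * Second sentence.\n * {@summary First sentence.}\n ";
        System.out.println(isSummaryFirst(good)); // true
        System.out.println(isSummaryFirst(bad));  // false
    }
}
```

Matching on raw text is the weak point of this sketch: `{@summary` appearing inside `{@code ...}` would be misread. That is one reason a real check would more likely work on the parsed tree.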
>
> -- Jon
>
> On 5/14/20 11:44 AM, Pavel Rappo wrote:
>>> On 14 May 2020, at 15:56, Jonathan Gibbons <[email protected]>
>>> wrote:
>>>
>>> Similar, but maybe a different flavor. The point is that there is no way to
>>> terminate content of a block tag except by starting a new block tag. So
>>> "unterminated content" seems a bit of a misnomer here.
>>>
>>> I guess I might have misunderstood part of your idea... you may have
>>> been suggesting that the @summary block tag should be put at the end of the
>>> body, along with other block tags, leading to this weird example:
>>>
>>> /**
>>> * Second sentence. Third sentence. As you can see, there are no other
>>> * block tags in that doc comment.
>>> *
>>> * @summary First sentence and the summary of this doc comment.
>>> *
>>> */
>>> public void f()
>> That's exactly what I meant, and I should've been clearer about it. I knew
>> about that behaviour of block tags, described in "Documentation Comment
>> Specification for the Standard Doclet", hence the ellipsis before @summary
>> in my first example and the introductory note before the second example.
>>
>> Now back to your example. Sure, when you put it like that (i.e. "First
>> sentence." preceded by "Second sentence.") it looks weird. However, in some
>> cases, the benefits of having a better structure could perhaps outweigh a
>> possibly surprising order. After all, we can vary the order of other block
>> tags and no one seems to be complaining that @param and @return are in the
>> order opposite to that of their source code counterparts.
>>
>> We could have a @body tag, but that's a step too far and I'm not suggesting
>> that. The world these days seems to be going in the opposite direction anyway,
>> from markup to markdown.
>>
>> -Pavel
>>
>>> On 5/13/20 2:02 PM, Pavel Rappo wrote:
>>>>> On 13 May 2020, at 21:59, Jonathan Gibbons <[email protected]>
>>>>> wrote:
>>>>>
>>>>> Pavel,
>>>>>
>>>>> You can't put block tags before the main body text. Put another way,
>>>>> each block tag consumes all input that follows up to the next block tag.
>>>>> So, while we could (now) make @summary a bimodal tag, it definitely would
>>>>> NOT work the way you are expecting.
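The rule that each block tag consumes all input up to the next block tag can be modelled in a few lines. This is a simplified sketch, not the javadoc parser. It assumes the leading `*` decoration has already been removed and treats any line starting with `@` as a new block tag (`BlockTagSplitter` is a hypothetical name). Applied to a comment that starts with `@summary`, it leaves the main description empty:

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of how block tags split a doc comment (not the real javadoc parser). */
class BlockTagSplitter {

    /** A block tag name (null for the main description) and the content it consumes. */
    record BlockTag(String name, String content) { }

    /**
     * Splits decoration-stripped comment lines into the main description
     * (always the first element, with a null name) followed by block tags.
     * A line starting with '@' opens a new block tag, which then consumes
     * every following line until the next block tag or the end of the comment.
     */
    static List<BlockTag> split(List<String> lines) {
        List<BlockTag> result = new ArrayList<>();
        String currentName = null; // null means "main description"
        StringBuilder current = new StringBuilder();
        for (String line : lines) {
            String trimmed = line.strip();
            if (trimmed.startsWith("@")) {
                // A new block tag ends whatever was being collected so far.
                result.add(new BlockTag(currentName, current.toString()));
                int space = trimmed.indexOf(' ');
                currentName = space < 0 ? trimmed.substring(1) : trimmed.substring(1, space);
                current = new StringBuilder(space < 0 ? "" : trimmed.substring(space + 1).strip());
            } else if (!trimmed.isEmpty()) {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(trimmed);
            }
        }
        result.add(new BlockTag(currentName, current.toString()));
        return result;
    }

    public static void main(String[] args) {
        List<BlockTag> tags = split(List.of(
                "@summary First sentence and the summary of this doc comment.",
                "",
                "Second sentence. Third sentence. As you can see, there are no other",
                "block tags in that doc comment."));
        // The main description comes out empty: @summary consumed the rest of the comment.
        for (BlockTag tag : tags) {
            System.out.println(tag);
        }
    }
}
```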
>>>> Is it different from what I mentioned right after "On the other hand, I
>>>> can imagine inadvertently introducing another sort of errors, due to
>>>> unterminated content..."? (Just want to understand if I get that right.)
>>>> Thanks.
>>>>
>>>>> -- Jon
>>>>>
>>>>> On 5/13/20 1:49 PM, Pavel Rappo wrote:
>>>>>> Jon, here's an idea to ponder. A spin-off of the issue in question. What
>>>>>> if we could mitigate the shortcomings of the {@summary} tag by allowing
>>>>>> it to be a block tag too? I mean, can we make it bimodal?
>>>>>>
>>>>>> /**
>>>>>> ...
>>>>>> *
>>>>>> * @summary Returns sqrt(<i>x</i><sup>2</sup> +<i>y</i><sup>2</sup>)
>>>>>> * without intermediate overflow or underflow.
>>>>>> *
>>>>>> ...
>>>>>> * @since 1.5
>>>>>> */
>>>>>> public static double hypot(double x, double y)
>>>>>>
>>>>>> If we do that, it could make @summary a complete solution for any case
>>>>>> in the *new* code, no matter how twisted that case is. Authors would get
>>>>>> a better tool for structuring doc comments, the ability to use whatever
>>>>>> markup or formatting they want in the summary section, and accurate and
>>>>>> predictable parsing. I guess it would've been considered for
>>>>>> JDK-8173425, had we had bimodal tags back then.
>>>>>>
>>>>>> On the other hand, I can imagine inadvertently introducing another sort
>>>>>> of error, due to unterminated content:
>>>>>>
>>>>>> /**
>>>>>> * @summary First sentence and the summary of this doc comment.
>>>>>> *
>>>>>> * Second sentence. Third sentence. As you can see, there are no other
>>>>>> * block tags in that doc comment.
>>>>>> */
>>>>>> public void f()
>>>>>>
>>>>>> -Pavel
>>>>>>
>>>>>>> On 13 May 2020, at 20:01, Jonathan Gibbons
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 5/13/20 11:41 AM, Pavel Rappo wrote:
>>>>>>>> Thanks for chiming in, Roger.
>>>>>>>>
>>>>>>>>> On 13 May 2020, at 18:30, Roger Riggs <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> The first sentence is not just any old sentence.
>>>>>>>>> It has a very specific role to play in the javadoc both to introduce
>>>>>>>>> the class, method, field, etc.
>>>>>>>>> AND to stand independently when used in a summary.
>>>>>>>>> That places a responsibility on the author to craft the sentence for
>>>>>>>>> those purposes.
>>>>>>>>> The author should review their work in the generated javadoc, the
>>>>>>>>> summary tables, etc.
>>>>>>>>> before feeling satisfied and moving on.
>>>>>>>>> IMHO the first sentence should be short and to the point and not
>>>>>>>>> include markup or
>>>>>>>>> extra explanatory phrases (such as e.g.).
>>>>>>>> 1. Just to be clear. Does this fall into the "SHOULD" or the "MUST"
>>>>>>>> category? If the latter, then this MUST be specified. Probably
>>>>>>>> differently than what we have today in the Documentation Comment
>>>>>>>> Specification for the Standard Doclet [^1]:
>>>>>>> SHOULD, not MUST.
>>>>>>>>> The first sentence of the initial description should be a summary
>>>>>>>>> sentence that contains a concise but complete description of the
>>>>>>>>> declared entity. Descriptive text may include HTML tags and entities,
>>>>>>>>> and inline tags as described below.
>>>>>>>> If this is the former, then we need more guidance. Perhaps plenty of
>>>>>>>> examples, including DOs and DON'Ts, as summarizing a complete doc
>>>>>>>> comment into a single sentence can be challenging. Especially if we
>>>>>>>> disallow markup, restrict formatting, and discourage familiar tools,
>>>>>>>> such as abbreviations, which are freely used in written language.
>>>>>>>>
>>>>>>>> Come to think of it, if it is that important then we should think of
>>>>>>>> teaching doclint (or some other tool) to check that.
>>>>>>> Maybe. doclint was primarily about detecting issues that lead to bad
>>>>>>> files being generated, and less about the style of the content. That's
>>>>>>> not to say we can't change/update the focus, but IMO style is better
>>>>>>> addressed with human processes like reviews and CSR.
>>>>>>>> 2. We should think about what to do with doc comments not following
>>>>>>>> those rules (conventions?) in the OpenJDK codebase.
>>>>>>>>
>>>>>>>>> I don't think the tools should try to be as understanding as
>>>>>>>>> the reader or to compensate for the shortcomings of the author.
>>>>>>>> Neither do I and I believe I made my position clear in that text.
>>>>>>>>
>>>>>>>> -Pavel
>>>>>>>>
>>>>>>>> [^1]:
>>>>>>>> https://docs.oracle.com/en/java/javase/14/docs/specs/javadoc/doc-comment-spec.html
>>>>>>>>
>>>>>>>>> $.02, Roger
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 5/13/20 12:20 PM, Jonathan Gibbons wrote:
>>>>>>>>>> Pavel,
>>>>>>>>>>
>>>>>>>>>> Good write up. You should link to this from 8232447.
>>>>>>>>>>
>>>>>>>>>> -- Jon
>>>>>>>>>>
>>>>>>>>>> On 5/13/20 7:44 AM, Pavel Rappo wrote:
>>>>>>>>>>> The issue:
>>>>>>>>>>>
>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8232447
>>>>>>>>>>>
>>>>>>>>>>> The more I think about this issue, the less I feel like solving it.
>>>>>>>>>>> On the one hand, that problem is more complicated than it looks. On
>>>>>>>>>>> the other hand, solving that problem doesn’t seem to be that
>>>>>>>>>>> important since it’s about making a best effort to improve
>>>>>>>>>>> presentation. I'm leaning towards a solution that is good enough
>>>>>>>>>>> (possibly, the one that we already have) or reconsidering the
>>>>>>>>>>> problem altogether.
>>>>>>>>>>>
>>>>>>>>>>> Here's what the problem is about. JavaDoc extracts summaries from
>>>>>>>>>>> doc comments to place them on documentation pages to assist quick
>>>>>>>>>>> scans by humans (think Table of Contents with descriptive
>>>>>>>>>>> headings). Since JavaDoc does not understand the meaning of doc
>>>>>>>>>>> comments, to extract a summary it relies on a convention [^0] that
>>>>>>>>>>> the first sentence of a doc comment is that doc comment's summary.
>>>>>>>>>>> The problem is that sometimes JavaDoc gets that first sentence
>>>>>>>>>>> wrong. For example, according to JavaDoc, the first sentence of
>>>>>>>>>>> this doc comment for `GraphicsEnvironment.preferProportionalFonts`
>>>>>>>>>>> [^1]
>>>>>>>>>>>
>>>>>>>>>>>> Indicates a preference for proportional over non-proportional
>>>>>>>>>>>> (e.g. dual-spaced CJK fonts) fonts in the mapping of logical fonts
>>>>>>>>>>>> to physical fonts. If the default mapping contains fonts for which
>>>>>>>>>>>> proportional and non-proportional variants exist, then calling
>>>>>>>>>>>> this method indicates the mapping should use a proportional
>>>>>>>>>>>> variant.
>>>>>>>>>>> is
>>>>>>>>>>>
>>>>>>>>>>>> Indicates a preference for proportional over non-proportional (e.g.
>>>>>>>>>>> Now, why does this happen? Unless a more sophisticated mechanism is
>>>>>>>>>>> requested or the locale's language is not English, JavaDoc uses a
>>>>>>>>>>> simple "dot-space" algorithm to detect a sentence boundary. That
>>>>>>>>>>> algorithm scans input from left to right looking for the dot
>>>>>>>>>>> character followed by whitespace. While it looks reasonable, in
>>>>>>>>>>> the above case it is clearly inadequate.
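Here is a minimal sketch of such a "dot-space" scan. It is written from the description above rather than taken from JavaDoc's source (`DotSpace` and `firstSentence` are illustrative names). On the `preferProportionalFonts` text it stops at "(e.g.":

```java
/** Illustrative "dot-space" first-sentence extraction (not JavaDoc's actual code). */
class DotSpace {

    /**
     * Scans left to right for the first '.' followed by a whitespace character
     * and returns the text up to and including that dot, or the whole text if
     * there is no such dot.
     */
    static String firstSentence(String text) {
        for (int i = 0; i < text.length() - 1; i++) {
            if (text.charAt(i) == '.' && Character.isWhitespace(text.charAt(i + 1))) {
                return text.substring(0, i + 1);
            }
        }
        return text;
    }

    public static void main(String[] args) {
        String doc = "Indicates a preference for proportional over non-proportional"
                + " (e.g. dual-spaced CJK fonts) fonts in the mapping of logical fonts"
                + " to physical fonts. If the default mapping contains fonts for which"
                + " proportional and non-proportional variants exist, then calling"
                + " this method indicates the mapping should use a proportional variant.";
        // Prints "Indicates a preference for proportional over non-proportional (e.g."
        // because the dot in "g." is followed by a space.
        System.out.println(firstSentence(doc));
    }
}
```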
>>>>>>>>>>>
>>>>>>>>>>> At this point, the reader might say: "Pfft. I know how to fix
>>>>>>>>>>> this." Please bear with me and I'll show you that the problem is
>>>>>>>>>>> actually multilayered. It involves not only a sentence segmentation
>>>>>>>>>>> algorithm [^2], but also the input that the algorithm is fed with, as
>>>>>>>>>>> well as the structure and quality of the doc comments that input is
>>>>>>>>>>> created from.
>>>>>>>>>>>
>>>>>>>>>>> Instead of jumping head-first into augmenting the "dot-space"
>>>>>>>>>>> algorithm with more heuristics, let's try one more thing. If
>>>>>>>>>>> instructed to do so or the locale's language is not English,
>>>>>>>>>>> JavaDoc uses `BreakIterator` [^3]. That `java.text` mechanism is
>>>>>>>>>>> specifically designed to find various boundaries in text. When
>>>>>>>>>>> `BreakIterator` is turned on (and after additional tweaking),
>>>>>>>>>>> JavaDoc gets that first sentence about "proportional fonts" right.
>>>>>>>>>>> However, other issues show up. Consider the following comment for
>>>>>>>>>>> `FocusTraversalPolicy.getComponentAfter` [^4]:
>>>>>>>>>>>
>>>>>>>>>>>> Returns the Component that should receive the focus after
>>>>>>>>>>>> aComponent. aContainer must be a focus cycle root of aComponent or
>>>>>>>>>>>> a focus traversal policy provider.
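You can observe this behaviour by passing both comments to `java.text.BreakIterator` directly. The sketch below (`FirstSentence` is an illustrative name) works on plain strings with `Locale.ENGLISH` and ignores the markup, indentation and other adjustments JavaDoc makes. Results on real doc comments may therefore differ:

```java
import java.text.BreakIterator;
import java.util.Locale;

/** Extracts a first sentence with java.text.BreakIterator (plain text only, no markup). */
class FirstSentence {

    static String firstSentence(String text, Locale locale) {
        BreakIterator sentences = BreakIterator.getSentenceInstance(locale);
        sentences.setText(text);
        int start = sentences.first();
        int end = sentences.next();
        // next() returns DONE only when there is no text at all.
        return end == BreakIterator.DONE ? text : text.substring(start, end).strip();
    }

    public static void main(String[] args) {
        String fonts = "Indicates a preference for proportional over non-proportional"
                + " (e.g. dual-spaced CJK fonts) fonts in the mapping of logical fonts"
                + " to physical fonts. If the default mapping contains fonts for which"
                + " proportional and non-proportional variants exist, then calling"
                + " this method indicates the mapping should use a proportional variant.";
        String focus = "Returns the Component that should receive the focus after"
                + " aComponent. aContainer must be a focus cycle root of aComponent or"
                + " a focus traversal policy provider.";
        System.out.println(firstSentence(fonts, Locale.ENGLISH));
        // A dot followed by a lowercase word ("aContainer") is not treated as a boundary.
        System.out.println(firstSentence(focus, Locale.ENGLISH));
    }
}
```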
>>>>>>>>>>> Here `BreakIterator` thinks that the whole paragraph is a single
>>>>>>>>>>> sentence. This is because in English, sentences begin with capital
>>>>>>>>>>> letters. I should pause here. This is an important moment. While
>>>>>>>>>>> some doc comments may indeed have typos, irregularities, or quality
>>>>>>>>>>> issues, that doc comment about "aComponent" has none of those. It's
>>>>>>>>>>> genuine and consists of a couple of sentences that humans recognize
>>>>>>>>>>> easily, but that do not strictly abide by the rules of English
>>>>>>>>>>> grammar. To me, this (and other experiments with
>>>>>>>>>>> `BreakIterator` I've done) shows that doc comments are not your
>>>>>>>>>>> regular prose. Unsurprisingly, even a specialized text tool doesn't
>>>>>>>>>>> grok it. (Which makes me wonder if that was one of the reasons why
>>>>>>>>>>> `BreakIterator` is turned off by default.) Add indentation and
>>>>>>>>>>> markup on top of that and you'll see why the ultimate form that
>>>>>>>>>>> JavaDoc has to work with is not a string but something like this:
>>>>>>>>>>>
>>>>>>>>>>> list size = 10
>>>>>>>>>>> 0 = {DCTree$DCStartElement} "<code>"
>>>>>>>>>>> 1 = {DCTree$DCText} "DOMLocator"
>>>>>>>>>>> 2 = {DCTree$DCEndElement} "</code>"
>>>>>>>>>>> 3 = {DCTree$DCText} " is an interface that describes a location (e.g.\n where an error occurred).\n "
>>>>>>>>>>> 4 = {DCTree$DCStartElement} "<p>"
>>>>>>>>>>> 5 = {DCTree$DCText} "See also the "
>>>>>>>>>>> 6 = {DCTree$DCStartElement} "<a href='http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407'>"
>>>>>>>>>>> 7 = {DCTree$DCText} "Document Object Model (DOM) Level 3 Core Specification"
>>>>>>>>>>> 8 = {DCTree$DCEndElement} "</a>"
>>>>>>>>>>> 9 = {DCTree$DCText} "."
>>>>>>>>>>>
>>>>>>>>>>> Continuous text we see on a documentation page [^5] in a browser
>>>>>>>>>>> comes from a representation such as the above, where the text can
>>>>>>>>>>> be scattered across various AST nodes. This has interesting
>>>>>>>>>>> implications. Consider the following doc comment (note the
>>>>>>>>>>> whitespace after `comment.`):
>>>>>>>>>>>
>>>>>>>>>>> /** This is the first sentence of this <i>comment. </i> This
>>>>>>>>>>> is the second sentence. */
>>>>>>>>>>>
>>>>>>>>>>> Both the simple "dot-space" algorithm and `BreakIterator` fail to
>>>>>>>>>>> extract the first sentence here, producing the exact same result
>>>>>>>>>>> consisting of both sentences. When `.` is moved immediately after
>>>>>>>>>>> the closing `</i>`, they both extract the first sentence correctly.
>>>>>>>>>>> However, the HTML output breaks (note the absence of closing
>>>>>>>>>>> `</i>`):
>>>>>>>>>>>
>>>>>>>>>>> <div class="block">This is the first sentence of this
>>>>>>>>>>> <i>comment.</div>
>>>>>>>>>>>
>>>>>>>>>>> This is partly because JavaDoc does not interpret HTML. Instead, it
>>>>>>>>>>> uses a hybrid approach that applies a sentence segmentation
>>>>>>>>>>> algorithm as an auxiliary step to individual text nodes (not
>>>>>>>>>>> necessarily the whole text) while maintaining awareness of the
>>>>>>>>>>> surrounding nodes. The fact that nodes preserve indentation and
>>>>>>>>>>> formatting of the original doc comment makes things worse, as
>>>>>>>>>>> whitespace is significant in sentence segmentation. No wonder
>>>>>>>>>>> JavaDoc hardly sees the forest for the syntax trees! Perhaps a
>>>>>>>>>>> more careful way of doing that would be as follows:
>>>>>>>>>>>
>>>>>>>>>>> 1. Interpret markup as text.
>>>>>>>>>>> 2. Apply sentence segmentation to that text to find the first
>>>>>>>>>>> sentence.
>>>>>>>>>>> 3. Map that first sentence back to markup to accurately extract
>>>>>>>>>>> the corresponding portion.
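Here is a minimal sketch of those three steps, under several assumptions. The node types are hypothetical simplifications, not the real `DCTree` classes. Only plain start and end elements are modelled. `BreakIterator` serves as the segmenter. On the `<i>comment. </i>` example, the markup stays well-formed because any element still open at the boundary gets closed:

```java
import java.text.BreakIterator;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Locale;

/** Sketch: segment the text content, then map the first sentence back onto the markup. */
class MarkupAwareSummary {

    // Hypothetical, much simplified stand-ins for doc-comment tree nodes.
    interface Node { }
    record Text(String text) implements Node { }
    record Start(String name) implements Node { }
    record End(String name) implements Node { }

    static String firstSentence(List<Node> nodes, Locale locale) {
        // Step 1: interpret markup as text by concatenating the text nodes.
        StringBuilder plain = new StringBuilder();
        for (Node n : nodes) {
            if (n instanceof Text t) {
                plain.append(t.text());
            }
        }

        // Step 2: find the first sentence boundary in the plain text.
        BreakIterator sentences = BreakIterator.getSentenceInstance(locale);
        sentences.setText(plain.toString());
        sentences.first();
        int end = sentences.next();
        if (end == BreakIterator.DONE) {
            end = plain.length();
        }
        while (end > 0 && Character.isWhitespace(plain.charAt(end - 1))) {
            end--; // do not carry trailing whitespace into the summary
        }

        // Step 3: replay the nodes up to the boundary, then close open elements.
        StringBuilder out = new StringBuilder();
        Deque<String> open = new ArrayDeque<>();
        int offset = 0; // current position in the plain text
        for (Node n : nodes) {
            if (offset >= end) {
                break;
            }
            if (n instanceof Text t) {
                int take = Math.min(t.text().length(), end - offset);
                out.append(t.text(), 0, take);
                offset += take;
            } else if (n instanceof Start s) {
                out.append('<').append(s.name()).append('>');
                open.push(s.name());
            } else if (n instanceof End e) {
                out.append("</").append(e.name()).append('>');
                open.remove(e.name()); // removes the innermost matching element
            }
        }
        while (!open.isEmpty()) {
            out.append("</").append(open.pop()).append('>');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // /** This is the first sentence of this <i>comment. </i> This is the second sentence. */
        List<Node> nodes = List.of(
                new Text("This is the first sentence of this "),
                new Start("i"),
                new Text("comment. "),
                new End("i"),
                new Text(" This is the second sentence."));
        System.out.println(firstSentence(nodes, Locale.ENGLISH));
    }
}
```

Closing still-open elements keeps the output well-formed. It cannot, however, decide whether a fragment still makes sense out of its context.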
>>>>>>>>>>>
>>>>>>>>>>> But even that won't magically solve all the issues as it's not
>>>>>>>>>>> possible to decompose arbitrary markup into independent
>>>>>>>>>>> components. Consider the following doc comment:
>>>>>>>>>>>
>>>>>>>>>>> /**
>>>>>>>>>>> * <table class="comment">
>>>>>>>>>>> * <tr>
>>>>>>>>>>> * <td><i>Is this the first sentence?</i></td>
>>>>>>>>>>> * <td>Is this the second sentence?</td>
>>>>>>>>>>> * </tr>
>>>>>>>>>>> * <tr>...</tr>
>>>>>>>>>>> * </table>
>>>>>>>>>>> ...
>>>>>>>>>>>
>>>>>>>>>>> Even if we find that "first sentence", can we safely extract it
>>>>>>>>>>> from its table context? And all this is just the structure layer of
>>>>>>>>>>> the problem.
>>>>>>>>>>>
>>>>>>>>>>> The next layer is ambiguities. Unless extreme measures are taken,
>>>>>>>>>>> those are only resolvable by a human, sometimes by an expert in the area
>>>>>>>>>>> the documentation relates to. Using abbreviations such as "etc.",
>>>>>>>>>>> "e.g.", "i.e.", and "vs." is part of the issue. Early guides [^6]
>>>>>>>>>>> on JavaDoc advised against using abbreviations. While I can now see
>>>>>>>>>>> one of the reasons for this advice, people use them anyway. Some
>>>>>>>>>>> might say that abbreviations can be more succinct and practical.
>>>>>>>>>>> For instance, "etc." is shorter than "and so on", "and so forth",
>>>>>>>>>>> or "and so on and so forth", and is even pronounced literally as "et
>>>>>>>>>>> cetera" in speech. Non-standard grammar in abbreviations aggravates
>>>>>>>>>>> the issue. For instance, is "ie" a misspelt "i.e.", an initialism
>>>>>>>>>>> of Internet Explorer, or a top-level domain name of The Republic of
>>>>>>>>>>> Ireland? Or is "etc" a misspelt "etc." or rather the `/etc`
>>>>>>>>>>> directory from the UNIX Filesystem Hierarchy Standard? (When
>>>>>>>>>>> scanning OpenJDK repo for occurrences of "etc." in comments, I
>>>>>>>>>>> found that it can be written with the number of dots anywhere from
>>>>>>>>>>> 0 to 4. The latter could be explained as ellipsis `...` followed by
>>>>>>>>>>> a dot `.`, faulty keyboard, or perhaps a muscle twitch.)
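To illustrate, here is what augmenting the "dot-space" algorithm with an abbreviation heuristic could look like. The abbreviation set is an assumption chosen for illustration, and the class name is hypothetical. The sketch fixes the "e.g." case but misses a genuine boundary when "etc." ends a sentence:

```java
import java.util.Locale;
import java.util.Set;

/** Dot-space scan that skips a fixed, hypothetical list of abbreviations. */
class AbbreviationAwareDotSpace {

    // Assumed list; it cannot cover variants such as "etc", "etc..", or "ie".
    private static final Set<String> ABBREVIATIONS = Set.of("e.g.", "i.e.", "etc.", "vs.");

    static String firstSentence(String text) {
        for (int i = 0; i < text.length() - 1; i++) {
            if (text.charAt(i) != '.' || !Character.isWhitespace(text.charAt(i + 1))) {
                continue;
            }
            // Find the start of the word that ends with this dot.
            int wordStart = i;
            while (wordStart > 0
                    && !Character.isWhitespace(text.charAt(wordStart - 1))
                    && text.charAt(wordStart - 1) != '(') {
                wordStart--;
            }
            String word = text.substring(wordStart, i + 1).toLowerCase(Locale.ROOT);
            if (!ABBREVIATIONS.contains(word)) {
                return text.substring(0, i + 1); // dot-space that is not a known abbreviation
            }
        }
        return text;
    }

    public static void main(String[] args) {
        // The heuristic handles the "e.g." case...
        System.out.println(firstSentence(
                "Prefers proportional (e.g. dual-spaced) fonts. If the default differs."));
        // ...but misses a real boundary when "etc." ends a sentence.
        System.out.println(firstSentence(
                "Supports lists, sets, etc. Maps are not supported."));
    }
}
```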
>>>>>>>>>>>
>>>>>>>>>>> The final layer is typos and low-quality comments. What proportion
>>>>>>>>>>> of doc comments follow that convention about the first sentence?
>>>>>>>>>>> What proportion of comments respect grammar or have a meaningful
>>>>>>>>>>> structure? While we shouldn't aim for a solution that rights the
>>>>>>>>>>> wrongs of bad comments (i.e. Garbage In, Garbage Out), this is
>>>>>>>>>>> something to keep in mind:
>>>>>>>>>>>
>>>>>>>>>>> /**
>>>>>>>>>>> * this function draws the border around each tab
>>>>>>>>>>> * note that this function does now draw the background of the tab.
>>>>>>>>>>> * that is done elsewhere
>>>>>>>>>>> ...
>>>>>>>>>>> */
>>>>>>>>>>> protected void paintTabBorder(Graphics g, int tabPlacement,
>>>>>>>>>>> ...
>>>>>>>>>>>
>>>>>>>>>>> There are things we can do to remediate that problem on the doc
>>>>>>>>>>> comments side of the equation: reasonable conventions that are
>>>>>>>>>>> adhered to, better structure of doc comments, or hints. For
>>>>>>>>>>> example, we could place a newline or more than one space after the
>>>>>>>>>>> first sentence, or indicate the summary part of a doc comment with
>>>>>>>>>>> the relatively new `{@summary}` tag. That said, all of those might
>>>>>>>>>>> have problems of their own. They are intrusive and require
>>>>>>>>>>> re-documenting the existing code, which is not always possible. In
>>>>>>>>>>> addition to that, `{@summary}` cannot contain nested markup, which
>>>>>>>>>>> is quite often used in the summary part. For example:
>>>>>>>>>>>
>>>>>>>>>>> /**
>>>>>>>>>>> * Returns the runtime class of this {@code Object}. The returned
>>>>>>>>>>> * {@code Class} object is the object that is locked by {@code
>>>>>>>>>>> * static synchronized} methods of the represented class.
>>>>>>>>>>> ...
>>>>>>>>>>> */
>>>>>>>>>>> public final native Class<?> getClass();
>>>>>>>>>>> or
>>>>>>>>>>>
>>>>>>>>>>> /**
>>>>>>>>>>> * An ordered collection (also known as a <i>sequence</i>).
>>>>>>>>>>> ...
>>>>>>>>>>> */
>>>>>>>>>>> public interface List<E> extends Collection<E> { ...
>>>>>>>>>>>
>>>>>>>>>>> Whatever solution we choose, there's a risk of playing a game of
>>>>>>>>>>> whac-a-mole. Maybe we should aim for a solution that is good
>>>>>>>>>>> enough (possibly, the one that we already have) or reconsider the
>>>>>>>>>>> problem altogether. For instance, do not extract the first
>>>>>>>>>>> sentence (unless it can be done reliably). Instead, take the first
>>>>>>>>>>> N characters and indicate continuation (e.g. using an ellipsis
>>>>>>>>>>> `...`), or use the complete doc comment, whichever is shorter.
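The "first N characters" alternative is simple to sketch. This version assumes the summary is computed from plain text (markup already removed) and uses a hypothetical `maxChars` limit. It cuts at the last space at or before the limit so that words are not split, and returns the whole comment when that is already short enough:

```java
/** Hypothetical fallback summary: first N characters plus an ellipsis, or the whole text. */
class TruncatedSummary {

    static String summarize(String text, int maxChars) {
        // Collapse indentation and line breaks left over from the doc comment.
        String normalized = text.strip().replaceAll("\\s+", " ");
        if (normalized.length() <= maxChars) {
            return normalized; // the complete comment is already short enough
        }
        int cut = normalized.lastIndexOf(' ', maxChars);
        if (cut <= 0) {
            cut = maxChars; // a single very long word: cut it mid-word
        }
        return normalized.substring(0, cut) + "...";
    }

    public static void main(String[] args) {
        System.out.println(summarize("Returns the runtime class of this Object.", 20));
        System.out.println(summarize("An ordered collection.", 80));
    }
}
```

Whether the ellipsis counts towards the limit is a presentation choice. Here it does not.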
>>>>>>>>>>>
>>>>>>>>>>> To sum up, extracting sentences from a text written in a natural
>>>>>>>>>>> language is anything but trivial and might require human judgement.
>>>>>>>>>>> When done programmatically, occasional mistakes are inevitable. Doc
>>>>>>>>>>> comments are barely text. While they have some structure, they also
>>>>>>>>>>> use formatting, code, and markup. Hence, without pre-processing,
>>>>>>>>>>> text tools might not be applicable. Though JavaDoc could improve
>>>>>>>>>>> its algorithms and doc comments could be more friendly, what we
>>>>>>>>>>> have today works surprisingly well on the OpenJDK codebase. If this
>>>>>>>>>>> is not enough, we could find another way of extracting a summary or
>>>>>>>>>>> eliminate the need for it completely. That is, change the
>>>>>>>>>>> presentation in such a way that it won't require summaries.
>>>>>>>>>>>
>>>>>>>>>>> -Pavel
>>>>>>>>>>>
>>>>>>>>>>> [^0]:
>>>>>>>>>>> https://www.oracle.com/technical-resources/articles/java/javadoc-tool.html#format
>>>>>>>>>>> [^1]:
>>>>>>>>>>> https://docs.oracle.com/en/java/javase/14/docs/api/java.desktop/java/awt/GraphicsEnvironment.html#preferProportionalFonts()
>>>>>>>>>>> [^2]: https://en.wikipedia.org/wiki/Sentence_boundary_disambiguation
>>>>>>>>>>> [^3]:
>>>>>>>>>>> https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/text/BreakIterator.html
>>>>>>>>>>> [^4]:
>>>>>>>>>>> https://docs.oracle.com/en/java/javase/14/docs/api/java.desktop/java/awt/FocusTraversalPolicy.html#getComponentAfter(java.awt.Container,java.awt.Component)
>>>>>>>>>>> [^5]:
>>>>>>>>>>> https://docs.oracle.com/en/java/javase/14/docs/api/java.xml/org/w3c/dom/DOMLocator.html
>>>>>>>>>>> [^6]:
>>>>>>>>>>> https://www.oracle.com/technical-resources/articles/java/javadoc-tool.html#styleguide
>>>>>>>>>>>
>