Re: [cfe-commits] [PATCH] libclang API for comment-to-xml conversion

Douglas Gregor Mon, 06 Aug 2012 23:05:55 -0700

On Aug 6, 2012, at 4:22 PM, Dmitri Gribenko <[email protected]> wrote:


> On Mon, Aug 6, 2012 at 8:13 AM, Douglas Gregor <[email protected]> wrote:
>> On Aug 3, 2012, at 1:34 PM, Dmitri Gribenko <[email protected]> wrote:
>> Index: test/Index/comment-xml-schema.c
>> ===================================================================
>> --- test/Index/comment-xml-schema.c     (revision 0)
>> +++ test/Index/comment-xml-schema.c     (revision 0)
>> @@ -0,0 +1,35 @@
>> +// RUN: xmllint --noout --relaxng 
>> %S/Inputs/CommentXML/comment-xml-schema.rng 
>> %S/Inputs/CommentXML/valid-function-01.xml
>> 
>> This is going to fail if 'xmllint' is unavailable or simply not in the path. 
>> I think you'll need to extend lit with some knowledge of whether 'xmllint' 
>> is available (and perhaps its path?), so that we only run this set of tests 
>> when xmllint is available (via "REQUIRES: xmllint").
> 
> Seems to be doable with a small change to lit.cfg for clang tests:
> 
> Index: test/lit.cfg
> ===================================================================
> --- test/lit.cfg        (revision 161322)
> +++ test/lit.cfg        (working copy)
> @@ -259,3 +259,7 @@
> 
> if llc_props['enable_assertions']:
>     config.available_features.add('asserts')
> +
> +if lit.util.which('xmllint'):
> +    config.available_features.add('xmllint')
> +

Great!

>> I think the comment XML format looks great; it's a good balance between 
>> simplicity for the client and capability for rendering documentation 
>> comments. Some nitpicks follow:
>> 
>> +  <start>
>> +    <choice>
>> +      <ref name="Function" />
>> +      <ref name="Class" />
>> +      <ref name="Variable" />
>> +      <ref name="Namespace" />
>> +      <ref name="Typedef" />
>> +    </choice>
>> +  </start>
>> 
>> I understand that the set of choices is intentionally limited, to simplify 
>> the output, but it looks like we're missing some necessary categories: 
>> enumeration types, (Objective-C) protocols, (Objective-C) categories and 
>> (Objective-C) properties, at a minimum. Should there still be an "other" 
>> category here?
> 
> Yes, I will add 'Other' for now, but at least 'Enum' should be added
> separately.  I think Protocols and categories can be handled as
> 'Class'es, properties -- as 'Variables'.  Of course, unless we want to
> store additional information for these.

Hrm, okay. We may need to separate out protocols later, because they are in a 
different namespace than (and, this this XML format, indistinguishable from) 
Objective-C classes.

>> I assume that parameters (both template and function) will always be 
>> documented as part of their class/function/etc?
> 
> Yes, since \param and \tparam are in the same FullComment.

So, how does a client deal with this? Do they have to look at the cursor kind 
to decide whether to ask for a comment from elsewhere? Can we make it simpler?

>> Similarly, enumerators will be documented as part of their enumeration type?
> 
> No, since enums are usually documented this way:
> /// Blah
> enum A {
>   A1, ///< Aaa
>   A2, ///< Bbb
> };
> So comments for enumerators end up in separate FullComments.

Will these be classified as "variables", then, for the XML representation?

>> What should happen if one asks for the XML representation of, e.g., a 
>> parameter's comment?
> 
> For ParmVarDecl, ASTContext::getRawCommentForDeclNoCache() will fail
> even to find a RawComment.

Okay. I guess that ties in with the question of "how should the client handle 
this?"

>> I assume that "Function" elements need something for "throws" or 
>> "exceptions", but that's not at all important now.
> 
> Maybe we will pretty-print it as a part of declaration?  Added a TODO
> note for now.

It turns out that exceptions documented in Doxygen can have actual 
documentation paragraphs associated with them, so those won't be pretty-printed 
as part of the declaration.

>> +  <define name="Class">
>> +    <element name="Class">
>> +      <optional>
>> +        <attribute name="isTemplate">
>> +          <data type="boolean" />
>> +        </attribute>
>> +      </optional>
>> +      <optional>
>> +        <attribute name="isTemplatePartialSpecialization">
>> +          <data type="boolean" />
>> +        </attribute>
>> +      </optional>
>> +      <optional>
>> +        <attribute name="isTemplateSpecialization">
>> +          <data type="boolean" />
>> +        </attribute>
>> +      </optional>
>> 
>> isTemplatePartialSpecialization == isTemplate && isTemplateSpecialization?
> 
> Even more, I would say that it is a bit confusing.  A set of boolean
> flags is usually not a good choice.  I changed that part of schema:
> 
>  <define name="Class">
>    <element name="Class">
>      <optional>
>        <attribute name="templateKind">
>          <choice>
>            <value>template</value>
>            <value>specialization</value>
>            <value>partialSpecialization</value>
>          </choice>
>        </attribute>
>      </optional>

Yes, I like this much better!

>> +  <define name="Name">
>> +    <element name="Name">
>> +      <!-- Non-empty text content. -->
>> +      <data type="string">
>> +        <param name="pattern">.*\S.*</param>
>> +      </data>
>> +    </element>
>> +  </define>
>> 
>> Presumably, the name is the unqualified name . One could imagine that it 
>> would be useful to have (in separate elements!) the qualifier of the name 
>> (e.g., the "std::" in "std::sort") and the template arguments of a 
>> specialization, so clients could render the full name if they want it. But, 
>> that's just grave.
> 
> Could we leave this to a follow-up?

Yes, absolutely.

>> More importantly,  it would be very useful to have the USR as a separate 
>> (optional) element, because USRs allow indexers based on libclang to 
>> identify entities across translation units.
> 
> Done, added <USR>...</USR> after <Name>...</Name>.

Thanks!

>> Similarly, for each kind of entity, could we add "location" information, 
>> e.g., the file, line, and column at which the declaration occurs?
> 
> Added "line" and "column" attributes to the root tag.

Could we get the full pathname as well?

> For example, for
>>>> 
> /// Aaa.
> int aaa;
> <<<
> we generate:
> 
> <Variable line="2"
> column="5"><Name>aaa</Name><USR>c:@aaa</USR><Abstract><Para>
> Aaa.</Para></Abstract></Variable>
> 
>> +void CommentASTToXMLConverter::visitFullComment(const FullComment *C) {
>> +  FullCommentParts Parts(C);
>> +
>> +  const DeclInfo *DI = C->getDeclInfo();
>> // …
>> +    bool FoundName = false;
>> +    if (const NamedDecl *ND = dyn_cast<NamedDecl>(DI->ThisDecl)) {
>> +      if (DeclarationName DeclName = ND->getDeclName()) {
>> +        Result << "<Name>";
>> +        std::string Name = DeclName.getAsString();
>> +        appendToResultWithXMLEscaping(Name);
>> +        FoundName = true;
>> +        Result << "</Name>";
>> +      }
>> 
>> Two comments here… first, this won't work for something like
>> 
>>  typedef struct { … } Foo;
>> 
>> where we almost surely want to use the name 'Foo'.
> 
> If we bound the comment to the struct decl, we have the RecordDecl*
> and I don't see how to get a TypedefDecl* from there.  To support
> this, we should be binding the comment to both.

Ah, I understand.

>> Second, for an Objective-C category, e.g.,
>> 
>> @interface NSString (NSCPlusPlusParser)
>> @endif
>> 
>> we probably want the name to be the full "NSString (NSCPlusPlusParser)".
> 
> Can we leave this to a follow-up?

Yes, of course.

>> +    if (!FoundName)
>> +      Result << "<Name>unknown</Name>";
>> 
>> How about "<anonymous>" or simply the empty string "" rather than "unknown"? 
>> Anonymous entities aren't that rare in C/C++.
> 
> Done.

Thanks, patch LGTM!

        - Doug
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

Re: [cfe-commits] [PATCH] libclang API for comment-to-xml conversion

Reply via email to