Re: [cfe-commits] [PATCH] libclang API for comment-to-xml conversion

Douglas Gregor Mon, 06 Aug 2012 08:17:04 -0700

On Aug 3, 2012, at 1:34 PM, Dmitri Gribenko <[email protected]> wrote:


> Hello,
> 
> The attached patch implements a libclang API for comment-to-xml
> conversion.  The implementation also includes a Relax NG schema and
> tests for the schema itself.  The schema is used in c-index-test to
> verify that XML documents we produce are valid.  In order to do the
> validation, we add an optional libxml2 dependency for c-index-test.

Very cool. Comments below.

> I am not sure that test/Index/Inputs/CommentXML is the best place to
> put the XML schema.  Any suggestions?

bindings/xml, perhaps?

> Current implementation of declaration name printer is not perfect --
> it prints 'operator _Bool' for C++ 'operator bool()', for example.
> But this can be fixed in a followup together with a complete
> declaration printer.


Index: test/Index/comment-xml-schema.c
===================================================================
--- test/Index/comment-xml-schema.c     (revision 0)
+++ test/Index/comment-xml-schema.c     (revision 0)
@@ -0,0 +1,35 @@
+// RUN: xmllint --noout --relaxng %S/Inputs/CommentXML/comment-xml-schema.rng 
%S/Inputs/CommentXML/valid-function-01.xml

This is going to fail if 'xmllint' is unavailable or simply not in the path. I 
think you'll need to extend lit with some knowledge of whether 'xmllint' is 
available (and perhaps its path?), so that we only run this set of tests when 
xmllint is available (via "REQUIRES: xmllint").

I think the comment XML format looks great; it's a good balance between 
simplicity for the client and capability for rendering documentation comments. 
Some nitpicks follow:

+  <start>
+    <choice>
+      <ref name="Function" />
+      <ref name="Class" />
+      <ref name="Variable" />
+      <ref name="Namespace" />
+      <ref name="Typedef" />
+    </choice>
+  </start>

I understand that the set of choices is intentionally limited, to simplify the 
output, but it looks like we're missing some necessary categories: enumeration 
types, (Objective-C) protocols, (Objective-C) categories and (Objective-C) 
properties, at a minimum. Should there still be an "other" category here?

I assume that parameters (both template and function) will always be documented 
as part of their class/function/etc? Similarly, enumerators will be documented 
as part of their enumeration type? What should happen if one asks for the XML 
representation of, e.g., a parameter's comment?

+      <!-- XXX: Maybe use <interleave> since order of these elements should not
+           be important? -->

I like the idea of us putting the XML in a sane order (as you have it now), 
because it saves clients from each having to reconstruct a sane ordering 
themselves. Clients who want a different ordering are, of course, free to do 
that on their own.

I assume that "Function" elements need something for "throws" or "exceptions", 
but that's not at all important now.

+  <define name="Class">
+    <element name="Class">
+      <optional>
+        <attribute name="isTemplate">
+          <data type="boolean" />
+        </attribute>
+      </optional>
+      <optional>
+        <attribute name="isTemplatePartialSpecialization">
+          <data type="boolean" />
+        </attribute>
+      </optional>
+      <optional>
+        <attribute name="isTemplateSpecialization">
+          <data type="boolean" />
+        </attribute>
+      </optional>

isTemplatePartialSpecialization == isTemplate && isTemplateSpecialization?

+  <define name="Name">
+    <element name="Name">
+      <!-- Non-empty text content. -->
+      <data type="string">
+        <param name="pattern">.*\S.*</param>
+      </data>
+    </element>
+  </define>

Presumably, the name is the unqualified name . One could imagine that it would 
be useful to have (in separate elements!) the qualifier of the name (e.g., the 
"std::" in "std::sort") and the template arguments of a specialization, so 
clients could render the full name if they want it. But, that's just grave.

More importantly,  it would be very useful to have the USR as a separate 
(optional) element, because USRs allow indexers based on libclang to identify 
entities across translation units.

Similarly, for each kind of entity, could we add "location" information, e.g., 
the file, line, and column at which the declaration occurs?

+void CommentASTToXMLConverter::visitFullComment(const FullComment *C) {
+  FullCommentParts Parts(C);
+
+  const DeclInfo *DI = C->getDeclInfo();
// …
+    bool FoundName = false;
+    if (const NamedDecl *ND = dyn_cast<NamedDecl>(DI->ThisDecl)) {
+      if (DeclarationName DeclName = ND->getDeclName()) {
+        Result << "<Name>";
+        std::string Name = DeclName.getAsString();
+        appendToResultWithXMLEscaping(Name);
+        FoundName = true;
+        Result << "</Name>";
+      }

Two comments here… first, this won't work for something like

  typedef struct { … } Foo;

where we almost surely want to use the name 'Foo'.

Second, for an Objective-C category, e.g.,

@interface NSString (NSCPlusPlusParser)
@endif

we probably want the name to be the full "NSString (NSCPlusPlusParser)".

+    if (!FoundName)
+      Result << "<Name>unknown</Name>";

How about "<anonymous>" or simply the empty string "" rather than "unknown"? 
Anonymous entities aren't that rare in C/C++. 

+  } else {
+    // No DeclInfo -- just emit some root tag and name tag.
+    RootEndTag = "</Function>";
+    Result << "<Function><Name>unknown</Name>";
+  }

As eluded to above, I think I'd rather that we have an "Other" in the schema. 
The "name" part should probably be factored out, since there are likely to be 
"other" declarations that have names.

        - Doug


_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

Re: [cfe-commits] [PATCH] libclang API for comment-to-xml conversion

Reply via email to