Re: [cfe-commits] [PATCH] Structured comment parsing, retaining comments in AST

Jordan Rose Mon, 11 Jun 2012 09:08:55 -0700

Hi, Dmitri. Some comments on various parts of this approach:

> +  /// \brief Source ranges for all of the comments in the source file,
> +  /// sorted in order of appearance in the translation unit.
> +  std::vector<SourceRange> CommentSourceRanges;
> +
> +  /// \brief Mapping from declarations to their comments, once we have
> +  /// already looked up the comment associated with a given declaration.
> +  llvm::DenseMap<const Decl *, std::string> DeclComments;


I'm wondering what the rationale was behind having separate strings and source 
ranges here, rather than first-class Comment objects (and a map from Decl to 
Comment). I guess this way is a little lighter, but as we add functionality 
that might become a hindrance.


> @@ -389,7 +397,7 @@
>    void setPrintingPolicy(clang::PrintingPolicy Policy) {
>      PrintingPolicy = Policy;
>    }
> -  
> +
>    SourceManager& getSourceManager() { return SourceMgr; }
>    const SourceManager& getSourceManager() const { return SourceMgr; }
>    void *Allocate(unsigned Size, unsigned Align = 8) const {

Please watch for whitespace-only changes, even ones conforming to the LLVM 
standard.


> +private:
> +  bool isDoxygenComment(SourceRange Comment, bool Member = false);
> +


Unless this is needed in more than one source file, consider making this a 
static function instead of a private method, to keep the header file as trim as 
possible.


> +  /// \brief Return the Doxygen-style comment attached to a given 
> declaration.
> +  const char *getCommentForDecl(const Decl *D);
> +

This should probably use StringRef.


> +    Comment = clang_Cursor_getComment(Cursor);
> +    CommentCString = clang_getCString(Comment);
> +    if (CommentCString != NULL && CommentCString[0] != '\0') {
> +      printf(" [Comment=");
> +      for ( ; *CommentCString; ++CommentCString) {
> +        if (*CommentCString != '\n')
> +          putchar(*CommentCString);
> +        else
> +          printf("\\n");
> +      }
> +      printf("]");
> +    }

Is \n the only thing worth escaping here? Consider tab, control characters, 
Unicode, etc. Probably we should just print this in the same way we print 
string literals (whatever we do for that).


> +  bool Invalid = false;
> +  const char *BufferStart
> +    = SourceMgr.getBufferData(SourceMgr.getFileID(Comment.getBegin()),
> +                              &Invalid).data();
> +  if (Invalid)
> +    return false;
> +
> +  const char *Start = BufferStart + 
> SourceMgr.getFileOffset(Comment.getBegin());
> +  const char *End = BufferStart + SourceMgr.getFileOffset(Comment.getEnd());

I don't know if we currently support having a file broken up into chunks, but 
you can ask for the part of the file you need a lot more easily like this:

SourceMgr.getCharacterData(Comment.getBegin(), &Invalid);

I'm not sure offhand how to test that the end is at least three characters past 
the start, though.


> +namespace {
> +/// \breif Compare two non-overlapping source ranges.
> +class BeforeInTranslationUnit {

Typo: \brief. But there's a version of this for SourceLocations already in 
SourceManager.h (LocBeforeThanCompare). Maybe you should just templatize that.


> +    if (DeclLocDecomp.first == CommentBeginDecomp.first &&
> +        SourceMgr.getLineNumber(DeclLocDecomp.first, DeclLocDecomp.second)
> +          == SourceMgr.getLineNumber(CommentBeginDecomp.first,
> +                                     CommentBeginDecomp.second)) {

Instead of computing both line numbers, how about just using std::find to see 
if there are any newlines between the decl and the comment? One tricky thing is 
that both \n and \r count as newlines.

> +  // Compute the starting line for the declaration and for the end of the
> +  // comment (this is expensive).
> +  unsigned DeclLocLine
> +    = SourceMgr.getLineNumber(DeclLocDecomp.first, DeclLocDecomp.second);
> +  unsigned CommentEndLine
> +    = SourceMgr.getLineNumber(CommentEndDecomp.first,
> +                              CommentEndDecomp.second);

Ditto.


> +  // If the comment does not end on the line prior to the declaration, then
> +  // the comment is not associated with the declaration at all.
> +  // TODO: check if Doxygen does the same.
> +  if (CommentEndLine + 1 != DeclLocLine)
> +    return NULL;

I am pretty sure this is not the case; I think (from memory) that both of these 
count as attached:

/// \brief A class.

class A {};

///\brief Another class
// TODO: remove this!
class B {};


> +    Result.append(FileBufferStart + DecompStart.second,
> +                  FileBufferStart + DecompEnd.second + 1);

This includes all the slashes and/or continuation stars, yes? Eventually we'll 
probably want to remove those from the middle of the buffer. Also, Doxygen 
considers blank lines to be significant, IIRC.

Waiting to see where this goes!
Jordan
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

Re: [cfe-commits] [PATCH] Structured comment parsing, retaining comments in AST

Reply via email to