[PATCH] D41102: Setup clang-doc frontend framework

Sam McCall via Phabricator via cfe-commits Wed, 31 Jan 2018 01:10:22 -0800

sammccall added a comment.

Really sorry about the delay in getting to this.
At a high level, I'm most concerned about:


- the monolithic in-memory intermediate format, which seems to put hard limits 
on performance and scalability
- having high-level documentation and clear APIs
- having multiple intermediate formats



================
Comment at: test/Tooling/clang-doc-basic.cpp:3
+// RUN: mkdir %t
+// RUN: echo '[{"directory":"%t","command":"clang++ -c 
%t/test.cpp","file":"%t/test.cpp"}]' | sed -e 's/\\/\//g' > 
%t/compile_commands.json
+// RUN: cp "%s" "%t/test.cpp"
----------------
nit: you can also just write compile_flags.txt, which in this case would be 
empty


================
Comment at: test/Tooling/clang-doc-basic.cpp:22
+ // CHECK: ---
+ // CHECK: Qualified Name:  ''
+ // CHECK: Name:            ''
----------------
what is this? The TU? The global namespace?
What's the value in emitting it?


================
Comment at: tools/clang-doc/ClangDoc.h:1
+//===-- ClangDoc.cpp - ClangDoc ---------------------------------*- C++ 
-*-===//
+//
----------------
This needs some high-level documentation: what does the clang-doc library do, 
what's the main user (clang-doc command-line tool), what are the major moving 
parts.

I don't personally have a strong opinion on how this is split between this 
header / the implementation / a documentation page for the tool itself, but 
we'll probably need *something* for each of those.

(I think it's OK to defer the user-facing documentation to another patch, but 
we should do it before the tool becomes widely publicized or included in an 
llvm release)


================
Comment at: tools/clang-doc/ClangDoc.h:27
+
+// A Context which contains extra options which are used in ClangMoveTool.
+struct ClangDocContext {
----------------
what's clangmovetool?


================
Comment at: tools/clang-doc/ClangDoc.h:27
+
+// A Context which contains extra options which are used in ClangMoveTool.
+struct ClangDocContext {
----------------
sammccall wrote:
> what's clangmovetool?
nit: this sounds more like "options" than a context to me, though there's only 
one member to go on :-)


================
Comment at: tools/clang-doc/ClangDoc.h:29
+struct ClangDocContext {
+  // Which format in which to emit representation.
+  OutFormat EmitFormat;
----------------
Is this the intermediate representation referred to in the design doc, or the 
final output format?

If the former, why two formats rather than picking one?
YAML is nice for being usable by out-of-tree tools (though not as nice as 
JSON). But it seems like providing YAML as a trivial backend format would fit 
well?
Bitcode is presumably more space-efficient - if this is significant in practice 
it seems like a better choice.


================
Comment at: tools/clang-doc/ClangDoc.h:33
+
+class ClangDocVisitor : public RecursiveASTVisitor<ClangDocVisitor> {
+public:
----------------
This API makes essentially everything public. Is that the intent?

It seems like `ClangDocVisitor` is a detail, and the operation you want to 
expose is "extract doc from this AST into this reporter" or maybe "create an 
AST consumer that feeds this reporter".

It would be useful to have an API to extract documentation from individual AST 
nodes (e.g. a Decl). But I'd be nervous about trying to use the classes exposed 
here to do that. If it's efficiently possible, it'd be nice to expose a 
function.
(one use case for this is clangd)


================
Comment at: tools/clang-doc/ClangDoc.h:39
+
+  bool VisitTagDecl(const TagDecl *D);
+  bool VisitNamespaceDecl(const NamespaceDecl *D);
----------------
`override` where applicable


================
Comment at: tools/clang-doc/ClangDoc.h:80
+
+class ClangDocActionFactory : public tooling::FrontendActionFactory {
+public:
----------------
this class can definitely be hidden in the c++ file, behind a 
newClangDocActionFactory() func
(actually I think newFrontendActionFactory in Tooling.h could be extended to 
cover this, but not 100% sure)


================
Comment at: tools/clang-doc/ClangDocReporter.cpp:1
+//===-- ClangDocReporter.cpp - ClangDoc Reporter ----------------*- C++ 
-*-===//
+//
----------------
It looks like the plan for merging data across sources is to hold all 
information in one in-memory structure and incrementally add to it as you get 
information from TUs.
(This should be documented somewhere!)

This seems somewhat hostile to parallel processing: you're going to need to 
synchronize access to the structs owned by the ClangDocReporter if you want to 
gather from multiple TUs at once. Moreover, documenting large codebases using 
multiple machines in parallel seems very difficult.
And obviously it assumes you can fit the generated documentation for the 
codebase in memory, which would be nice to avoid.

Have you considered a mapreduce-like architecture, where the mapper gets AST 
callbacks and spits out data, and the reducer is responsible for assembling all 
the data together?

We don't have good framework support for mapreduce in open-source clang (we do 
have something internally at google, and I'd really like to better support this 
pattern in libtooling).
Still, you can see a simple example of this pattern (albeit with a trivial 
reducer) in 
clang/tools/extra/clangd/global-symbol-builder/GlobalSymbolBuilderMain.cpp.


================
Comment at: tools/clang-doc/ClangDocReporter.h:1
+//===-- Doc.cpp - ClangDoc --------------------------------------*- C++ 
-*-===//
+//
----------------
nit: header is out of sync with the filename

Each header should have a high level description of what this component is and 
how it fits into the system.


================
Comment at: tools/clang-doc/ClangDocReporter.h:33
+
+enum class OutFormat { YAML, LLVM };
+
----------------
document :-)
I'm not sure "LLVM" is a suitable name for the bitcode format.
It probably makes sense just to name this as being "clang-doc's binary format", 
maybe commenting that it's related to LLVM bitcode. It's really an 
implementation detail: these files won't be (I think?) interoperable with any 
other tools that process bitcode.


================
Comment at: tools/clang-doc/ClangDocReporter.h:124
+
+class ClangDocReporter : public ConstCommentVisitor<ClangDocReporter> {
+public:
----------------
What's the relationship between ClangDocReporter, the classes in ClangDoc.h and 
the intermediate format?

If the reporter is fixed-function and always writes the intermediate format, 
then it seems something of a confusing name - I'd have expected "Reporter" to 
be an abstract set of callbacks e.g. for producing different final output 
formats.
In that case, it seems like even injecting (into ClangDoc.h classes) the thing 
that builds the intermediate format is overkill - why not just inject a sink 
for the structs that get produced?


https://reviews.llvm.org/D41102



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D41102: Setup clang-doc frontend framework

Reply via email to