Re: [cfe-commits] [PATCH] Implements support to run standalone tools

Douglas Gregor Wed, 14 Mar 2012 11:07:45 -0700

On Mar 14, 2012, at 8:35 AM, Manuel Klimek wrote:

> Hi Doug,
> 
> first, a big thank you for finding the time to review. I completely
> agree with everything you write. Especially getting the tooling
> infrastructure into a more non-cmake centric shape is something that
> was already high on the TODO list, but I didn't want to make further
> changes before getting first feedback, because I figured reviewing
> something that still constantly changes is no fun for you ... Which
> brings me to a central question - which of the changes would you want
> to see finished before getting a first version in? Especially making
> the command line handling more orthogonal is something I'd like to do
> in lock-step with our google-internal build system handling (which is
> not cmake), and thus I think would be a great use case to make sure we
> don't miss anything. Of course I totally understand if you're not
> comfortable checking in the current design, and I guess that assuring
> you that getting the tooling infrastructure into a shape that the
> community is happy with is my top priority might not help here ;) On
> the other hand, for various reasons iterating inside the repository
> would make me quite a bit more productive, so I'll still ask :)


I'd like to see ClangTool split into CompilationDatabase and Tool (or whatever 
they're called) before it goes in, because I don't like committing something 
with an architecture that we know will change. Anything else 
(-fcompilation-database, simple compilation database, convenience functions, 
change from FrontendActionFactory to function object) can go in later.

> Some more comments inline...
> 
> On Wed, Mar 14, 2012 at 1:35 PM, Douglas Gregor <[email protected]> wrote:
>> Hi Manuel,
>> 
>> I finally found some free cycles to dig into this.
>> 
>> I love the functionality provided by the tooling infrastructure, and it's 
>> really nice to be able to easily create a single process that churns through 
>> all of the translation units in a program to perform some 
>> analysis/refactoring/whatever. There are also some little gems hidden in 
>> here (such as being able to easily syntax-check code supplied by a string).
>> 
>> My major concern with the tooling patch is that it's tailored for the case 
>> of building tools that run over the JSON database emitted by CMake. It's 
>> great to make that case work smoothly, but the tooling infrastructure as 
>> presented does not adequately support users who aren't already using CMake. 
>> That immediately eliminates the vast majority of potential users of the 
>> tooling infrastructure, because while CMake is popular, it's nowhere near 
>> dominant, and the barrier to entry for switching one's project over to CMake 
>> is quite possibly higher than the barrier to entry for building a tool w/o 
>> the tooling infrastructure that will work on non-CMake projects.
>> 
>> I suggest a more general architecture that separates out the "program build" 
>> database from the code that parses code and runs the action on that code. To 
>> be a bit more concrete, I'm suggesting splitting ClangTool into two 
>> subsystems:
>> 
>>        - An abstract class CompilationDatabase that allows one to iterate 
>> over the CompileCommands that went into a particular build (or apply some 
>> function object to each CompileCommand), along with:
>>                - A subclass JSONCompilationDatabase that retrieves 
>> CompileCommands from the JSON output produced by CMake
>>                - A subclass SimpleCompilationDatabase where one 
>> programmatically adds CompileCommands
>>                - Note: it should be possible for us to create additional 
>> compilation database kinds, e.g., by parsing makefiles directly
>>                - A function that makes it easy to take argc/argv an 
>> interpret it either as a use of JSONCompilationDatabase or 
>> SimpleCompilationDatabase. This should make it easy to create a unified 
>> command-line interface to Tooling-based Clang tools, so it's easy for end 
>> users to run different tools on their projects
>>                - A function that writes a CompilationDatabase in JSON 
>> format, so that it can be loaded again. (Not immediately critical, but 
>> useful nonetheless)
>> 
>>        - A class ("Tooling"? "Tool"?) that encapsulates the environment in 
>> which the tool will be executed along with the ability to actually do the 
>> execution, e.g.,
>>                - mapVirtualFiles, as in ClangTool
>>                - a run function that runs a single CompileCommand with a 
>> frontend action factory
>>                - a run function that runs every CompileCommand in the given 
>> program database with a frontend action factory
>>                - a run function that compiles a given code string (+ 
>> command-line options, presumably) with a frontend action factory, like 
>> runSyntaxOnlyToolOnCode does
> 
> I'm not sure I understand this correctly - compiling a given code
> string is done by mapping it in as a virtual file with a specific name
> and adding some special command default command line arguments.
> Perhaps you are imagining a different use case, but for me
> runSyntaxOnlyToolOnCode is first and foremost a unit testing help.

It's fairly common for someone to ask, "how do I compile some code that I have 
in a std::string?" runSyntaxOnlyToolOnCode starts to solve that problem. I 
mentioned command-line arguments simply because one will need to be able to 
specify (at least) language dialect options for this parse, since we have no 
file name to go on.

>> Now, there's no reason why we can't still have two-line syntax checkers with 
>> this architecture. It might look like this:
>> 
>>  int main(int argc, char **argv) {
>>    llvm::OwningPtr<CompilationDatabase> 
>> Database(loadCompilationDatabase(argc, argv));
>>    return Tool().run(newFrontendActionFactory<clang::SyntaxOnlyAction>(), 
>> *Database);
>>  }
>> 
>> Now, to actually make tooling useful to non-CMake users, we'd want to make 
>> it possible to create a JSON compilation database without involving CMake at 
>> all. I suggest that we add a Clang command-line flag 
>> "-fcompilation-database=<filename>" and have the driver update <filename> 
>> with each compilation command it is invoked with. That way, one could do a 
>> normal project build with, e.g.,
>> 
>>        make CXX=clang++ 
>> CXXFLAGS="-fcompilation-database=compile_commands.json"
> 
> I actually already have written something similar (well, it's
> currently a python wrapper which might be a bad idea, but it was
> really simple). The design I have is that we output a little JSON
> snippet next to each .o file. The problem with handing the compilation
> database to each compile step is concurrent runs - how would you
> propose to sync on the database? I can't imagine implementing the sync
> would be worth the cost, but I'm happy to be convinced otherwise…

POSIX file locks are fine; as I noted in the conversion with Chandler, most 
developers have projects small enough that the overhead won't matter to them, 
but simplicity and extremely important.

>> +  /// \brief Returns the file manager used in the tool.
>> +  ///
>> +  /// The file manager is shared between all translation units.
>> +  FileManager &getFiles() { return Files; }
>> 
>> Given that the FileManager is exposed, should mapVirtualFile deal in 
>> FileEntry*'s? (Or, at least, have an overload that uses a FileEntry*?)
> 
> /me wonders why he exposed the FileManager at all - one interesting
> part that's missing from the current CL is one of the next steps that
> allows us to do refactorings (if you're interested, the current
> incarnation is at:
> http://llvm.org/viewvc/llvm-project/cfe/branches/tooling/include/clang/Tooling/Refactoring.h?view=markup
> with an example tool at
> http://llvm.org/viewvc/llvm-project/cfe/branches/tooling/tools/remove-cstr-calls/RemoveCStrCalls.cpp?view=markup)
> 
> I'll investigate. I don't want users to have to deal with
> FileEntry*'s, and rather keep the interface as simple as possible.
> Unless you have a specific use case?

No use case in mind. I'd be just as happy (heck, probably happier) if the 
FileManager wasn't exposed at all.

>> +  // Create the compilers actual diagnostics engine.
>> +  Compiler.createDiagnostics(CC1Args.size(),
>> +                             const_cast<char**>(CC1Args.data()));
>> 
>> Should the tooling infrastructure provide a way to deal with diagnostics?
> 
> I have not really thought of that. We probably want to be able to
> inject diagnostics at some point (we have some diagnostic mapreduces
> internally when we iterate on implementing new diagnostics).

Okay. 

>> +using clang::tooling::ClangTool;
>> +using clang::tooling::newFrontendActionFactory;
>> +
>> +int main(int argc, char **argv) {
>> +  ClangTool Tool(argc, argv);
>> +  return Tool.run(newFrontendActionFactory<clang::SyntaxOnlyAction>());
>> +}
>> 
>> I feel like the tooling infrastructure would benefit from a slightly more 
>> involved example, especially one that benefits from running over all files 
>> in a project within the same process. Perhaps that isn't easy until we have 
>> an AST matching framework?
> 
> The AST matching framework will actually not help much here - this
> becomes very interesting when you do real refactorings (see the
> example I cited above), because then we need to deduplicate edits at
> the end of running over all translation units.

I was thinking of something a bit more simple. For example, finding all 
references to a given global variable within a project.

        - Doug


_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

Re: [cfe-commits] [PATCH] Implements support to run standalone tools

Reply via email to