Re: [cfe-commits] [PATCH] Implements support to run standalone tools

Douglas Gregor Wed, 14 Mar 2012 11:30:08 -0700

On Mar 14, 2012, at 11:14 AM, Manuel Klimek wrote:

> On Wed, Mar 14, 2012 at 7:05 PM, Douglas Gregor <[email protected]> wrote:
>> 
>> On Mar 14, 2012, at 8:35 AM, Manuel Klimek wrote:
>> 
>>> Hi Doug,
>>> 
>>> first, a big thank you for finding the time to review. I completely
>>> agree with everything you write. Especially getting the tooling
>>> infrastructure into a more non-cmake centric shape is something that
>>> was already high on the TODO list, but I didn't want to make further
>>> changes before getting first feedback, because I figured reviewing
>>> something that still constantly changes is no fun for you ... Which
>>> brings me to a central question - which of the changes would you want
>>> to see finished before getting a first version in? Especially making
>>> the command line handling more orthogonal is something I'd like to do
>>> in lock-step with our google-internal build system handling (which is
>>> not cmake), and thus I think would be a great use case to make sure we
>>> don't miss anything. Of course I totally understand if you're not
>>> comfortable checking in the current design, and I guess that assuring
>>> you that getting the tooling infrastructure into a shape that the
>>> community is happy with is my top priority might not help here ;) On
>>> the other hand, for various reasons iterating inside the repository
>>> would make me quite a bit more productive, so I'll still ask :)
>> 
>> I'd like to see ClangTool split into CompilationDatabase and Tool (or 
>> whatever they're called) before it goes in, because I don't like committing 
>> something with an architecture that we know will change. Anything else 
>> (-fcompilation-database, simple compilation database, convenience functions, 
>> change from FrontendActionFactory to function object) can go in later.
>> 
>>> Some more comments inline...
>>> 
>>> On Wed, Mar 14, 2012 at 1:35 PM, Douglas Gregor <[email protected]> wrote:
>>>> Hi Manuel,
>>>> 
>>>> I finally found some free cycles to dig into this.
>>>> 
>>>> I love the functionality provided by the tooling infrastructure, and it's 
>>>> really nice to be able to easily create a single process that churns 
>>>> through all of the translation units in a program to perform some 
>>>> analysis/refactoring/whatever. There are also some little gems hidden in 
>>>> here (such as being able to easily syntax-check code supplied by a string).
>>>> 
>>>> My major concern with the tooling patch is that it's tailored for the case 
>>>> of building tools that run over the JSON database emitted by CMake. It's 
>>>> great to make that case work smoothly, but the tooling infrastructure as 
>>>> presented does not adequately support users who aren't already using 
>>>> CMake. That immediately eliminates the vast majority of potential users of 
>>>> the tooling infrastructure, because while CMake is popular, it's nowhere 
>>>> near dominant, and the barrier to entry for switching one's project over 
>>>> to CMake is quite possibly higher than the barrier to entry for building a 
>>>> tool w/o the tooling infrastructure that will work on non-CMake projects.
>>>> 
>>>> I suggest a more general architecture that separates out the "program 
>>>> build" database from the code that parses code and runs the action on that 
>>>> code. To be a bit more concrete, I'm suggesting splitting ClangTool into 
>>>> two subsystems:
>>>> 
>>>>        - An abstract class CompilationDatabase that allows one to iterate 
>>>> over the CompileCommands that went into a particular build (or apply some 
>>>> function object to each CompileCommand), along with:
>>>>                - A subclass JSONCompilationDatabase that retrieves 
>>>> CompileCommands from the JSON output produced by CMake
>>>>                - A subclass SimpleCompilationDatabase where one 
>>>> programmatically adds CompileCommands
>>>>                - Note: it should be possible for us to create additional 
>>>> compilation database kinds, e.g., by parsing makefiles directly
>>>>                - A function that makes it easy to take argc/argv an 
>>>> interpret it either as a use of JSONCompilationDatabase or 
>>>> SimpleCompilationDatabase. This should make it easy to create a unified 
>>>> command-line interface to Tooling-based Clang tools, so it's easy for end 
>>>> users to run different tools on their projects
>>>>                - A function that writes a CompilationDatabase in JSON 
>>>> format, so that it can be loaded again. (Not immediately critical, but 
>>>> useful nonetheless)
>>>> 
>>>>        - A class ("Tooling"? "Tool"?) that encapsulates the environment in 
>>>> which the tool will be executed along with the ability to actually do the 
>>>> execution, e.g.,
>>>>                - mapVirtualFiles, as in ClangTool
>>>>                - a run function that runs a single CompileCommand with a 
>>>> frontend action factory
>>>>                - a run function that runs every CompileCommand in the 
>>>> given program database with a frontend action factory
>>>>                - a run function that compiles a given code string (+ 
>>>> command-line options, presumably) with a frontend action factory, like 
>>>> runSyntaxOnlyToolOnCode does
>>> 
>>> I'm not sure I understand this correctly - compiling a given code
>>> string is done by mapping it in as a virtual file with a specific name
>>> and adding some special command default command line arguments.
>>> Perhaps you are imagining a different use case, but for me
>>> runSyntaxOnlyToolOnCode is first and foremost a unit testing help.
>> 
>> It's fairly common for someone to ask, "how do I compile some code that I 
>> have in a std::string?" runSyntaxOnlyToolOnCode starts to solve that 
>> problem. I mentioned command-line arguments simply because one will need to 
>> be able to specify (at least) language dialect options for this parse, since 
>> we have no file name to go on.
>> 
>>>> Now, there's no reason why we can't still have two-line syntax checkers 
>>>> with this architecture. It might look like this:
>>>> 
>>>>  int main(int argc, char **argv) {
>>>>    llvm::OwningPtr<CompilationDatabase> 
>>>> Database(loadCompilationDatabase(argc, argv));
>>>>    return Tool().run(newFrontendActionFactory<clang::SyntaxOnlyAction>(), 
>>>> *Database);
>>>>  }
>>>> 
>>>> Now, to actually make tooling useful to non-CMake users, we'd want to make 
>>>> it possible to create a JSON compilation database without involving CMake 
>>>> at all. I suggest that we add a Clang command-line flag 
>>>> "-fcompilation-database=<filename>" and have the driver update <filename> 
>>>> with each compilation command it is invoked with. That way, one could do a 
>>>> normal project build with, e.g.,
>>>> 
>>>>        make CXX=clang++ 
>>>> CXXFLAGS="-fcompilation-database=compile_commands.json"
>>> 
>>> I actually already have written something similar (well, it's
>>> currently a python wrapper which might be a bad idea, but it was
>>> really simple). The design I have is that we output a little JSON
>>> snippet next to each .o file. The problem with handing the compilation
>>> database to each compile step is concurrent runs - how would you
>>> propose to sync on the database? I can't imagine implementing the sync
>>> would be worth the cost, but I'm happy to be convinced otherwise…
>> 
>> POSIX file locks are fine; as I noted in the conversion with Chandler, most 
>> developers have projects small enough that the overhead won't matter to 
>> them, but simplicity and extremely important.
> 
> Do they work on Windows? (I did a quick search and it seems like file
> locking on windows is a pain, but I might have missed something).


I could live with Windows not supporting this flag until someone interested in 
that platform comes along and fixes it.

>>>> +  /// \brief Returns the file manager used in the tool.
>>>> +  ///
>>>> +  /// The file manager is shared between all translation units.
>>>> +  FileManager &getFiles() { return Files; }
>>>> 
>>>> Given that the FileManager is exposed, should mapVirtualFile deal in 
>>>> FileEntry*'s? (Or, at least, have an overload that uses a FileEntry*?)
>>> 
>>> /me wonders why he exposed the FileManager at all - one interesting
>>> part that's missing from the current CL is one of the next steps that
>>> allows us to do refactorings (if you're interested, the current
>>> incarnation is at:
>>> http://llvm.org/viewvc/llvm-project/cfe/branches/tooling/include/clang/Tooling/Refactoring.h?view=markup
>>> with an example tool at
>>> http://llvm.org/viewvc/llvm-project/cfe/branches/tooling/tools/remove-cstr-calls/RemoveCStrCalls.cpp?view=markup)
>>> 
>>> I'll investigate. I don't want users to have to deal with
>>> FileEntry*'s, and rather keep the interface as simple as possible.
>>> Unless you have a specific use case?
>> 
>> No use case in mind. I'd be just as happy (heck, probably happier) if the 
>> FileManager wasn't exposed at all.
>> 
>>>> +  // Create the compilers actual diagnostics engine.
>>>> +  Compiler.createDiagnostics(CC1Args.size(),
>>>> +                             const_cast<char**>(CC1Args.data()));
>>>> 
>>>> Should the tooling infrastructure provide a way to deal with diagnostics?
>>> 
>>> I have not really thought of that. We probably want to be able to
>>> inject diagnostics at some point (we have some diagnostic mapreduces
>>> internally when we iterate on implementing new diagnostics).
>> 
>> Okay.
>> 
>>>> +using clang::tooling::ClangTool;
>>>> +using clang::tooling::newFrontendActionFactory;
>>>> +
>>>> +int main(int argc, char **argv) {
>>>> +  ClangTool Tool(argc, argv);
>>>> +  return Tool.run(newFrontendActionFactory<clang::SyntaxOnlyAction>());
>>>> +}
>>>> 
>>>> I feel like the tooling infrastructure would benefit from a slightly more 
>>>> involved example, especially one that benefits from running over all files 
>>>> in a project within the same process. Perhaps that isn't easy until we 
>>>> have an AST matching framework?
>>> 
>>> The AST matching framework will actually not help much here - this
>>> becomes very interesting when you do real refactorings (see the
>>> example I cited above), because then we need to deduplicate edits at
>>> the end of running over all translation units.
>> 
>> I was thinking of something a bit more simple. For example, finding all 
>> references to a given global variable within a project.
> 
> How does that benefit from running everything in-process?

Easy to eliminate duplicates from references within headers, perhaps?

/me is grasping at straws.

        - Doug
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

Re: [cfe-commits] [PATCH] Implements support to run standalone tools

Reply via email to