Re: [PATCH] Modeling APIs in the Static Analyzer

Ted Kremenek Mon, 11 Aug 2014 23:45:29 -0700

> On Aug 1, 2014, at 1:44 AM, Gábor Horváth <[email protected]> wrote:
> 
> Hi Ted,
> 
> Thank you for the review.
> 
> On 1 August 2014 07:25, Ted Kremenek <[email protected] 
> <mailto:[email protected]>> wrote:
> Hi Gábor,
> 
> This is looking good to me.  Some minor nits/comments:
> 
> - Please add doxygen comments for the CodeInjector class.
>  
> Done.
>  
> - For the BugReporter patch, are there tests for that functionality change?  
> I saw tests in the other patch, but not that one.  It's fine to separate the 
> review of that change before the primary change goes in, but I was curious.
> 
> Well, it may be a bit complicated. I deleted some code in BugReporter to not 
> to discard bug reports that are in a model file, and the plist part of the 
> test case only pass if that patch is applied (if the patch is not applied the 
> nullpointer dereference warning that has the position in the modelfile will 
> be discarded. In the long term it would be better to report these errors 
> elsewhere but it is not supported yet by the bugreporter patch). I can move 
> the plist check into a separate testcase and add that case to the BugReporter 
> patch instead. The division by zero test should work without the BugReporter 
> patch.


Ok, this make sense.  Can you clarify what you mean by "better to report these 
errors elsewhere"?

>  
> - As for breaking code in the 'extra' repository, LLVM-internal API is not 
> sacrosanct.  If we break the 'extra' projects we just need to update them, 
> but I'm not certain if that is possible in this case.
> 
> As far as I can remember it would be a straightforward fix in the extra 
> repository. Clang-tidy calls CreateAnalysisConsumer.

Sounds good.  Let's get the right API and just fix up clang-tidy.

>  
> - For comments, please consistently use sentence casing and end with periods, 
> and for type names use the appropriate casing.  For example:
> 
> +  // modules create a separate compilerinstance for parsing modules, maybe 
> it is
> +  // for reason so I mimic this behavior
> +  CompilerInstance Instance;
> ...
> 
> This comment looks a bit suspect, since it seems like a question to yourself. 
>  Here you use the word "I"; who is "I" in the context of this code?  The 
> comment also seems like an unanswered question.  Is this a stale comment?
> 
> 
> Done, the comment was improved.
>  
> Another example is this comment:
> 
> +  // FIXME: double memoization is redundant. Here and in bodyfarm.
> +  llvm::StringMap<Stmt *> Bodies;
> 
> This can be made slightly cleaner.  For example:
> 
> +  // FIXME: Double memorization is redundant, with
> +  /// memoization both here and in BodyFarm.
> +  llvm::StringMap<Stmt *> Bodies;
> 
> Done.  
> 
> - Only use doxygen comments for documentation.  For example:
> 
> +  if (notzero_notmodeled(p)) {
> +   /// There is no information about the value of p, because
> +   /// notzero_notmodeled is not modeled and the function definition
> +   /// is not available.
> +    int j = 5 / p; // expected-warning {{Division by zero}}
> +  }
> 
> In this case we should use '//', not '///'.  The former are true comments, 
> and the latter are candidates to be extracted for documentation.
> 
> 
> Done.
>  
> Overall, however, this is getting really close.
> 
> 
> It is great.
> 
> Thanks,
> Gábor

Wonderful.  The rest of my comments are minor:

> +/// \brief CodeInjector is an interface which is responsible forinjecting 
> AST of
> +/// function definitions that may not be available in the original source.
> +///
> +/// The getBody function will be called each time the static analyzer 
> examines a
> +/// function call that has no definition available in the current translation
> +/// unit. If the returned statement is not a nullpointer, it is assumed to be
> +/// the body of a function which will be used for the analysis. The source of
> +/// the body can be arbitrary, but it is advised to use memoization to avoid
> +/// unnecessary reparsing of the external source that provides the body of 
> the
> +/// functions.


  "forinjecting" -> "for injecting"
  "nullpointer" -> "null pointer"

> +++ include/clang/StaticAnalyzer/Frontend/FrontendActions.h (working copy)
> @@ -10,10 +10,16 @@
>  #ifndef LLVM_CLANG_GR_FRONTENDACTIONS_H
>  #define LLVM_CLANG_GR_FRONTENDACTIONS_H
>  
> +#include <map>
> +


This "#include" of <map> doesn't seem needed.  Neither is the one in 
ModelConsumer.h

> +++ lib/StaticAnalyzer/Frontend/ModelConsumer.cpp (working copy)
> @@ -0,0 +1,42 @@
> +//===--- ModelConsumer.cpp - ASTConsumer for consuming model files 
> --------===//
> +//
> +//                     The LLVM Compiler Infrastructure
> +//
> +// This file is distributed under the University of Illinois Open Source
> +// License. See LICENSE.TXT for details.
> +//
> +//===----------------------------------------------------------------------===//
> +///
> +/// \file
> +/// \brief This file implements an ASTConsumer for consuming model files.
> +///
> +/// This ASTConsumer handles the AST of a parsed model file. All top level
> +/// function definitions will be collected from that model file for later
> +/// retrieval during the static analyzis. The body of these functions will 
> not
> +/// be injected into the ASTUnit of the analyzed translation unit. It will be
> +/// available through the BodyFarm which is utilized by the 
> AnalysisDeclContext
> +/// class.
> +///

  "analyzis" -> "analysis"

> +  // The instance wants to take ownership, however disablefree frontend 
> option
> +  // is set to true to avoid double free issues

Use  the actual casing for the option for technical precision:

  DisableFree

> +  /// \brief Synthetize a body for a declaration
> +  ///
> +  /// This method first looks up the appropriate model file based on the
> +  /// model-path configuration option and the name of the declaration that is
> +  /// looked up. If no model were synthetized yet for a function with that 
> name
> +  /// it will create a new compiler instance to parse the model file using 
> the
> +  /// ASTContext, Preprocessor, SourceManager of the original compiler 
> instance.
> +  /// The former resources are shared between the two compiler instance, so 
> the
> +  /// newly created instance have to "leak" these objects, since they are 
> owned
> +  /// by the original instance.

   Synthetize -> Synthesize
  synthetized -> synthesized

> +  std::vector<std::unique_ptr<ASTUnit> > ModelAsts;

I'd prefer this to be "ModelASTs", as 'AST' is an acronym.

Otherwise, this all looks great to me.

>  
> Cheers,
> Ted
> 
> On Jul 30, 2014, at 3:29 AM, Gábor Horváth <[email protected] 
> <mailto:[email protected]>> wrote:
> 
>> Hi Ted,
>> 
>> Thank you for the review.
>> 
>> 
>> On 30 July 2014 08:18, Ted Kremenek <[email protected] 
>> <mailto:[email protected]>> wrote:
>> Hi Gábor,
>> 
>> Thanks for making progress on this very promising enhancement to the 
>> analyzer.  I have an assortment of comments, in no particular order:
>> 
>> - ModelInjector.h and ModelConsumer.h
>> 
>> There is a comment at the top of these files, but I think a bit more 
>> explanation is needed.  For example:
>> 
>>   MetaConsumer.cpp:
>> 
>>     +// "Meta" ASTConsumer for consuming model files.
>> 
>> That doesn't really explain anything.  What does "Meta" in quotes mean?  I 
>> think an explanation here on what this does is helpful when someone 
>> discovers this code for the first time.
>> 
>> Similarly, we should add some high-level comments for CodeInjector.h and 
>> ModelInjector.h.  We have a good start in ModelInjector.h:
>> 
>> +/// \file
>> +/// \brief Defines the clang::ento::ModelInjector class which implements the
>> +/// clang::CodeInjector interface. This class is responsible for injecting
>> +/// function definitions that were synthetized from model files.
>> +///
>> 
>> Let's consider expanding it:
>> 
>>  /// \brief This file defines the clang::ento::ModelInjector class which 
>> implements the
>>  /// clang::CodeInjector interface. This class is responsible for injecting
>>  /// function definitions that were synthesized from model files.
>> 
>>  /// Model files allow definitions of functions to be lazily constituted for 
>> functions
>>  /// which lack bodies in the original source code.  This allows the analyzer
>>  /// to more precisely analyze code that calls such functions, analyzing the
>>  /// artificial definitions (which typically approximate the semantics of the
>>  /// called function) when called by client code.  These definitions are
>>  /// reconstituted lazily, on-demand, by the static analyzer engine.
>> 
>> CodeInjector.h provides some information, but it is a bit vague:
>> 
>> +///
>> +/// \file
>> +/// \brief Defines the clang::CodeInjector interface which is responsible 
>> for
>> +/// injecting AST of function definitions from external source.
>> +///
>> 
>> It's a bit unclear how this gets used.  I think a bit of prose here would 
>> help clarify its role in the static analyzer.  I also think the CodeInjector 
>> interface is also more abstract than the prose describes.  There's nothing 
>> about CodeInjector's interface that requires the injected definitions to 
>> come from an external source.  That's an implementation detail of a concrete 
>> subclass.  Instead, all CodeInjector does is provide an interface that 
>> lazily provides definitions for functions and methods that may not be 
>> present in the original source.
>> 
>> I have added some further documentation to address these issues.
>> 
>>  
>> 
>> I'm also looking at the change to 
>> StaticAnalyzer/Frontend/FrontendActions.cpp, and wonder if we can simplify 
>> things:
>> 
>>> +++ lib/StaticAnalyzer/Frontend/FrontendActions.cpp (working copy)
>>> @@ -7,9 +7,11 @@
>>>  //
>>>  
>>> //===----------------------------------------------------------------------===//
>>>  
>>> +#include "clang/Frontend/CompilerInstance.h"
>>>  #include "clang/StaticAnalyzer/Frontend/FrontendActions.h"
>>> -#include "clang/Frontend/CompilerInstance.h"
>>>  #include "clang/StaticAnalyzer/Frontend/AnalysisConsumer.h"
>>> +#include "clang/StaticAnalyzer/Frontend/ModelConsumer.h"
>>> +#include "ModelInjector.h"
>>>  using namespace clang;
>>>  using namespace ento;
>>>  
>>> @@ -18,6 +20,14 @@
>>>    return CreateAnalysisConsumer(CI.getPreprocessor(),
>>>                                  CI.getFrontendOpts().OutputFile,
>>>                                  CI.getAnalyzerOpts(),
>>> -                                CI.getFrontendOpts().Plugins);
>>> +                                CI.getFrontendOpts().Plugins,
>>> +                                new ModelInjector(CI));
>>>  }
>>>  
>> 
>> 
>> It looks like CreateAnalysisConsumer just continues to grow more arguments, 
>> all which derive from using 'CI'.  This seems silly, since this function is 
>> called in one place.  Instead of intro ducting a dependency on 
>> ModelInjector.h in this file, we can just sink these arguments into 
>> CreateAnalysisConsumer() itself, resulting in:
>> 
>>   return CreateAnalysisConsumer(CI);
>> 
>> and let CreateAnalysisConsumer() do all that boilerplate.
>> 
>> That was my original idea as well but it broke the compilation of some code 
>> in extra repository and I wasn't sure if it is ok to break the API with this 
>> patch. But I find it cleaner this way so I modified it in this iteration.
>> 
>> 
>> Next, let's look at the change to FrontendAction:
>> 
>>>  class FrontendAction {
>>> +  /// Is this action invoked on a model file? Model files are incomplete
>>> +  /// translation units that relies on type information from another 
>>> translation
>>> +  /// unit. Check ParseModelFileAction for details.
>>> +  bool ModelFile;
>> 
>> Perhaps "IsModelFile"?  "ModelFile" sounds like it should be a reference to 
>> the file itself.
>> 
>>>    FrontendInputFile CurrentInput;
>>>    std::unique_ptr<ASTUnit> CurrentASTUnit;
>>>    CompilerInstance *Instance;
>>> @@ -105,7 +109,11 @@
>>>    /// @}
>>>  
>>>  public:
>>> -  FrontendAction();
>>> +  /// \brief Constructor
>>> +  ///
>>> +  /// \param modelFile determines whether the source files this action 
>>> invoked
>>> +  /// on should be treated as a model file. Defaults to false.
>>> +  FrontendAction(bool modelFile = false);
>> 
>> It seems suboptimal to modify the interface of FrontendAction just for this 
>> one edge case.  Instead of modifying the constructor arguments, we could 
>> default initialize "IsModelFile" to false, and have a setter to change it.  
>> For example:
>> 
>>   ParseModelFileAction::ParseModelFileAction(llvm::StringMap<Stmt *> &Bodies)
>>     : ASTFrontendAction(/*ModelFile=*/true), Bodies(Bodies) {}
>> 
>> becomes:
>>  
>>   ParseModelFileAction::ParseModelFileAction(llvm::StringMap<Stmt *> &Bodies)
>>     : Bodies(Bodies)  {
>>     IsModelFile = true;
>>   }
>> 
>> Looking at this more, I wonder if we should modify FrontendAction at all.  
>> The only place where isModelParsingAction() is called is in one spot in 
>> CompilerInstance.cpp:
>> 
>>    if (hasSourceManager() && !Act.isModelParsingAction())
>> 
>> It *might* be cleaner to just have a virtual member function in 
>> FrontendAction, which defaults to returning false, but is generic for all 
>> subclasses to override.  Then we don't need the "IsModelFile" field in 
>> FrontendAction at all, and we just have ParseModelFileAction override that 
>> single member function.  We could then name that method to be something a 
>> bit more generic.  That would allow us to not touch FrontendAction at all 
>> except for providing that single virtual method that can be overridden in 
>> subclasses.  I somewhat prefer this approach because it provides a cleaner 
>> separation of concerns between FrontendAction (which is defined libFrontend) 
>> and the static analyzer.  That would also allow you to get rid of 
>> isModelParsingAction() entirely (replacing it with something more generic).
>> 
>> 
>> You are right, it is much cleaner to use a virtual function, so I modified 
>> the patch to use that approach. The new virtual function has the same name 
>> because I have yet to find any better and more general name yet. Do you have 
>> an idea for a better name?
>>  
>> As for the test case:
>> 
>>> +typedef int* intptr;
>>> +
>>> +void modelled(intptr p);
>>> +
>>> +int main() {
>>> + modelled(0);
>>> + return 0;
>>> +}
>> 
>> Please add some comments in this test file explaining what is happening.  
>> Also, it would be great if this both used FileCheck (which it does now) but 
>> also verified the diagnostics so we get cross-checking of the output (we see 
>> this in some analyzer tests).  It also makes it easier to understand the 
>> test.
>> 
>> Also, is there a reason to break up the tests between 
>> model-suppress-falsepos.cpp and model-file.cpp?  It seems like one test file 
>> will do fine; just clearly comment on what is happening for each test.  I 
>> also recommend called the modeled function "modeledFunction" instead of 
>> "modelled" (which according to my spell checker has an additional 'l'). 
>> 
>> I have merged the test files and also added some commets to explain what is 
>> going on. I have fixed the misspelling as well. The nullpointer dereference 
>> is only checked through plist because the point where the comment with the 
>> expected warning should be added is inside the model file and it did not 
>> work for me if the comment was in a separate file. If there is a different 
>> way to verify the warnings that are in a separate file and I did not find 
>> it, please let me know.
>>  
>> 
>> As for the model files themselves:
>> 
>>> Index: test/Analysis/modelled.model
>>> ===================================================================
>>> --- test/Analysis/modelled.model  (revision 0)
>>> +++ test/Analysis/modelled.model  (working copy)
>>> @@ -0,0 +1,3 @@
>>> +void modelled(intptr p) {
>>> + ++*p;
>>> +}
>>> \ No newline at end of file
>>> Index: test/Analysis/notzero.model
>>> ===================================================================
>>> --- test/Analysis/notzero.model (revision 0)
>>> +++ test/Analysis/notzero.model (working copy)
>> 
>> Let's put these in a separate subdirectory, for example, "models", instead 
>> of mixing them with the tests.  This way they really serve as "inputs" to 
>> the analyzer.
>> 
>> I have moved the model files to tests/Inputs/Models.
>>  
>> 
>> Overall this is looking good.  I think the explanatory comments will really 
>> help people understand what this is doing, and I think changing how we 
>> thread the information through FrontendAction will help not introduce an 
>> artificial tainting of FrontendAction with concepts specific to the static 
>> analyzer.
>> 
>> Cheers,
>> Ted
>> 
>> 
>> Thanks,
>> Gábor
>>  
>> 
>> On Jul 16, 2014, at 2:45 AM, Gábor Horváth <[email protected] 
>> <mailto:[email protected]>> wrote:
>> 
>>> 
>>> 
>>> 
>>> On 14 July 2014 19:32, Anna Zaks <[email protected] <mailto:[email protected]>> 
>>> wrote:
>>> 
>>>> On Jul 13, 2014, at 6:11 AM, Gábor Horváth <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> 
>>>> Hi Anna,
>>>> 
>>>> Thank you for the review. I have tweaked the test, so it no longer 
>>>> requires the error reporting tweak that is not done yet to pass. I have 
>>>> also added some high level comments to some files, if you think some 
>>>> information is lacking I will add them in the next iteration as well. The 
>>>> BugReporter patch is now separated into a different patch. 
>>>> 
>>>> 
>>>> On 11 July 2014 18:02, Anna Zaks <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> 
>>>> For example, modeling functions should allow you to find bugs and suppress 
>>>> false positives outside of those functions. I would suggest adding a few 
>>>> of those tests first.
>>>> 
>>>> 
>>>> How are the false positives suppressed? I did not find any resource on 
>>>> that. Found some analyzer attributes but I did not find them suitable for 
>>>> this purpuse at the first glance. But I think once the locations that are 
>>>> in a model file are omitted from the report path, the regular methods for 
>>>> suppressing false positives should work (and I will definitely add test 
>>>> case to ensure this once it is done).
>>>> 
>>> 
>>> What I meant is that it is possible to construct a test where ability to 
>>> model a function would eliminate a false positive. This would be another 
>>> way to test your patch without worrying about BugReporter.
>>> 
>>> I got it now, thansk. I have updated the patch with a test case where a 
>>> false positive case is eliminated by a model file.
>>> 
>>> Thanks,
>>> Gábor
>>>  
>>>> Thanks,
>>>> Gábor
>>>> <api_modeling.patch><bugreporter.patch>
>>> 
>>> 
>>> <api_modeling.patch><bugreporter.patch>
>> 
>> 
>> <api_modeling.patch><bugreporter.patch>
> 
> 
> <api_modeling.patch><bugreporter.patch>

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

Re: [PATCH] Modeling APIs in the Static Analyzer

Reply via email to