Re: [PATCH] Modeling APIs in the Static Analyzer

Ted Kremenek Tue, 12 Aug 2014 22:14:09 -0700

Thanks Gábor.  This looks great.

One last thing that occurred to me is that we should not create a ModelInjector 
at all unless the model-path is specified.  This allows us to preserve the 
existing behavior in the analyzer while we continue to evolve this new 
functionality.


Specifically:

 std::unique_ptr<AnalysisASTConsumer>
-ento::CreateAnalysisConsumer(const Preprocessor &pp, const std::string &outDir,
-                             AnalyzerOptionsRef opts,
-                             ArrayRef<std::string> plugins) {
+ento::CreateAnalysisConsumer(CompilerInstance &CI) {
   // Disable the effects of '-Werror' when using the AnalysisConsumer.
-  pp.getDiagnostics().setWarningsAsErrors(false);
+  CI.getPreprocessor().getDiagnostics().setWarningsAsErrors(false);
 
-  return llvm::make_unique<AnalysisConsumer>(pp, outDir, opts, plugins);
+  return llvm::make_unique<AnalysisConsumer>(
+      CI.getPreprocessor(), CI.getFrontendOpts().OutputFile,
+      CI.getAnalyzerOpts(), CI.getFrontendOpts().Plugins,
+      new ModelInjector(CI));
 }
 

We can query the analyzer options (CI.getAnalyzerOpts()) to see if model-path 
is empty; if it is, we can pass nullptr instead of 'new ModelInjector(CI)'.
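
To make the suggestion concrete, here is a minimal, self-contained sketch of 
the conditional.  The types below are simplified stand-ins rather than the real 
Clang classes, and the ModelPath field is an assumption about how the option 
could be exposed; the actual lookup on the analyzer options may look different.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins for the Clang types involved.  They model only what
// this sketch needs and are not the real class definitions.
struct AnalyzerOptions {
  std::string ModelPath; // Assumption: empty means no model-path was given.
};

struct CodeInjector {
  virtual ~CodeInjector() = default;
};

struct ModelInjector : CodeInjector {
  explicit ModelInjector(const AnalyzerOptions &) {}
};

struct AnalysisConsumer {
  explicit AnalysisConsumer(std::unique_ptr<CodeInjector> Injector)
      : Injector(std::move(Injector)) {}
  std::unique_ptr<CodeInjector> Injector;
};

// Only build an injector when a model path was specified; otherwise the
// consumer receives a null injector and behaves as it did before.
std::unique_ptr<AnalysisConsumer>
CreateAnalysisConsumer(const AnalyzerOptions &Opts) {
  std::unique_ptr<CodeInjector> Injector;
  if (!Opts.ModelPath.empty())
    Injector = std::make_unique<ModelInjector>(Opts);
  return std::make_unique<AnalysisConsumer>(std::move(Injector));
}
```

With an empty ModelPath the consumer ends up with a null injector, which is 
the "existing behavior" case; any non-empty path opts in to the new code.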

> On Aug 12, 2014, at 1:22 AM, Gábor Horváth <[email protected]> wrote:
> 
> Hi Ted,
> 
> Thank you for the review. I have altered the patch accordingly, and also 
> added a patch that follows up on the API change in clang-tidy.
> 
> 
> On 12 August 2014 08:44, Ted Kremenek <[email protected] 
> <mailto:[email protected]>> wrote:
> 
>> On Aug 1, 2014, at 1:44 AM, Gábor Horváth <[email protected] 
>> <mailto:[email protected]>> wrote:
>> 
>> Hi Ted,
>> 
>> Thank you for the review.
>> 
>> On 1 August 2014 07:25, Ted Kremenek <[email protected] 
>> <mailto:[email protected]>> wrote:
>> Hi Gábor,
>> 
>> This is looking good to me.  Some minor nits/comments:
>> 
>> - Please add doxygen comments for the CodeInjector class.
>>  
>> Done.
>>  
>> - For the BugReporter patch, are there tests for that functionality change?  
>> I saw tests in the other patch, but not that one.  It's fine to separate the 
>> review of that change before the primary change goes in, but I was curious.
>> 
>> Well, it may be a bit complicated. I deleted some code in BugReporter so 
>> that it no longer discards bug reports that are in a model file, and the 
>> plist part of the test case only passes if that patch is applied (if the 
>> patch is not applied, the null pointer dereference warning whose position is 
>> in the model file will be discarded). In the long term it would be better to 
>> report these errors elsewhere, but that is not supported yet by the 
>> BugReporter patch. I can move the plist check into a separate test case and 
>> add that case to the BugReporter patch instead. The division by zero test 
>> should work without the BugReporter patch.
> 
> OK, this makes sense.  Can you clarify what you mean by "better to report 
> these errors elsewhere"?
> 
> 
> It might be confusing for the user if the execution path of the bug report 
> contains a location that is inside a model file, because the report then 
> contains code and files that are not present in the analysed project. So it 
> might be more user friendly if the locations that are inside a model file 
> were excluded from the reported execution path. However, if this is not an 
> issue, I am OK with reporting those locations as well.
> 
>>  
>> - As for breaking code in the 'extra' repository, LLVM-internal API is not 
>> sacrosanct.  If we break the 'extra' projects we just need to update them, 
>> but I'm not certain if that is possible in this case.
>> 
>> As far as I can remember it would be a straightforward fix in the extra 
>> repository. Clang-tidy calls CreateAnalysisConsumer.
> 
> Sounds good.  Let's get the right API and just fix up clang-tidy.
> 
>>  
>> - For comments, please consistently use sentence casing and end with 
>> periods, and for type names use the appropriate casing.  For example:
>> 
>> +  // modules create a separate compilerinstance for parsing modules, maybe it is
>> +  // for reason so I mimic this behavior
>> +  CompilerInstance Instance;
>> ...
>> 
>> This comment looks a bit suspect, since it seems like a question to 
>> yourself.  Here you use the word "I"; who is "I" in the context of this 
>> code?  The comment also seems like an unanswered question.  Is this a stale 
>> comment?
>> 
>> 
>> Done, the comment was improved.
>>  
>> Another example is this comment:
>> 
>> +  // FIXME: double memoization is redundant. Here and in bodyfarm.
>> +  llvm::StringMap<Stmt *> Bodies;
>> 
>> This can be made slightly cleaner.  For example:
>> 
>> +  // FIXME: Double memoization is redundant, with
>> +  // memoization both here and in BodyFarm.
>> +  llvm::StringMap<Stmt *> Bodies;
>> 
>> Done.  
>> 
>> - Only use doxygen comments for documentation.  For example:
>> 
>> +  if (notzero_notmodeled(p)) {
>> +   /// There is no information about the value of p, because
>> +   /// notzero_notmodeled is not modeled and the function definition
>> +   /// is not available.
>> +    int j = 5 / p; // expected-warning {{Division by zero}}
>> +  }
>> 
>> In this case we should use '//', not '///'.  The former are true comments, 
>> and the latter are candidates to be extracted for documentation.
>> 
>> 
>> Done.
>>  
>> Overall, however, this is getting really close.
>> 
>> 
>> It is great.
>> 
>> Thanks,
>> Gábor
> 
> Wonderful.  The rest of my comments are minor:
> 
>> +/// \brief CodeInjector is an interface which is responsible forinjecting AST of
>> +/// function definitions that may not be available in the original source.
>> +///
>> +/// The getBody function will be called each time the static analyzer examines a
>> +/// function call that has no definition available in the current translation
>> +/// unit. If the returned statement is not a nullpointer, it is assumed to be
>> +/// the body of a function which will be used for the analysis. The source of
>> +/// the body can be arbitrary, but it is advised to use memoization to avoid
>> +/// unnecessary reparsing of the external source that provides the body of the
>> +/// functions.
> 
> 
>   "forinjecting" -> "for injecting"
>   "nullpointer" -> "null pointer"
> 
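
As a rough illustration of the contract described in that comment (return a 
body or a null pointer, and memoize so the external source is not reparsed), 
here is a small standalone sketch.  Stmt and FunctionDecl are hypothetical 
stand-ins for the Clang AST types, and MemoizingInjector, parseModel and 
ParseCount are names invented for this example, not part of the patch.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins; the real Stmt and FunctionDecl live in Clang's AST.
struct Stmt {};
struct FunctionDecl {
  std::string Name;
};

// Interface in the spirit of the comment: return a body, or nullptr when
// none can be provided.
class CodeInjector {
public:
  virtual ~CodeInjector() = default;
  virtual Stmt *getBody(const FunctionDecl *D) = 0;
};

// Illustrative subclass that memoizes per function name, so the (pretend)
// external source is consulted at most once per name.
class MemoizingInjector : public CodeInjector {
public:
  Stmt *getBody(const FunctionDecl *D) override {
    auto It = Bodies.find(D->Name);
    if (It != Bodies.end())
      return It->second.get(); // Cached result, possibly nullptr.
    ++ParseCount;
    std::unique_ptr<Stmt> Body = parseModel(D->Name);
    Stmt *Raw = Body.get();
    Bodies.emplace(D->Name, std::move(Body));
    return Raw;
  }

  int ParseCount = 0; // Number of times the external source was consulted.

private:
  // Placeholder for parsing a model file; in this sketch only
  // "modeledFunction" has a model.
  std::unique_ptr<Stmt> parseModel(const std::string &Name) {
    if (Name == "modeledFunction")
      return std::make_unique<Stmt>();
    return nullptr;
  }

  std::unordered_map<std::string, std::unique_ptr<Stmt>> Bodies;
};
```

Note that the sketch also caches the "no model found" answer, so repeated 
queries for an unmodeled function do not trigger another lookup.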
>> +++ include/clang/StaticAnalyzer/Frontend/FrontendActions.h (working copy)
>> @@ -10,10 +10,16 @@
>>  #ifndef LLVM_CLANG_GR_FRONTENDACTIONS_H
>>  #define LLVM_CLANG_GR_FRONTENDACTIONS_H
>>  
>> +#include <map>
>> +
> 
> 
> This "#include" of <map> doesn't seem needed.  Neither is the one in 
> ModelConsumer.h
> 
>> +++ lib/StaticAnalyzer/Frontend/ModelConsumer.cpp (working copy)
>> @@ -0,0 +1,42 @@
>> +//===--- ModelConsumer.cpp - ASTConsumer for consuming model files --------===//
>> +//
>> +//                     The LLVM Compiler Infrastructure
>> +//
>> +// This file is distributed under the University of Illinois Open Source
>> +// License. See LICENSE.TXT for details.
>> +//
>> +//===----------------------------------------------------------------------===//
>> +///
>> +/// \file
>> +/// \brief This file implements an ASTConsumer for consuming model files.
>> +///
>> +/// This ASTConsumer handles the AST of a parsed model file. All top level
>> +/// function definitions will be collected from that model file for later
>> +/// retrieval during the static analyzis. The body of these functions will not
>> +/// be injected into the ASTUnit of the analyzed translation unit. It will be
>> +/// available through the BodyFarm which is utilized by the AnalysisDeclContext
>> +/// class.
>> +///
> 
>   "analyzis" -> "analysis"
> 
>> +  // The instance wants to take ownership, however disablefree frontend option
>> +  // is set to true to avoid double free issues
> 
> Use the actual casing for the option for technical precision:
> 
>   DisableFree
> 
>> +  /// \brief Synthetize a body for a declaration
>> +  ///
>> +  /// This method first looks up the appropriate model file based on the
>> +  /// model-path configuration option and the name of the declaration that is
>> +  /// looked up. If no model were synthetized yet for a function with that name
>> +  /// it will create a new compiler instance to parse the model file using the
>> +  /// ASTContext, Preprocessor, SourceManager of the original compiler instance.
>> +  /// The former resources are shared between the two compiler instance, so the
>> +  /// newly created instance have to "leak" these objects, since they are owned
>> +  /// by the original instance.
> 
>    Synthetize -> Synthesize
>   synthetized -> synthesized
> 
>> +  std::vector<std::unique_ptr<ASTUnit> > ModelAsts;
> 
> I'd prefer this to be "ModelASTs", as 'AST' is an acronym.
> 
> 
> It was an unused member (from an earlier implementation) that I forgot to 
> remove, but done now.
>  
> Otherwise, this all looks great to me.
> 
>>  
>> Cheers,
>> Ted
>> 
>> On Jul 30, 2014, at 3:29 AM, Gábor Horváth <[email protected] 
>> <mailto:[email protected]>> wrote:
>> 
>>> Hi Ted,
>>> 
>>> Thank you for the review.
>>> 
>>> 
>>> On 30 July 2014 08:18, Ted Kremenek <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> Hi Gábor,
>>> 
>>> Thanks for making progress on this very promising enhancement to the 
>>> analyzer.  I have an assortment of comments, in no particular order:
>>> 
>>> - ModelInjector.h and ModelConsumer.h
>>> 
>>> There is a comment at the top of these files, but I think a bit more 
>>> explanation is needed.  For example:
>>> 
>>>   MetaConsumer.cpp:
>>> 
>>>     +// "Meta" ASTConsumer for consuming model files.
>>> 
>>> That doesn't really explain anything.  What does "Meta" in quotes mean?  I 
>>> think an explanation here on what this does is helpful when someone 
>>> discovers this code for the first time.
>>> 
>>> Similarly, we should add some high-level comments for CodeInjector.h and 
>>> ModelInjector.h.  We have a good start in ModelInjector.h:
>>> 
>>> +/// \file
>>> +/// \brief Defines the clang::ento::ModelInjector class which implements the
>>> +/// clang::CodeInjector interface. This class is responsible for injecting
>>> +/// function definitions that were synthetized from model files.
>>> +///
>>> 
>>> Let's consider expanding it:
>>> 
>>>  /// \brief This file defines the clang::ento::ModelInjector class which implements the
>>>  /// clang::CodeInjector interface. This class is responsible for injecting
>>>  /// function definitions that were synthesized from model files.
>>> 
>>>  /// Model files allow definitions of functions to be lazily constituted for functions
>>>  /// which lack bodies in the original source code.  This allows the analyzer
>>>  /// to more precisely analyze code that calls such functions, analyzing the
>>>  /// artificial definitions (which typically approximate the semantics of the
>>>  /// called function) when called by client code.  These definitions are
>>>  /// reconstituted lazily, on-demand, by the static analyzer engine.
>>> 
>>> CodeInjector.h provides some information, but it is a bit vague:
>>> 
>>> +///
>>> +/// \file
>>> +/// \brief Defines the clang::CodeInjector interface which is responsible for
>>> +/// injecting AST of function definitions from external source.
>>> +///
>>> 
>>> It's a bit unclear how this gets used.  I think a bit of prose here would 
>>> help clarify its role in the static analyzer.  I also think the 
>>> CodeInjector interface is more abstract than the prose describes.  
>>> There's nothing about CodeInjector's interface that requires the injected 
>>> definitions to come from an external source.  That's an implementation 
>>> detail of a concrete subclass.  Instead, all CodeInjector does is provide 
>>> an interface that lazily provides definitions for functions and methods 
>>> that may not be present in the original source.
>>> 
>>> I have added some further documentation to address these issues.
>>> 
>>>  
>>> 
>>> I'm also looking at the change to 
>>> StaticAnalyzer/Frontend/FrontendActions.cpp, and wonder if we can simplify 
>>> things:
>>> 
>>>> +++ lib/StaticAnalyzer/Frontend/FrontendActions.cpp (working copy)
>>>> @@ -7,9 +7,11 @@
>>>>  //
>>>>  //===----------------------------------------------------------------------===//
>>>>  
>>>> +#include "clang/Frontend/CompilerInstance.h"
>>>>  #include "clang/StaticAnalyzer/Frontend/FrontendActions.h"
>>>> -#include "clang/Frontend/CompilerInstance.h"
>>>>  #include "clang/StaticAnalyzer/Frontend/AnalysisConsumer.h"
>>>> +#include "clang/StaticAnalyzer/Frontend/ModelConsumer.h"
>>>> +#include "ModelInjector.h"
>>>>  using namespace clang;
>>>>  using namespace ento;
>>>>  
>>>> @@ -18,6 +20,14 @@
>>>>    return CreateAnalysisConsumer(CI.getPreprocessor(),
>>>>                                  CI.getFrontendOpts().OutputFile,
>>>>                                  CI.getAnalyzerOpts(),
>>>> -                                CI.getFrontendOpts().Plugins);
>>>> +                                CI.getFrontendOpts().Plugins,
>>>> +                                new ModelInjector(CI));
>>>>  }
>>>>  
>>> 
>>> 
>>> It looks like CreateAnalysisConsumer just continues to grow more arguments, 
>>> all of which derive from using 'CI'.  This seems silly, since this function 
>>> is called in one place.  Instead of introducing a dependency on 
>>> ModelInjector.h in this file, we can just sink these arguments into 
>>> CreateAnalysisConsumer() itself, resulting in:
>>> 
>>>   return CreateAnalysisConsumer(CI);
>>> 
>>> and let CreateAnalysisConsumer() do all that boilerplate.
>>> 
>>> That was my original idea as well, but it broke the compilation of some 
>>> code in the extra repository and I wasn't sure if it is OK to break the API 
>>> with this patch. But I find it cleaner this way, so I modified it in this 
>>> iteration.
>>> 
>>> 
>>> Next, let's look at the change to FrontendAction:
>>> 
>>>>  class FrontendAction {
>>>> +  /// Is this action invoked on a model file? Model files are incomplete
>>>> +  /// translation units that relies on type information from another translation
>>>> +  /// unit. Check ParseModelFileAction for details.
>>>> +  bool ModelFile;
>>> 
>>> Perhaps "IsModelFile"?  "ModelFile" sounds like it should be a reference to 
>>> the file itself.
>>> 
>>>>    FrontendInputFile CurrentInput;
>>>>    std::unique_ptr<ASTUnit> CurrentASTUnit;
>>>>    CompilerInstance *Instance;
>>>> @@ -105,7 +109,11 @@
>>>>    /// @}
>>>>  
>>>>  public:
>>>> -  FrontendAction();
>>>> +  /// \brief Constructor
>>>> +  ///
>>>> +  /// \param modelFile determines whether the source files this action invoked
>>>> +  /// on should be treated as a model file. Defaults to false.
>>>> +  FrontendAction(bool modelFile = false);
>>> 
>>> It seems suboptimal to modify the interface of FrontendAction just for this 
>>> one edge case.  Instead of modifying the constructor arguments, we could 
>>> default initialize "IsModelFile" to false, and have a setter to change it.  
>>> For example:
>>> 
>>>   ParseModelFileAction::ParseModelFileAction(llvm::StringMap<Stmt *> &Bodies)
>>>     : ASTFrontendAction(/*ModelFile=*/true), Bodies(Bodies) {}
>>> 
>>> becomes:
>>>  
>>>   ParseModelFileAction::ParseModelFileAction(llvm::StringMap<Stmt *> &Bodies)
>>>     : Bodies(Bodies)  {
>>>     IsModelFile = true;
>>>   }
>>> 
>>> Looking at this more, I wonder if we should modify FrontendAction at all.  
>>> The only place where isModelParsingAction() is called is in one spot in 
>>> CompilerInstance.cpp:
>>> 
>>>    if (hasSourceManager() && !Act.isModelParsingAction())
>>> 
>>> It *might* be cleaner to just have a virtual member function in 
>>> FrontendAction, which defaults to returning false, but is generic for all 
>>> subclasses to override.  Then we don't need the "IsModelFile" field in 
>>> FrontendAction at all, and we just have ParseModelFileAction override that 
>>> single member function.  We could then name that method to be something a 
>>> bit more generic.  That would allow us to not touch FrontendAction at all 
>>> except for providing that single virtual method that can be overridden in 
>>> subclasses.  I somewhat prefer this approach because it provides a cleaner 
>>> separation of concerns between FrontendAction (which is defined in 
>>> libFrontend) and the static analyzer.  That would also allow you to get rid 
>>> of isModelParsingAction() entirely (replacing it with something more 
>>> generic).
>>> 
>>> 
>>> You are right, it is much cleaner to use a virtual function, so I modified 
>>> the patch to use that approach. The new virtual function has the same name 
>>> because I have not yet found a better, more general name. Do you have an 
>>> idea for a better name?
>>>  
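
For readers following along, the virtual-function approach discussed above 
can be sketched in isolation roughly as follows.  The classes are stripped-down 
stand-ins for the real FrontendAction hierarchy, and passesGuard is a 
hypothetical helper that only mirrors the shape of the quoted condition in 
CompilerInstance.cpp; what the real code does inside that branch is not shown 
in this thread.

```cpp
#include <cassert>

// Stand-in base class: no extra field or constructor argument, just one
// virtual hook that defaults to false.
class FrontendAction {
public:
  virtual ~FrontendAction() = default;
  virtual bool isModelParsingAction() const { return false; }
};

// The analyzer-specific action is the only place that opts in.
class ParseModelFileAction : public FrontendAction {
public:
  bool isModelParsingAction() const override { return true; }
};

// Hypothetical helper with the same shape as
//   if (hasSourceManager() && !Act.isModelParsingAction())
bool passesGuard(bool HasSourceManager, const FrontendAction &Act) {
  return HasSourceManager && !Act.isModelParsingAction();
}
```

The base class stays free of analyzer-specific state, and every existing 
FrontendAction subclass keeps its behavior without modification.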
>>> As for the test case:
>>> 
>>>> +typedef int* intptr;
>>>> +
>>>> +void modelled(intptr p);
>>>> +
>>>> +int main() {
>>>> + modelled(0);
>>>> + return 0;
>>>> +}
>>> 
>>> Please add some comments in this test file explaining what is happening.  
>>> Also, it would be great if this both used FileCheck (which it does now) but 
>>> also verified the diagnostics so we get cross-checking of the output (we 
>>> see this in some analyzer tests).  It also makes it easier to understand 
>>> the test.
>>> 
>>> Also, is there a reason to break up the tests between 
>>> model-suppress-falsepos.cpp and model-file.cpp?  It seems like one test 
>>> file will do fine; just clearly comment on what is happening for each test. 
>>>  I also recommend calling the modeled function "modeledFunction" instead of 
>>> "modelled" (which according to my spell checker has an additional 'l'). 
>>> 
>>> I have merged the test files and also added some comments to explain what 
>>> is going on. I have fixed the misspelling as well. The null pointer dereference 
>>> is only checked through plist because the point where the comment with the 
>>> expected warning should be added is inside the model file and it did not 
>>> work for me if the comment was in a separate file. If there is a different 
>>> way to verify the warnings that are in a separate file and I did not find 
>>> it, please let me know.
>>>  
>>> 
>>> As for the model files themselves:
>>> 
>>>> Index: test/Analysis/modelled.model
>>>> ===================================================================
>>>> --- test/Analysis/modelled.model  (revision 0)
>>>> +++ test/Analysis/modelled.model  (working copy)
>>>> @@ -0,0 +1,3 @@
>>>> +void modelled(intptr p) {
>>>> + ++*p;
>>>> +}
>>>> \ No newline at end of file
>>>> Index: test/Analysis/notzero.model
>>>> ===================================================================
>>>> --- test/Analysis/notzero.model (revision 0)
>>>> +++ test/Analysis/notzero.model (working copy)
>>> 
>>> Let's put these in a separate subdirectory, for example, "models", instead 
>>> of mixing them with the tests.  This way they really serve as "inputs" to 
>>> the analyzer.
>>> 
>>> I have moved the model files to tests/Inputs/Models.
>>>  
>>> 
>>> Overall this is looking good.  I think the explanatory comments will really 
>>> help people understand what this is doing, and I think changing how we 
>>> thread the information through FrontendAction will help not introduce an 
>>> artificial tainting of FrontendAction with concepts specific to the static 
>>> analyzer.
>>> 
>>> Cheers,
>>> Ted
>>> 
>>> 
>>> Thanks,
>>> Gábor
>>>  
>>> 
>>> On Jul 16, 2014, at 2:45 AM, Gábor Horváth <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> 
>>>> 
>>>> 
>>>> 
>>>> On 14 July 2014 19:32, Anna Zaks <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> 
>>>>> On Jul 13, 2014, at 6:11 AM, Gábor Horváth <[email protected] 
>>>>> <mailto:[email protected]>> wrote:
>>>>> 
>>>>> Hi Anna,
>>>>> 
>>>>> Thank you for the review. I have tweaked the test, so it no longer 
>>>>> requires the error reporting tweak that is not done yet to pass. I have 
>>>>> also added some high level comments to some files, if you think some 
>>>>> information is lacking I will add them in the next iteration as well. The 
>>>>> BugReporter patch is now separated into a different patch. 
>>>>> 
>>>>> 
>>>>> On 11 July 2014 18:02, Anna Zaks <[email protected] 
>>>>> <mailto:[email protected]>> wrote:
>>>>> 
>>>>> For example, modeling functions should allow you to find bugs and 
>>>>> suppress false positives outside of those functions. I would suggest 
>>>>> adding a few of those tests first.
>>>>> 
>>>>> 
>>>>> How are the false positives suppressed? I did not find any resource on 
>>>>> that. I found some analyzer attributes, but at first glance I did not find 
>>>>> them suitable for this purpose. But I think once the locations that are 
>>>>> in a model file are omitted from the report path, the regular methods for 
>>>>> suppressing false positives should work (and I will definitely add a test 
>>>>> case to ensure this once it is done).
>>>>> 
>>>> 
>>>> What I meant is that it is possible to construct a test where the ability 
>>>> to model a function would eliminate a false positive. This would be another 
>>>> way to test your patch without worrying about BugReporter.
>>>> 
>>>> I got it now, thanks. I have updated the patch with a test case where a 
>>>> false positive case is eliminated by a model file.
>>>> 
>>>> Thanks,
>>>> Gábor
>>>>  
>>>>> Thanks,
>>>>> Gábor
>>>>> <api_modeling.patch><bugreporter.patch>
>>>> 
>>>> 
>>>> <api_modeling.patch><bugreporter.patch>
>>> 
>>> 
>>> <api_modeling.patch><bugreporter.patch>
>> 
>> 
>> <api_modeling.patch><bugreporter.patch>
> 
> 
> 
> Thanks,
> Gábor
> <api_modeling.patch><bugreporter.patch><clangTidy.patch>

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
