gamesh411 marked 2 inline comments as done.
gamesh411 added a comment.

The remaining documentation and test changes are also underway.



================
Comment at: clang/include/clang/CrossTU/CrossTranslationUnit.h:227
+    /// Identifier.
+    virtual LoadResultTy load(StringRef Identifier) = 0;
+    virtual ~ASTLoader() = default;
----------------
martong wrote:
> xazax.hun wrote:
> > martong wrote:
> > > xazax.hun wrote:
> > > > I am not sure if this is good design.
> > > > Here, if the meaning of the `Identifier` depends on the subclass, the 
> > > > caller of this method always needs to be aware of the dynamic type of 
> > > > the object. What is the point of a common base class if we always need 
> > > > to know the dynamic type?
> > > > 
> > > > Looking at the code this does not look bad. But it might be a code 
> > > > smell.
> > > The way how we process the `extDefMapping` file is identical in both 
> > > cases. That's an index, keyed with the `USR`s of functions and then we 
> > > get back a value. And the way how we use that value is different. In the 
> > > PCH case that holds the path for the `.ast` file, in the ODM case that is 
> > > the name of the source file which we must find in the compile db. So, I 
> > > think the process of getting the AST for a USR requires the polymorphic 
> > > behavior from the loaders.
> > > 
> > > We discussed other alternatives with Endre. We were thinking that maybe 
> > > the `extDefMapping` file should be identical in both cases. But then we 
> > > would need to add the `.ast` postfixes for the entries in the PCH case. 
> > > And we cannot just do that, because we may not know if what is the 
> > > correct postfix. The user may have generated `.pch` files instead. Also, 
> > > we don't want to compel any Clang user to use CodeChecker (CC will always 
> > > create `.ast` files). CTU should be running fine by manually executing 
> > > the independent steps.
> > Let me rephrase my concerns a bit. Do we really need a polymorphic 
> > `ASTLoader` to be present for the whole analysis? Wouldn't it make more 
> > sense to always do the same thing, i.e. if we are given a pch file load it, 
> > if we are given a source file, parse it? This way we would not be 
> > restricted to on-demand or two pass ctu analysis, but we could do any 
> > combination of the two.
> > 
> Well yeah, we could do that, it is a good idea, thanks! We will consider this 
> in the future. I like in this idea that the command line options to Clang 
> would be simplified. But then we must be transparent and show/log the user 
> which method we are using.
I will implement this separation by suffixes in the next change.


================
Comment at: clang/include/clang/CrossTU/CrossTranslationUnit.h:285
   public:
-    ASTUnitStorage(const CompilerInstance &CI);
+    ASTUnitStorage(CompilerInstance &CI);
     /// Loads an ASTUnit for a function.
----------------
xazax.hun wrote:
> Why is this no longer const?
Most usages of the CI require a non-const ref (this is due to lacking API 
support for getting analyzer options), and I found it more convenient to just 
take a non-const ref, instead of const_casting it away later. Not sure myself, 
but I think I will roll with this now (and I should definitely check the API of 
CompilerInstance).


================
Comment at: clang/include/clang/CrossTU/CrossTranslationUnit.h:234
+  public:
+    explicit ASTFileLoader(CompilerInstance &CI, StringRef CTUDir);
+
----------------
whisperity wrote:
> Is the `explicit` needed here?
This constructor started out with having a single param only. I agree, that 
`explicit` is no longer necessary.


================
Comment at: clang/include/clang/CrossTU/CrossTranslationUnit.h:260
+    /// that produce the AST used for analysis.
+    StringRef OnDemandParsingDatabase;
+
----------------
martong wrote:
> Should we rename this member to ....PathTo....InvocationList... ?
Renamed to `InvocationListFilePath`.


================
Comment at: clang/lib/CrossTU/CrossTranslationUnit.cpp:122-123
       return "Load threshold reached";
+    case index_error_code::ambiguous_compilation_database:
+      return "Compilation database contains multiple references to the same "
+             "source file.";
----------------
whisperity wrote:
> still using terms "compilation database", is this intended?
renamed


================
Comment at: clang/lib/CrossTU/CrossTranslationUnit.cpp:362
 
-  return ASTUnit::LoadFromASTFile(
-      std::string(ASTFilePath), CI.getPCHContainerOperations()->getRawReader(),
-      ASTUnit::LoadEverything, Diags, CI.getFileSystemOpts());
+  auto LoadFromFile = [this](StringRef Path) {
+    IntrusiveRefCntPtr<DiagnosticOptions> DiagOpts = new DiagnosticOptions();
----------------
balazske wrote:
> Is here a lambda necessary? I think it must not be over-used. In this case we 
> can construct the file path as the first step and then use it to load the 
> file, without using lambda call.
Refactored to use path concatenation.


================
Comment at: clang/lib/CrossTU/CrossTranslationUnit.cpp:581
+  SmallVector<const char *, 32> CommandLineArgs(InvocationCommand.size());
+  std::transform(InvocationCommand.begin(), InvocationCommand.end(),
+                 CommandLineArgs.begin(),
----------------
martong wrote:
> Could we avoid this transfer if InvocationList was storing `const char *` 
> values instead of std::strings? Why can't we store `char*`s in InvocationList?
We certainly could, I think there won't be null-characters in paths so storing 
char* s would not lose generality. However, I think the memory management done 
by std::string is worth it, as it is clear where the character-data lives in 
memory.


================
Comment at: clang/lib/CrossTU/CrossTranslationUnit.cpp:583
+                 CommandLineArgs.begin(),
+                 [](auto &&CmdPart) { return CmdPart.c_str(); });
+
----------------
whisperity wrote:
> balazske wrote:
> > Is here a special reason to use `auto &&` type (and not `auto &` or 
> > `std::string &`)? Probably this code is faulty: The `CmdPart` is moved out 
> > and then destructed, before the content of it (where the `c_str` points to) 
> > is used?
> @balazske This is a generic lambda where `auto&&` is the universal reference. 
> It is equivalent to writing `template <typename T> f(T&& param)`. It obeys 
> the usual point of instantiation and reference cascading rules, if a `const 
> std::string&` is given as `T`, `const std::string& &&` will become a single 
> `&`.
Results of my research: In theory, the STL could do something fancy here if it 
wanted, but it is not the case in reality. If you want move-operations with 
algorithms (the ones not explicitly mentioning move in their names) you have to 
be explicit about it: [[ https://www.fluentcpp.com/2017/04/25/move-iterators/ | 
move-iterators-fluent-cpp ]]. 


================
Comment at: clang/lib/CrossTU/CrossTranslationUnit.cpp:609
+  llvm::SourceMgr SM;
+  llvm::yaml::Stream InvocationFiles(*ContentBuffer, SM);
+
----------------
whisperity wrote:
> Are the headers from which these types come from included?
Included the llvm/Support/YAMLParser.h


================
Comment at: clang/lib/CrossTU/CrossTranslationUnit.cpp:120
+    case index_error_code::ambiguous_compile_commands_database:
+      return "Compile commands database contains multiple references to the "
+             "same source file.";
----------------
whisperity wrote:
> martong wrote:
> > xazax.hun wrote:
> > > Unfortunately, this is a very common case for a large number of projects 
> > > that supports multiple targets (e.g. 32 and 64 bits). Is there a plan to 
> > > mitigate this problem?
> > I don't think we could do that. We need to merge ASTs of those TUs that are 
> > linkable. Otherwise the merge will be unsuccessful because of certain 
> > mismatches in Decls (structural eq will fail).
> > Even in CodeChecker we are planning to issue a hard error in these 
> > ambiguous cases, even with the PCH method too. (The error would be 
> > suppressible by the user of course, but if that is suppressed then we take 
> > no responsibility.)
> More clever logging of link commands and creating separate CDBs for each 
> target or each binary could help this cause, but that needs infrastructural 
> capabilities from both the build system and the analysis driver (i.e. 
> CodeChecker).
I have solved the problem of supporting multiple architectures by offloading 
the responsibility to the user (to the tool driving the analysis). The 
invocation list file is now supposed to contain only invocations with the same 
arch/target and this should also be the same as the arch/target of the main TU 
analyzed. If there is still ambiguity after this separation, this error should 
nevertheless be emitted ( IMHO ).


================
Comment at: clang/test/Analysis/Inputs/ctu-other.c:34-38
+// TODO: Support the GNU extension asm keyword as well.
+// Example using the GNU extension: asm("mov $42, %0" : "=r"(res));
 int inlineAsm() {
   int res;
+  __asm__("mov $42, %0"
----------------
whisperity wrote:
> Possibly unrelated change?
Funny enough, this is very relevant. Dumping the AST does not errors out on 
this construction, however, the parser in on-demand mode does. This __asm__ 
syntax is however supported by both. To document this, I have added a TODO as 
well.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75665/new/

https://reviews.llvm.org/D75665



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to