donat.nagy requested changes to this revision.
donat.nagy added a comment.
This revision now requires changes to proceed.

I reviewed this change and collected my suggestions in comments.

In general, I feel that the code added by this commit is "sloppy" in the sense 
that it's designed for the common case and would behave unpredictably in 
unusual situations. This is a fault of the toolbox provided by the analyzer: in 
a saner world, you could freely combine the API functions (as you did) and 
expect that they would "just work" together; but in reality every function has 
its quirks and limitations, so the developer must understand the details of the 
implementation.



================
Comment at: clang/lib/StaticAnalyzer/Checkers/MPI-Checker/MPIChecker.cpp:189-190
+        ASTCtx.getCharWidth();
+    const NonLoc MROffset =
+        SVB.makeArrayIndex(MR->getAsOffset().getOffset() / ElemSizeInBits);
 
----------------
danix800 wrote:
> steakhal wrote:
> > What implies that `MR->getAsOffset()` succeeds, and also what implies that 
> > the `Offset` within that is not symbolic?
> > Also, how can you use this without also using the result region?
> > Without using that you don't know what is the anchor point, from which this 
> > offset represent anything.
> > ATM I believe the code assumes that `MR->getRegion()` equals to 
> > `SuperRegion`, which might not be always the case.
> > This could materialize a problem when you construct the element region 
> > later.
> I'll restrict the checker to handle non-symbolic offset only.
> ATM I believe the code assumes that MR->getRegion() equals to SuperRegion, 
> which might not be always the case.
This remark of @steakhal highlights a problem which is still present in your 
code: when you calculate `SuperRegion` you strip a single `ElementRegion` 
layer; while the offset that you get via `getDynamicElementCountWithOffset()` 
is calculated from `MemRegion::getAsOffset()` which can strip multiple 
`ElementRegion`, `FieldRegion` etc. layers.

Unfortunately the memory region API of Clang Static Analyzer is very 
inconsistent and quirky; you must dig deep into the implementation to 
understand the exact meaning of these function calls. 

For example, if you have a data type like
```lang=c
struct ReqStruct {
    int custom_info;
    MPI_Request reqs[8];
} rs;
```
then the `SuperRegion` will refer to the immediate parent of the 
`ElementRegion` (which is presumably the `FieldRegion` corresponding to 
`reqs`); but the offset value is measured from the start of the `VarRegion` 
corresponding to the variable `rs` (and includes `sizeof(int)` + potential 
padding due to the presence of `custom_info`).

I think the correct solution would measure the offset from the start of the 
array of `MPI_Request` objects; because if you measure it from the start of the 
struct, then you may encounter serious issues, e.g. there is no guarantee that 
the offset will be divisible by the size of `MPI_Request`.

Unfortunately this would mean that you cannot rely on the logic of 
`getDynamicElementCountWithOffset()`;  and I don't see a clear way to 
generalize that library function to limit the number/type of region layers that 
it strips.

Here I don't see a clear way forward, as the two potential solutions are a deep 
rewrite that complicates the library code and a creating a locally tweaked 
duplicate of the library logic -- and these both have serious drawbacks.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/MPI-Checker/MPIChecker.cpp:146
 
+static std::optional<std::pair<NonLoc, llvm::APSInt>>
+getRequestRegionOffsetAndCount(const MemRegion *const MR, const CallEvent &CE) 
{
----------------
Use `long long` (or `int64_t`, `ssize_t`, whatever you prefer) instead of 
`NonLoc` here because it's easier to handle them (and you don't work with 
symbolic offsets anyway).


================
Comment at: clang/lib/StaticAnalyzer/Checkers/MPI-Checker/MPIChecker.cpp:174
+  return std::make_pair(
+      SVB.makeArrayIndex(RequestRegionOffset.getOffset() / TypeSizeInBits),
+      RequestRegionCount->getValue());
----------------
What happens here in the case when `TypeSizeInChars.isZero()` holds?

I see that you used a ternary operator to avoid division by zero; but do you 
get a meaningful and valid offset value in the case when `MPI_Request` has zero 
size and you use `TypeSizeInBits == 8`?

I understand that the divisior does not matter in the case when 
`RequestRegionOffset.getOffset()` is zero; but if I understand it correctly 
this offset value can be nonzero even in the case when `MPI_Request` has zero 
size. (Under the hood it calls `MemRegion::getAsOffset()` which may include the 
offset of e.g. a data member of an object.)


================
Comment at: clang/lib/StaticAnalyzer/Checkers/MPI-Checker/MPIChecker.cpp:213
+    // StdCLibraryFunctions
+    for (size_t i = 0; i < RegionCount && i < RequestedCount; ++i) {
+      auto RegionIndex =
----------------
Paranoid remark: if these count values are both very large (e.g. negative 
numbers converted to `size_t`), then this loop could hang for a long time 
and/or exhaust the memory. Consider adding a hard limit on the number of 
iterations.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/MPI-Checker/MPIChecker.cpp:215-217
+          SVB.evalBinOp(Ctx.getState(), BO_Add, SVB.makeArrayIndex(i),
+                        RegionOffset, SVB.getArrayIndexType())
+              .getAs<NonLoc>();
----------------
If you represent `RegionOffset` as a `long long`, you can replace this 
complicated `evalBinOp` with a simple `SVB.makeArrayIndex(i + RegionOffset);`.

Note: @steakhal already suggested this improvement in the earlier remark
> My guess is that we only care about if MROffset is a concrete int, thus we 
> could also just do this assidion in regular c++ and just transfer the result 
> into the symbolic domain.
which referred to an older variant of your patch.



================
Comment at: clang/test/Analysis/mpichecker.cpp:276-280
+  typedef struct {
+    MPI_Request req[3];
+    MPI_Request req2;
+  } ReqStruct;
+
----------------
Please add a dummy field above the other fields to verify that the offsets are 
handled correctly. For example, `char request_name[99];` would be sufficiently 
large to highlight errors.

This change could be applied in other tests as well.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158813/new/

https://reviews.llvm.org/D158813

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to