[PATCH] D70411: [analyzer] CERT: StrChecker: 31.c

Csaba Dabis via Phabricator via cfe-commits Fri, 06 Dec 2019 16:17:45 -0800

Charusso added a comment.

Please let us measure your examples at first to have a consensus what we would 
like to see from this checker.

In D70411#1769803 <https://reviews.llvm.org/D70411#1769803>, @NoQ wrote:

> In D70411#1769628 <https://reviews.llvm.org/D70411#1769628>, @Charusso wrote:
>
> > F10966837: str31-c.tar.gz <https://reviews.llvm.org/F10966837>
>
>
> These reports seem to confirm my point. I took a look at first 5 and in all 
> of them the code does in fact ensure that the buffer space is sufficient, but 
> in 5 different ways. You most likely won't be able to enumerate all possible 
> ways in which the user can ensure that the destination buffer size is 
> sufficient.

I believe I can enhance the checker with the help of `NoteTags` and give more 
information for the user why the buffer space is Not sufficient. The size 
property's upper-bound is not that difficult to measure, I think we can 
implement this. For now, let me analyze each report.

---

> F10967274: content_encoding.c_5ae4f30ce29f14441139e7bbe20eeaaa.plist.html 
> <https://reviews.llvm.org/F10967274>
>  The source string is taken from a global table and therefore has known 
> maximum size.
>  F10967280: imap.c_fd80e0804acd9e7ecb9c2483151625a9.plist.html 
> <https://reviews.llvm.org/F10967280>
>  The source string is a literal `'*'`.

When you encounter a member that is when the hand-waving begins. I have made a 
dump to see what we have. I have not seen any immediate way to connect the 
dynamic size and the members. For now we have that:
The source region is:
`SymRegion{reg_$36<const char * SymRegion{derived_$35{conj_$32{int, LC7, 
S161233, #1},Element{encodings,0 S64b,const struct content_encoding_s 
*}}}.name>}`
and the allocation's dynamic size is:
`extent_$41{SymRegion{conj_$34{void *, LC7, S161233, #1}}}`

My assumption is that, the size of a member is very arbitrary and we cannot 
track the member's allocation, so we even could drop every of these reports as 
being useless. Because I make this checker alpha I am fine with that assumption 
of members, and we definitely need better ideas.

---

> F10967277: ftp.c_8cb07866de61ef56be82135a4d3a5b7e.plist.html 
> <https://reviews.llvm.org/F10967277>
>  The source string is an IP address and port, which has a known limit on the 
> number of digits it can have.

The known size does not mean that the string is going to be null-terminated. 
For example the VIM project has one single global macro to serve every 
allocation's size, and that is why the CERT rule made. If the attacker passes a 
10k characters long string and you allocate 10k+1 characters for the `'\0'` you 
can work with the data safely. Of course you would need to check some upper 
bound, but that is another story. The key is, any arbitrary buffer size could 
be a problem. Here:

  954 | size_t addrlen = INET6_ADDRSTRLEN > strlen(string_ftpport) // Assuming 
the condition is true
                         ? INET6_ADDRSTRLEN
                         : strlen(string_ftpport)

`addrlen` is an arbitrary macro size `INET6_ADDRSTRLEN`, which not necessarily 
could hold `string_ftpport` in `strcpy(addr, string_ftpport);`. As a 
human-being you know the most appropriate size, yes. But the program could be 
ill-formed very easily so we warn on the arbitrary-length path. Here you do not 
see something like `addr[INET6_ADDRSTRLEN - 1] = '\0'` so when the string is 
read you cannot be sure whether it is null-terminated. If line 954 would be 
evaluated to false, allocating `calloc(addrlen + 1, 1)` so when `addrlen` is 
`strlen(string_ftpport)` is fine of course, that is the appropriate size to 
store `string_ftpport`.

---

> F10967295: tftp.c_eee38be8d783d25f2e733fa3740d13fc.plist.html 
> <https://reviews.llvm.org/F10967295>
>  There's an if-statement that checks that the storage size is sufficient.

Well, this is the most suspicious report of all the time. Let me reconstruct 
what is going on:

  513 | sbytes += tftp_option_add(state, sbytes,
                                  (char *)state->spacket.data + sbytes, // 
warning
                                  TFTP_OPTION_TSIZE);

My guess was that to change the `check::PostCall` to `check::PreCall`, but if I 
remember right the same report occurred. Here we return from the function call 
`tftp_option_add`, then we warn on its argument which already has been passed 
to the call. We should not warn when the array is passed to a function which we 
already have finished reading from. I have not seen any immediate solution, so 
I count this as an internal ~~bug~~ feature.

---

> F10967300: tool_dirhie.c_87dd00a845a927c9b8ed587c6b314af1.plist.html 
> <https://reviews.llvm.org/F10967300>
>  The source string is a token (obtained via `strtok`) of a string that has 
> the same size as the destination buffer.

Let me simplify the report as there is no such feature:

  111 | outlen = strlen(outfile);
  112 | outdup = strdup(outfile);
  116 | dirbuildup = malloc(outlen + 1);

  125 | tempdir = strtok(outdup, PATH_DELIMITERS);
  136 | if(outdup == tempdir)
  138 |   strcpy(dirbuildup, tempdir);

Here we see both `tempdir`'s and `dirbuildup`'s size is based on 
`strlen(outfile)` but `strtok()` strikes in and we conjure a new symbol for 
`dirbuildup` so the connection is lost unfortunately.

---

---

Please let us brainstorming here.

In D70411#1769786 <https://reviews.llvm.org/D70411#1769786>, @NoQ wrote:

> A while back I had a look into implementing some of the CERT checkers and my 
> vague recollection from these attempts was that it's a bad idea to take CERT 
> examples literally. Most of the CERT rules are explained in an informal 
> natural language and it makes CERT great for demonstrating common mistakes 
> that programmers make when they write in C/C++. Humans can easily understand 
> the problem by reading CERT pages. But it doesn't necessarily mean that 
> static analysis tools can be built by generalizing over the examples from the 
> CERT wiki.

Well, if the member's allocation and the expression-value-tracking would not be 
hand-waving, such a simple checker could serve a lot. I see your point and you 
are right the rules are not made for tools. Our tool already knows most of the 
functionality what is needed for the STR rules. It is started as an experiment 
and it is working enough well, in my point of view. The member-value-tracking 
and the lack of standard function modeling of course affects this checker, but 
that issue affects every other checker as well. If we call them non-alpha, it 
could be even non-alpha.

> So i suggest that we make one more step back and agree upon what exactly are 
> we checking here. I.e., basically, agree on the tests. Because right now it 
> looks like you're trying to blindly generalize over the examples: //"The CERT 
> example has a `strcpy()` into a fixed-size buffer, let's warn on all 
> `strcpy()`s into fixed-size buffers!"//.

Just for clarification, I made the entire `MemoryBlockRegion` stuff because I 
do not care how do you allocate a memory block (even a `string` is counted as 
that).

> This way your check does indeed pass the tests, but it doesn't make sense 
> both from the rule's perspective (the rule never said that it's wrong to 
> `strcpy()` into a fixed-size buffer, it only said that you should anyhow 
> guarantee that the storage is sufficient) and from the user's perspective 
> (you won't be able to reduce the gap between false positives and false 
> negatives enough for the check to be useful, even with the subsequent 
> whack-a-mole of adding more heuristics, because what the check is checking is 
> relatively orthogonal to the actual problem you're trying to solve).

There is not that much heuristic:

- If you use the unsafe calls and before reading the string you manage to 
null-terminate it (inject `'\0'`), it would be a false positive.
- If you have a blob of memory without considering the length of the stuff it 
will store, it is a true positive.

Please note that, truncation is fine:

> To prevent such errors, either limit copies through truncation or, 
> preferably, ensure that the destination is of sufficient size

~ from the rule's page 
<https://wiki.sei.cmu.edu/confluence/display/c/STR31-C.+Guarantee+that+storage+for+strings+has+sufficient+space+for+character+data+and+the+null+terminator>

> For example, in the example titled "Noncompliant Code Example (argv)", it is 
> explicitly stated that the problem is not only that the buffer is fixed-size 
> and `strcpy()` is made, but also that the original string //is controlled by 
> the attacker//. The right analysis to catch such issues is //taint 
> analysis//. It's a typical taint problem. I honestly believe you should try 
> to solve it by combining the taint checker with the string checker: "if 
> string length metadata is tainted (in particular, if the string itself is 
> tainted) and is potentially larger than the destination buffer size (eg., 
> unconstrained), warn". With the recent advancements in the development of the 
> taint checker, i think you can get pretty far this way without constantly 
> struggling with false positives.

This checker indeed would require an additional taint analysis, but as we do 
not have a non-alpha variant, I am fine to call every non-null-terminated 
string reading an issue (as it is), whatever is the source. That is a 
generalization, yes, in the best possible manner what we have got today. Also I 
am not a big fan of relying on non-alpha stuff and I am sure I have defined my 
checker enough well to catch real issues.

---

---

Let me summarize:

- There is a lot more to do, we have seen too many issues to call this a 
finished checker.
- The problem is not the size, but the missing `'\0'` (which you can have 
multiple of at any point).
- We heavily need to swap the value-tracking with `NoteTags` to make this 
useful.

================
Comment at: clang/test/Analysis/cert/str31-c-fp-suppression.cpp:33-34
+  // string, so that we could see whether the string is null-terminated on
+  // every path or not. Because the 'src' could be not null-terminated this
+  // report is fine, but if 'src' is null-terminated this report is a false
+  // positive because the 'src' fits into 'dest' with the null terminator
----------------
NoQ wrote:
> Mmm, no, it's not fine. The warning is saying something that's not correct. 
> Even if `src` is not null-terminated, under the assumption that no 
> out-of-bounds read UB occurs in the `strcpy()` call, `dest` is going to 
> always be null-terminated and have a `strlen` of at most 127. And by 
> continuing our analysis after `strcpy()`, we inform the user that we assume 
> that no UB has happened on the current execution path yet.
> 
> Traditionally checkers either warn immediately when they can detect an error, 
> or assume that the error has not happened. For example, when we dereference a 
> pointer, if the pointer is not known to be null, we actively assume that it's 
> non-null. Similarly, if the `src` string is not null-terminated, we should 
> warn at the invocation of `strcpy`; if it's not known whether it's 
> null-terminated or not, we should actively assume that it's null-terminated 
> at `strcpy`.
> 
> Also, I usually don't agree with the statements like "This report helped us 
> find a real bug, therefore it's a true positive". You can find a bug in 
> literally any code if you stare at it long enough! I'm glad you found the bug 
> anyway, but if the report is not pointing at the problem directly, we're not 
> doing a great job.
Well, I think we could check the range info and catch if we have an infeasible 
case. For now on one of the paths it looks like the string is not 
null-terminated, where we warn. Of course we need better modeling, that is why 
I have stated out with a FIXME.

I totally agree with hand-waving "true positives", but this is very unlikely to 
happen, I just came out with that example to state out the 
`CStringModelingChecker` is not enough in its current shape.

================
Comment at: clang/test/Analysis/cert/str31-c-fp-suppression.cpp:51-53
+  strcpy(dest, src);
+  do_something(dest);
+  // expected-warning@-1 {{'dest' is not null-terminated}}
----------------
NoQ wrote:
> Looks like a false positive. Consider the following perfectly correct (even 
> if a bit inefficient) code that only differs from the current code only in 
> names and context:
> ```lang=c++
> void copy_and_test_if_short(const char *src) {
>   char dest[128];
>   strcpy(dest, src);
>   warn_if_starts_with_foo(dest);
> }
> 
> void copy_and_test(const char *src) {
>   if (strlen(src) < 64)
>     copy_and_test_if_short(src);
>   else
>     copy_and_test_if_long(src);
> }
> ```
That is why I have asked whether we have a way to say "when the string is 
read". I like that heuristic for now, because any kind of not null-terminated 
string is the root of the evil for [[ 
https://wiki.sei.cmu.edu/confluence/display/c/STR31-C.+Guarantee+that+storage+for+strings+has+sufficient+space+for+character+data+and+the+null+terminator
 | STR31-C ]].

In this case the attacker is the programmer who sends a possibly not 
null-terminated string around and the function which receives an arbitrary 
input would carry the issue.

Thanks for the example, I wanted to cover ranges, but I have forgotten it, 
because in the wild peoples do not check for bounds. Copy-pasted.

================
Comment at: clang/test/Analysis/cert/str31-c-notes.cpp:22
+  if (gets(buf)) {}
+  // expected-note@-1 {{'gets' could write outside of 'buf'}}
+  // expected-note@-2 {{Assuming the condition is false}}
----------------
NoQ wrote:
> I still believe that this should be *the* warning in this code. This is 
> already broken code.
No, it is not broken, sadly. I would say it is broken, but this checker tries 
to be smarter than `grep`. Truncation is fine for [[ 
https://wiki.sei.cmu.edu/confluence/display/c/STR31-C.+Guarantee+that+storage+for+strings+has+sufficient+space+for+character+data+and+the+null+terminator
 | STR31-C ]].

That change in my mindset made this checker possible when I have dropped the 
idea of warn on function calls.

================
Comment at: clang/test/Analysis/cert/str31-c-notes.cpp:26-27
+
+  free(buf);
+  // expected-warning@-1 {{'buf' is not null-terminated}}
+  // expected-note@-2 {{'buf' is not null-terminated}}
----------------
NoQ wrote:
> It is not wrong to free a buffer that isn't null-terminated. We shouldn't 
> warn here.
That is when the string is read. We could have a list of function calls which 
are non-problematic. For now I am fine with that note, because you do not 
allocate something without reading it before freeing it. It is just an 
arbitrary test for now.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70411/new/

https://reviews.llvm.org/D70411

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D70411: [analyzer] CERT: StrChecker: 31.c

Reply via email to