NoQ added a comment.

Ok, so, what should the checker warn on? What should be the intended state 
machine for this checker?

We basically have two possible categories of answers to this question:

- "The checker warns every time it finds an execution path on which unintended 
behavior occurs" - describe unintended behavior (say, "out-of-bounds buffer 
access") and define the checker in such a way that all warnings describe actual 
bugs of this kind (in our example, out-of-bound buffer accesses) as long as the 
rest of the static analyzer works correctly.
- Or, you could say "The checker enforces a certain coding guideline on the 
user" - and then you should describe the guideline in a very short and 
easy-to-understand manner (say, "don't pass the string by pointer anywhere 
before you null-terminate it"), and then make checker check exactly this 
guideline. You may introduce some suppressions for intentional violations of 
the guideline, but the guideline itself should be simple, it should be always 
possible to follow it, and it should sound nice enough for developers to feel 
good about themselves when they follow it.

> - The problem is not the size, but the missing `'\0'` (which you can have 
> multiple of at any point).

The caption of the CERT rule suggests that both of these are problematic.

In D70411#1773703 <https://reviews.llvm.org/D70411#1773703>, @Charusso wrote:

> In D70411#1769786 <https://reviews.llvm.org/D70411#1769786>, @NoQ wrote:
>
> > So i suggest that we make one more step back and agree upon what exactly 
> > are we checking here. I.e., basically, agree on the tests. Because right 
> > now it looks like you're trying to blindly generalize over the examples: 
> > //"The CERT example has a `strcpy()` into a fixed-size buffer, let's warn 
> > on all `strcpy()`s into fixed-size buffers!"//.
>
>
> Just for clarification, I made the entire `MemoryBlockRegion` stuff because I 
> do not care how do you allocate a memory block (even a `string` is counted as 
> that).


Yup, i mean, you can go as far as you want with generalizing over "fixed size 
buffer" in this approach, but what i'm saying is that you're still not 
addressing the whole train of thought about the length of the source string.

In D70411#1773703 <https://reviews.llvm.org/D70411#1773703>, @Charusso wrote:

> > F10967277: ftp.c_8cb07866de61ef56be82135a4d3a5b7e.plist.html 
> > <https://reviews.llvm.org/F10967277>
> >  The source string is an IP address and port, which has a known limit on 
> > the number of digits it can have.
>
> The known size does not mean that the string is going to be null-terminated.


It does, because `strcpy` is guaranteed to null-terminate as long as it has 
enough storage capacity.

>> For example, in the example titled "Noncompliant Code Example (argv)", it is 
>> explicitly stated that the problem is not only that the buffer is fixed-size 
>> and `strcpy()` is made, but also that the original string //is controlled by 
>> the attacker//. The right analysis to catch such issues is //taint 
>> analysis//. It's a typical taint problem. I honestly believe you should try 
>> to solve it by combining the taint checker with the string checker: "if 
>> string length metadata is tainted (in particular, if the string itself is 
>> tainted) and is potentially larger than the destination buffer size (eg., 
>> unconstrained), warn". With the recent advancements in the development of 
>> the taint checker, i think you can get pretty far this way without 
>> constantly struggling with false positives.
> 
> This checker indeed would require an additional taint analysis, but as we do 
> not have a non-alpha variant, I am fine to call every non-null-terminated 
> string reading an issue (as it is), whatever is the source. That is a 
> generalization, yes, in the best possible manner what we have got today. Also 
> I am not a big fan of relying on non-alpha stuff and I am sure I have defined 
> my checker enough well to catch real issues.

Checks that are part of the generic taint checker are currently in a pretty bad 
shape, but the taint analysis itself is pretty good, and checks that rely on 
taint but aren't part of the generic taint checker are also pretty good. I 
actually believe taint analysis could be enabled by default as soon as somebody 
goes through the broken checks and disables/removes them. If you rely on the 
existing taint analysis infrastructure and make a good check, that'll be 
wonderful and would further encourage us to move taint analysis out of alpha.

> - We heavily need to swap the value-tracking with `NoteTags` to make this 
> useful.

Before i forget, i'm opposed to explaining problems as notes. We should emit a 
warning as soon as we've reached far enough on the execution path to conclude 
that there's a problem.



================
Comment at: clang/include/clang/StaticAnalyzer/Checkers/Checkers.td:798
+
+def Str31cChecker : Checker<"31.c">,
+  HelpText<"SEI CERT checker of rules defined in STR31-C">,
----------------
Maybe `31c`? I'm afraid a dot in package name would be confusing and people 
will actually try something like to enable/disable `cert.str.31`.


================
Comment at: clang/include/clang/StaticAnalyzer/Checkers/Checkers.td:803
+
+} // end "cert.str"
+
----------------
`alpha.cert.str`.


================
Comment at: clang/lib/StaticAnalyzer/Core/CommonBugCategories.cpp:23
+const char *const CXXObjectLifecycle = "C++ object lifecycle";
+const char *const SecurityError = "SecurityError";
+
----------------
These strings are visible to humans, please write some proper English :)


================
Comment at: clang/test/Analysis/cert/str31-c-fp-suppression.cpp:51-53
+  strcpy(dest, src);
+  do_something(dest);
+  // expected-warning@-1 {{'dest' is not null-terminated}}
----------------
Charusso wrote:
> NoQ wrote:
> > Looks like a false positive. Consider the following perfectly correct (even 
> > if a bit inefficient) code that only differs from the current code only in 
> > names and context:
> > ```lang=c++
> > void copy_and_test_if_short(const char *src) {
> >   char dest[128];
> >   strcpy(dest, src);
> >   warn_if_starts_with_foo(dest);
> > }
> > 
> > void copy_and_test(const char *src) {
> >   if (strlen(src) < 64)
> >     copy_and_test_if_short(src);
> >   else
> >     copy_and_test_if_long(src);
> > }
> > ```
> That is why I have asked whether we have a way to say "when the string is 
> read". I like that heuristic for now, because any kind of not null-terminated 
> string is the root of the evil for [[ 
> https://wiki.sei.cmu.edu/confluence/display/c/STR31-C.+Guarantee+that+storage+for+strings+has+sufficient+space+for+character+data+and+the+null+terminator
>  | STR31-C ]].
> 
> In this case the attacker is the programmer who sends a possibly not 
> null-terminated string around and the function which receives an arbitrary 
> input would carry the issue.
> 
> Thanks for the example, I wanted to cover ranges, but I have forgotten it, 
> because in the wild peoples do not check for bounds. Copy-pasted.
Note that function `copy_and_test` in my example would not be necessarily 
visible to you. It may be in another translation unit.

So in order to "cover ranges" you'll have to discard all cases where the length 
is completely unknown.


================
Comment at: clang/test/Analysis/cert/str31-c-notes.cpp:22
+  if (gets(buf)) {}
+  // expected-note@-1 {{'gets' could write outside of 'buf'}}
+  // expected-note@-2 {{Assuming the condition is false}}
----------------
Charusso wrote:
> NoQ wrote:
> > I still believe that this should be *the* warning in this code. This is 
> > already broken code.
> No, it is not broken, sadly. I would say it is broken, but this checker tries 
> to be smarter than `grep`. Truncation is fine for [[ 
> https://wiki.sei.cmu.edu/confluence/display/c/STR31-C.+Guarantee+that+storage+for+strings+has+sufficient+space+for+character+data+and+the+null+terminator
>  | STR31-C ]].
> 
> That change in my mindset made this checker possible when I have dropped the 
> idea of warn on function calls.
> `Truncation is fine for STR31-C.`
This isn't just truncation, this is buffer overflow, and it's definitely not 
fine if we want to "Guarantee that storage for strings has sufficient space for 
character data".

This is absolutely the right place to simply warn on every invocation of 
`gets()`.


================
Comment at: clang/test/Analysis/cert/str31-c-notes.cpp:26-27
+
+  free(buf);
+  // expected-warning@-1 {{'buf' is not null-terminated}}
+  // expected-note@-2 {{'buf' is not null-terminated}}
----------------
Charusso wrote:
> NoQ wrote:
> > It is not wrong to free a buffer that isn't null-terminated. We shouldn't 
> > warn here.
> That is when the string is read. We could have a list of function calls which 
> are non-problematic. For now I am fine with that note, because you do not 
> allocate something without reading it before freeing it. It is just an 
> arbitrary test for now.
> That is when the string is read.

That strikes me as an example of [[ 
https://en.wikipedia.org/wiki/Law_of_the_instrument | the Maslow's hammer ]]. 
The static analyzer isn't great at guessing whether the function reads from the 
string or not. For a dynamic analysis it wouldn't have been a problem.

That said, maybe you could get away with discriminating between `char *` and 
`const char *`. If the string is passed in by const pointer, there's nothing 
you can do except read from it. If it's passed in by non-const pointer, you can 
conservatively assume that it isn't read.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70411/new/

https://reviews.llvm.org/D70411



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to