https://github.com/dkrupp created 
https://github.com/llvm/llvm-project/pull/66086:

strlen(..) call should not propagate taintedness,
because it brings in many false positive findings. It is a common pattern to 
copy user provided input to another buffer. In these cases we always
get warnings about tainted data used as the malloc parameter:

buf = malloc(strlen(tainted_txt) + 1); // false warning

This pattern can lead to a denial of service attack only, when the attacker can 
directly specify the size of the allocated area as an arbitrary large number 
(e.g. the value is converted from a user provided string).

Later, we could reintroduce strlen() as a taint propagating function with the 
consideration not to emit warnings when the tainted value cannot be 
"arbitrarily large" (such as the size of an already allocated memory block).

The change has been evaluated on the following open source projects:

- memcached: [1 lost false 
positive](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=memcached_1.6.8_ednikru_taint_nostrlen_baseline&newcheck=memcached_1.6.8_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved)

- tmux: 0 lost reports
- twin [3 lost false 
positives](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=twin_v0.8.1_ednikru_taint_nostrlen_baseline&newcheck=twin_v0.8.1_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved)
- vim [1 lost false 
positive](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=vim_v8.2.1920_ednikru_taint_nostrlen_baseline&newcheck=vim_v8.2.1920_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved)
- openssl 0 lost reports
- sqliste [2 lost false 
positives](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=sqlite_version-3.33.0_ednikru_taint_nostrlen_baseline&newcheck=sqlite_version-3.33.0_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved)
- ffmpeg 0 lost repots
- postgresql [3 lost false 
positives](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=postgres_REL_13_0_ednikru_taint_nostrlen_baseline&newcheck=postgres_REL_13_0_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved)
- tinyxml 0 lost reports
- libwebm 0 lost reports
- xerces 0 lost reports

In all cases the lost reports are originating from copying untrusted 
environment variables into another buffer.

There are 2 types of lost false positive reports:
1)  [Where the warning is emitted at the malloc call by the TaintPropagation 
Checker 
](https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=memcached_1.6.8_ednikru_taint_nostrlen_baseline&newcheck=memcached_1.6.8_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved&report-id=2648506&report-hash=2079221954026f17e1ecb614f5f054db&report-filepath=%2amemcached.c)
`
            len = strlen(portnumber_filename)+4+1;
            temp_portnumber_filename = malloc(len);
`

2) When pointers are set based on the length of the tainted string by the 
ArrayOutofBoundsv2 checker.
For example [this 
](https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=vim_v8.2.1920_ednikru_taint_nostrlen_baseline&newcheck=vim_v8.2.1920_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved&report-id=2649310&report-hash=79dc8522d2cd34ca8e1b2dc2db64b2df&report-filepath=%2aos_unix.c)case.



>From 9c7674c39e1b07536f8c57bcdd2b07fb04b4873c Mon Sep 17 00:00:00 2001
From: Daniel Krupp <daniel.kr...@ericsson.com>
Date: Fri, 8 Sep 2023 16:57:49 +0200
Subject: [PATCH] [analyzer] TaintPropagation checker strlen() should not
 propagate

strlen(..) call should not propagate taintedness,
because it brings in many false positive findings.
It is a common pattern to copy user provided input
to another buffer. In these cases we always
get warnings about tainted data used as the malloc parameter:

buf = malloc(strlen(tainted_txt) + 1); // false warning

This pattern can lead to a denial of service attack only, when
the attacker can directly specify the size of the allocated area
as an arbitrary large number (e.g. the value is converted
from a user provided string).

Later, we could reintroduce strlen() as a taint propagating function
with the consideration not to emit warnings when the tainted value
cannot be "arbitrarily large" (such as the size of an already allocated
memory block).
---
 clang/docs/analyzer/checkers.rst               |  4 ++--
 .../Checkers/GenericTaintChecker.cpp           |  2 --
 clang/test/Analysis/taint-diagnostic-visitor.c | 13 +++++++------
 clang/test/Analysis/taint-generic.c            | 18 ------------------
 4 files changed, 9 insertions(+), 28 deletions(-)

diff --git a/clang/docs/analyzer/checkers.rst b/clang/docs/analyzer/checkers.rst
index 54ea49e7426cc86..dbd6d7787823530 100644
--- a/clang/docs/analyzer/checkers.rst
+++ b/clang/docs/analyzer/checkers.rst
@@ -2599,8 +2599,8 @@ Default propagations rules:
  ``memcpy``, ``memmem``, ``memmove``, ``mbtowc``, ``pread``, ``qsort``,
  ``qsort_r``, ``rawmemchr``, ``read``, ``recv``, ``recvfrom``, ``rindex``,
  ``strcasestr``, ``strchr``, ``strchrnul``, ``strcasecmp``, ``strcmp``,
- ``strcspn``, ``strlen``, ``strncasecmp``, ``strncmp``, ``strndup``,
- ``strndupa``, ``strnlen``, ``strpbrk``, ``strrchr``, ``strsep``, ``strspn``,
+ ``strcspn``, ``strncasecmp``, ``strncmp``, ``strndup``,
+ ``strndupa``, ``strpbrk``, ``strrchr``, ``strsep``, ``strspn``,
  ``strstr``, ``strtol``, ``strtoll``, ``strtoul``, ``strtoull``, ``tolower``,
  ``toupper``, ``ttyname``, ``ttyname_r``, ``wctomb``, ``wcwidth``
 
diff --git a/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp 
b/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
index 3dcb45c0b110383..9df69c1ad1b525e 100644
--- a/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
@@ -694,8 +694,6 @@ void GenericTaintChecker::initTaintRules(CheckerContext &C) 
const {
       {{{"strpbrk"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
       {{{"strndup"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
       {{{"strndupa"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-      {{{"strlen"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
-      {{{"strnlen"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
       {{{"strtol"}}, TR::Prop({{0}}, {{1, ReturnValueIndex}})},
       {{{"strtoll"}}, TR::Prop({{0}}, {{1, ReturnValueIndex}})},
       {{{"strtoul"}}, TR::Prop({{0}}, {{1, ReturnValueIndex}})},
diff --git a/clang/test/Analysis/taint-diagnostic-visitor.c 
b/clang/test/Analysis/taint-diagnostic-visitor.c
index 663836836d3db67..1eb926f25f9a778 100644
--- a/clang/test/Analysis/taint-diagnostic-visitor.c
+++ b/clang/test/Analysis/taint-diagnostic-visitor.c
@@ -10,6 +10,7 @@ int scanf(const char *restrict format, ...);
 int system(const char *command);
 char* getenv( const char* env_var );
 size_t strlen( const char* str );
+int atoi( const char* str );
 void *malloc(size_t size );
 void free( void *ptr );
 char *fgets(char *str, int n, FILE *stream);
@@ -54,11 +55,11 @@ void taintDiagnosticVLA(void) {
 // propagating through variables and expressions
 char *taintDiagnosticPropagation(){
   char *pathbuf;
-  char *pathlist=getenv("PATH"); // expected-note {{Taint originated here}}
+  char *size=getenv("SIZE"); // expected-note {{Taint originated here}}
                                  // expected-note@-1 {{Taint propagated to the 
return value}}
-  if (pathlist){ // expected-note {{Assuming 'pathlist' is non-null}}
+  if (size){ // expected-note {{Assuming 'size' is non-null}}
                       // expected-note@-1 {{Taking true branch}}
-    pathbuf=(char*) malloc(strlen(pathlist)+1); // expected-warning{{Untrusted 
data is used to specify the buffer size}}
+    pathbuf=(char*) malloc(atoi(size)); // expected-warning{{Untrusted data is 
used to specify the buffer size}}
                                                 // expected-note@-1{{Untrusted 
data is used to specify the buffer size}}
                                                 // expected-note@-2 {{Taint 
propagated to the return value}}
     return pathbuf;
@@ -71,12 +72,12 @@ char *taintDiagnosticPropagation(){
 char *taintDiagnosticPropagation2(){
   char *pathbuf;
   char *user_env2=getenv("USER_ENV_VAR2");//unrelated taint source
-  char *pathlist=getenv("PATH"); // expected-note {{Taint originated here}}
+  char *size=getenv("SIZE"); // expected-note {{Taint originated here}}
                                  // expected-note@-1 {{Taint propagated to the 
return value}}
   char *user_env=getenv("USER_ENV_VAR");//unrelated taint source
-  if (pathlist){ // expected-note {{Assuming 'pathlist' is non-null}}
+  if (size){ // expected-note {{Assuming 'size' is non-null}}
                       // expected-note@-1 {{Taking true branch}}
-    pathbuf=(char*) malloc(strlen(pathlist)+1); // expected-warning{{Untrusted 
data is used to specify the buffer size}}
+    pathbuf=(char*) malloc(atoi(size)+1); // expected-warning{{Untrusted data 
is used to specify the buffer size}}
                                                 // expected-note@-1{{Untrusted 
data is used to specify the buffer size}}
                                                 // expected-note@-2 {{Taint 
propagated to the return value}}
     return pathbuf;
diff --git a/clang/test/Analysis/taint-generic.c 
b/clang/test/Analysis/taint-generic.c
index b7906d201e4fad3..a614453c63af671 100644
--- a/clang/test/Analysis/taint-generic.c
+++ b/clang/test/Analysis/taint-generic.c
@@ -915,24 +915,6 @@ void testStrndupa(size_t n) {
   clang_analyzer_isTainted_charp(result); // expected-warning {{YES}}
 }
 
-size_t strlen(const char *s);
-void testStrlen() {
-  char s[10];
-  scanf("%9s", s);
-
-  size_t result = strlen(s);
-  clang_analyzer_isTainted_int(result); // expected-warning {{YES}}
-}
-
-size_t strnlen(const char *s, size_t maxlen);
-void testStrnlen(size_t maxlen) {
-  char s[10];
-  scanf("%9s", s);
-
-  size_t result = strnlen(s, maxlen);
-  clang_analyzer_isTainted_int(result); // expected-warning {{YES}}
-}
-
 long strtol(const char *restrict nptr, char **restrict endptr, int base);
 long long strtoll(const char *restrict nptr, char **restrict endptr, int base);
 unsigned long int strtoul(const char *nptr, char **endptr, int base);

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to