Author: Endre Fülöp
Date: 2021-11-28T23:36:47+01:00
New Revision: 4aac00a71db31121d70b140d7367e7f9d9992f66

URL: 
https://github.com/llvm/llvm-project/commit/4aac00a71db31121d70b140d7367e7f9d9992f66
DIFF: 
https://github.com/llvm/llvm-project/commit/4aac00a71db31121d70b140d7367e7f9d9992f66.diff

LOG: [analyzer][doc] Add user documenation for taint analysis

Checker alpha.security.taint.TaintPropagation now has user documentation for
taint analysis with an example showing external YAML configuration format.
The format of the taint configuration file is now documented under the user
documentation of Clang SA.

Differential Revision: https://reviews.llvm.org/D113251

Added: 
    clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst

Modified: 
    clang/docs/analyzer/checkers.rst
    clang/docs/analyzer/user-docs.rst

Removed: 
    


################################################################################
diff  --git a/clang/docs/analyzer/checkers.rst 
b/clang/docs/analyzer/checkers.rst
index df62fb0643f86..a31c38c133d97 100644
--- a/clang/docs/analyzer/checkers.rst
+++ b/clang/docs/analyzer/checkers.rst
@@ -2317,8 +2317,15 @@ Checkers implementing `taint analysis 
<https://en.wikipedia.org/wiki/Taint_check
 
 alpha.security.taint.TaintPropagation (C, C++)
 """"""""""""""""""""""""""""""""""""""""""""""
-Generate taint information used by other checkers.
-A data is tainted when it comes from an unreliable source.
+
+Taint analysis identifies untrusted sources of information (taint sources), 
rules as to how the untrusted data flows along the execution path (propagation 
rules), and points of execution where the use of tainted data is risky (taint 
sinks).
+The most notable examples of taint sources are:
+
+  - network originating data
+  - environment variables
+  - database originating data
+
+``GenericTaintChecker`` is the main implementation checker for this rule, and 
it generates taint information used by other checkers.
 
 .. code-block:: c
 
@@ -2344,6 +2351,25 @@ A data is tainted when it comes from an unreliable 
source.
      // warn: untrusted data as buffer size
  }
 
+There are built-in sources, propagations and sinks defined in code inside 
``GenericTaintChecker``.
+These operations are handled even if no external taint configuration is 
provided.
+
+Default sources defined by ``GenericTaintChecker``:
+``fdopen``, ``fopen``, ``freopen``, ``getch``, ``getchar``, 
``getchar_unlocked``, ``gets``, ``scanf``, ``socket``, ``wgetch``
+
+Default propagations defined by ``GenericTaintChecker``:
+``atoi``, ``atol``, ``atoll``, ``fgetc``, ``fgetln``, ``fgets``, ``fscanf``, 
``sscanf``, ``getc``, ``getc_unlocked``, ``getdelim``, ``getline``, ``getw``, 
``pread``, ``read``, ``strchr``, ``strrchr``, ``tolower``, ``toupper``
+
+Default sinks defined in ``GenericTaintChecker``:
+``printf``, ``setproctitle``, ``system``, ``popen``, ``execl``, ``execle``, 
``execlp``, ``execv``, ``execvp``, ``execvP``, ``execve``, ``dlopen``, 
``memcpy``, ``memmove``, ``strncpy``, ``strndup``, ``malloc``, ``calloc``, 
``alloca``, ``memccpy``, ``realloc``, ``bcopy``
+
+The user can configure taint sources, sinks, and propagation rules by 
providing a configuration file via checker option 
``alpha.security.taint.TaintPropagation:Config``.
+
+External taint configuration is in `YAML 
<http://llvm.org/docs/YamlIO.html#introduction-to-yaml>`_ format. The 
taint-related options defined in the config file extend but do not override the 
built-in sources, rules, and sinks.
+The format of the external taint configuration file is not stable, and could 
change without any notice even in a non-backward compatible way.
+
+For a more detailed description of configuration options, please see the 
:doc:`user-docs/TaintAnalysisConfiguration`. For an example see 
:ref:`clangsa-taint-configuration-example`.
+
 alpha.unix
 ^^^^^^^^^^^
 

diff  --git a/clang/docs/analyzer/user-docs.rst 
b/clang/docs/analyzer/user-docs.rst
index 69486c52d2873..2292cec6944b1 100644
--- a/clang/docs/analyzer/user-docs.rst
+++ b/clang/docs/analyzer/user-docs.rst
@@ -7,3 +7,4 @@ Contents:
    :maxdepth: 2
 
    user-docs/CrossTranslationUnit
+   user-docs/TaintAnalysisConfiguration

diff  --git a/clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst 
b/clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst
new file mode 100644
index 0000000000000..94db84494e00b
--- /dev/null
+++ b/clang/docs/analyzer/user-docs/TaintAnalysisConfiguration.rst
@@ -0,0 +1,170 @@
+============================
+Taint Analysis Configuration
+============================
+
+The Clang Static Analyzer uses taint analysis to detect security-related 
issues in code.
+The backbone of taint analysis in the Clang SA is the `GenericTaintChecker`, 
which the user can access via the :ref:`alpha-security-taint-TaintPropagation` 
checker alias. This checker has a default taint-related configuration.
+The built-in default settings are defined in code, and they are always in 
effect once the checker is enabled, either directly or via the alias.
+The checker also provides a configuration interface for extending the default 
settings by providing a configuration file in `YAML 
<http://llvm.org/docs/YamlIO.html#introduction-to-yaml>`_ format.
+This documentation describes the syntax of the configuration file and gives 
the informal semantics of the configuration options.
+
+.. contents::
+   :local:
+
+.. _clangsa-taint-configuration-overview:
+
+Overview
+________
+
+Taint analysis works by checking for the occurrence of special operations 
during the symbolic execution of the program.
+Taint analysis defines sources, sinks, and propagation rules. It identifies 
errors by detecting a flow of information that originates from a taint source, 
reaches a taint sink, and propagates through the program paths via propagation 
rules.
+What counts as a source, a sink, or a taint-propagating operation is mainly 
domain-specific knowledge, but there are some built-in defaults provided by 
:ref:`alpha-security-taint-TaintPropagation`.
+It is possible to express that a statement sanitizes tainted values by 
providing a ``Filters`` section in the external configuration (see 
:ref:`clangsa-taint-configuration-example` and 
:ref:`clangsa-taint-filter-details`).
+There are no default filters defined in the built-in settings.
+The checker's documentation also specifies how to provide a custom taint 
configuration with command-line options.
+
+.. _clangsa-taint-configuration-example:
+
+Example configuration file
+__________________________
+
+.. code-block:: yaml
+
+  # The entries that specify arguments use 0-based indexing when specifying
+  # input arguments, and -1 is used to denote the return value.
+
+  Filters:
+    # Filter functions
+    # Taint is sanitized when tainted variables are passed as arguments to
+    # filters.
+
+    # Filter function
+    #   void cleanse_first_arg(int* arg)
+    #
+    # Result example:
+    #   int x; // x is tainted
+    #   cleanse_first_arg(&x); // x is not tainted after the call
+    - Name: cleanse_first_arg
+      Args: [0]
+
+  Propagations:
+    # Source functions
+    # The omission of SrcArgs key indicates unconditional taint propagation,
+    # which is conceptually what a source does.
+
+    # Source function
+    #   size_t fread(void *ptr, size_t size, size_t nmemb, FILE * stream)
+    #
+    # Result example:
+    #   FILE* f = fopen("file.txt", "r");
+    #   char buf[1024];
+    #   size_t read = fread(buf, sizeof(buf[0]), sizeof(buf)/sizeof(buf[0]), 
f);
+    #   // both read and buf are tainted
+    - Name: fread
+      DstArgs: [0, -1]
+
+    # Propagation functions
+    # The presence of SrcArgs key indicates conditional taint propagation,
+    # which is conceptually what a propagator does.
+
+    # Propagation function
+    #   char *dirname(char *path)
+    #
+    # Result example:
+    #   char* path = read_path();
+    #   char* dir = dirname(path);
+    #   // dir is tainted if path was tainted
+    - Name: dirname
+      SrcArgs: [0]
+      DstArgs: [-1]
+
+  Sinks:
+    # Sink functions
+    # If taint reaches any of the arguments specified, a warning is emitted.
+
+    # Sink function
+    #   int system(const char* command)
+    #
+    # Result example:
+    #   const char* command = read_command();
+    #   system(command); // emit diagnostic if command is tainted
+    - Name: system
+      Args: [0]
+
+In the example file above, the entries under the `Propagations` key implement 
the conceptual sources and propagations, and sinks have their dedicated `Sinks` 
key.
+The user can define operations (function calls) where the tainted values 
should be cleansed by listing entries under the `Filters` key.
+Filters model the sanitization of values done by the programmer, and providing 
these is key to avoiding false-positive findings.
+
+Configuration file syntax and semantics
+_______________________________________
+
+The configuration file should have valid `YAML 
<http://llvm.org/docs/YamlIO.html#introduction-to-yaml>`_ syntax.
+
+The configuration file can have the following top-level keys:
+ - Filters
+ - Propagations
+ - Sinks
+
+Under the `Filters` key, the user can specify a list of operations that remove 
taint (see :ref:`clangsa-taint-filter-details` for details).
+
+Under the `Propagations` key, the user can specify a list of operations that 
introduce and propagate taint (see :ref:`clangsa-taint-propagation-details` for 
details).
+The user can mark taint sources by omitting the `SrcArgs` key from an entry 
under the `Propagations` key, while propagation entries provide it.
+The lack of the `SrcArgs` key means unconditional propagation, which is how 
sources are modeled.
+The semantics of propagations are such that if any of the source arguments 
(specified by indexes in `SrcArgs`) are tainted, then all of the destination 
arguments (specified by indexes in `DstArgs`) also become tainted.
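+
+As a minimal sketch of this rule, the entry below models a hypothetical
+function ``char *join_path(char *out, const char *dir, const char *file)``;
+the name and signature are assumptions made for illustration only.
+If either ``dir`` or ``file`` is tainted at the call site, both ``out`` and
+the return value are expected to be tainted after the call.
+
+.. code-block:: yaml
+
+  Propagations:
+    - Name: join_path    # hypothetical function, not a built-in rule
+      SrcArgs: [1, 2]    # dir, file
+      DstArgs: [0, -1]   # out and the return value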
+
+Under the `Sinks` key, the user can specify a list of operations where the 
checker should emit a bug report if tainted data reaches it (see 
:ref:`clangsa-taint-sink-details` for details).
+
+.. _clangsa-taint-filter-details:
+
+Filter syntax and semantics
+###########################
+
+An entry under `Filters` is a `YAML 
<http://llvm.org/docs/YamlIO.html#introduction-to-yaml>`_ object with the 
following mandatory keys:
+ - `Name` is a string that specifies the name of a function.
+   When the checker encounters this function during symbolic execution, it 
sanitizes taint from the memory region referred to by the given arguments or 
returns a sanitized value.
+ - `Args` is a list of numbers in the range of ``[-1..int_max]``.
+   It indicates the indexes of arguments in the function call.
+   The number ``-1`` signifies the return value; other numbers identify call 
arguments.
+   The values of these arguments are considered clean after the function call.
+
+The following keys are optional:
+ - `Scope` is a string that specifies the prefix of the function's name in its 
fully qualified name. This option restricts the set of matching function calls. 
It can encode not only namespaces but struct/class names as well to match 
member functions.
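+
+As a hedged sketch of the optional key, the entry below restricts a filter
+to a member function. The class ``myapp::Validator`` and its method
+``sanitize`` are hypothetical, and writing the scope with a trailing ``::``
+is an assumption about the expected prefix form.
+
+.. code-block:: yaml
+
+  Filters:
+    # Intended to match only myapp::Validator::sanitize(char *buf)
+    - Name: sanitize
+      Scope: "myapp::Validator::"
+      Args: [0]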
+
+.. _clangsa-taint-propagation-details:
+
+Propagation syntax and semantics
+################################
+
+An entry under `Propagations` is a `YAML 
<http://llvm.org/docs/YamlIO.html#introduction-to-yaml>`_ object with the 
following mandatory keys:
+ - `Name` is a string that specifies the name of a function.
+   When the checker encounters this function during symbolic execution, it 
propagates taint from one or more arguments to other arguments and possibly 
the return value.
+   It helps model the taint-related behavior of functions that are not 
analyzable otherwise.
+
+The following keys are optional:
+ - `Scope` is a string that specifies the prefix of the function's name in its 
fully qualified name. This option restricts the set of matching function calls.
+ - `SrcArgs` is a list of numbers in the range of ``[0..int_max]`` that 
indicates the indexes of arguments in the function call.
+   Taint-propagation considers the values of these arguments during the 
evaluation of the function call.
+   If any `SrcArgs` arguments are tainted, the checker will consider all 
`DstArgs` arguments tainted after the call.
+ - `DstArgs` is a list of numbers in the range of ``[-1..int_max]`` that 
indicates the indexes of arguments in the function call.
+   The number ``-1`` specifies the return value of the function.
+   If any `SrcArgs` arguments are tainted, the checker will consider all 
`DstArgs` arguments tainted after the call.
+ - `VariadicType` is a string that can be one of ``None``, ``Dst``, ``Src``.
+   It is used in conjunction with `VariadicIndex` to specify arguments inside 
a variadic argument.
+   The value of ``Src`` will treat every call site argument that is part of a 
variadic argument list as a source concerning propagation rules (as if 
specified by `SrcArgs`).
+   The value of ``Dst`` will treat every call site argument that is part of a 
variadic argument list as a destination concerning propagation rules.
+   The value of ``None`` will not consider the arguments that are part of a 
variadic argument list (this option is redundant but can be used to temporarily 
switch off handling of a particular variadic argument option without removing 
the VariadicIndex key).
+ - `VariadicIndex` is a number in the range of ``[0..int_max]``. It indicates 
the starting index of the variadic argument in the signature of the function.
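+
+To illustrate the variadic keys, the sketch below models a hypothetical
+``int my_scanf(const char *format, ...)``; the function is an assumption,
+not a built-in rule. Because `SrcArgs` is omitted, the entry is meant to act
+as a source, so every variadic argument starting at index 1 would be
+considered tainted after the call.
+
+.. code-block:: yaml
+
+  Propagations:
+    - Name: my_scanf     # hypothetical scanf-like function
+      VariadicType: Dst  # variadic arguments are destinations
+      VariadicIndex: 1   # the variadic part starts after `format`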
+
+
+.. _clangsa-taint-sink-details:
+
+Sink syntax and semantics
+#########################
+
+An entry under `Sinks` is a `YAML 
<http://llvm.org/docs/YamlIO.html#introduction-to-yaml>`_ object with the 
following mandatory keys:
+ - `Name` is a string that specifies the name of a function.
+   Encountering this function during symbolic execution will emit a 
taint-related diagnostic if any of the arguments specified with `Args` are 
tainted at the call site.
+ - `Args` is a list of numbers in the range of ``[0..int_max]`` that indicates 
the indexes of arguments in the function call.
+   The checker reports an error if any of the specified arguments are tainted.
+
+The following keys are optional:
+ - `Scope` is a string that specifies the prefix of the function's name in its 
fully qualified name. This option restricts the set of matching function calls.


        
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
