Author: Balázs Benics
Date: 2026-01-14T17:50:51+01:00
New Revision: c5e95af71d460ee41bb7912ceceaa88dd9a6a67c

URL: 
https://github.com/llvm/llvm-project/commit/c5e95af71d460ee41bb7912ceceaa88dd9a6a67c
DIFF: 
https://github.com/llvm/llvm-project/commit/c5e95af71d460ee41bb7912ceceaa88dd9a6a67c.diff

LOG: [clang][ssaf][docs] Document the Summary Extraction pipeline (#172876)

This patch adds some documentation about the design of the Scalable
Static Analysis Framework (SSAF) Summary Extraction part.

This mainly focuses on how the custom FrontendAction would load
different analyses (their extraction part), and the different formats it
should export into.
Each FrontendAction call would process a single TU by extracting
summaries from it and serializing the results into a file in the
desired format.

The details are not polished yet, but I think it's still beneficial to
have some guidance on how the upcoming components would fit together,
hence this document.
I'll come back to this document to keep it up-to-date as we proceed with
the upstreaming.

Added: 
    clang/docs/ScalableStaticAnalysisFramework/Framework.rst
    clang/docs/ScalableStaticAnalysisFramework/SummaryExtraction.rst

Modified: 
    clang/docs/index.rst

Removed: 
    


################################################################################
diff --git a/clang/docs/ScalableStaticAnalysisFramework/Framework.rst b/clang/docs/ScalableStaticAnalysisFramework/Framework.rst
new file mode 100644
index 0000000000000..83983995b38f7
--- /dev/null
+++ b/clang/docs/ScalableStaticAnalysisFramework/Framework.rst
@@ -0,0 +1,13 @@
+==================================
+Scalable Static Analysis Framework
+==================================
+
+This is a framework for writing cross-translation unit analyses in a scalable and extensible setting.
+
+.. toctree::
+   :caption: Table of Contents
+   :numbered:
+   :maxdepth: 1
+   :glob:
+
+   *
\ No newline at end of file

diff --git a/clang/docs/ScalableStaticAnalysisFramework/SummaryExtraction.rst b/clang/docs/ScalableStaticAnalysisFramework/SummaryExtraction.rst
new file mode 100644
index 0000000000000..6b9c7db5bc048
--- /dev/null
+++ b/clang/docs/ScalableStaticAnalysisFramework/SummaryExtraction.rst
@@ -0,0 +1,81 @@
+==================
+Summary Extraction
+==================
+
+.. WARNING:: The framework is rapidly evolving.
+  The documentation might be out of sync with the implementation.
+  The purpose of this documentation is to give context for upcoming reviews.
+
+
+The simplest way to think about the lifetime of a summary extraction is to follow the handlers of the ``FrontendAction`` implementing it.
+Three APIs matter here, and they are invoked in this order:
+
+  - ``BeginInvocation()``: Checks the command-line arguments related to summary extraction.
+  - ``CreateASTConsumer()``: Creates the ASTConsumers for the different summary extractors.
+  - ``EndSourceFile()``: Serializes and writes the extracted summaries.
+
+Implementation details
+**********************
+
+Global Registries
+=================
+
+The framework uses `llvm::Registry\<\> <https://llvm.org/doxygen/classllvm_1_1Registry.html>`_
+as an extension point for adding new summary analyses or serialization formats.
+Each entry in the *registry* holds a name, a description and a pointer to a constructor.
+
+**Pros**:
+
+  - Decentralizes the registration. There is not a single place in the source code where we spell out all of the analyses/formats.
+  - Plays nicely with downstream extensibility, as downstream users can add their own analyses/formats without touching the source code of the framework, while still benefiting from the upstream-provided analyses/formats.
+  - Works with static and dynamic linking. In other words, plugins as shared objects compose naturally.
+
+**Cons**:
+
+  - Registration slows down all ``clang`` users by a tiny amount, even if they don't invoke the summary extraction framework.
+  - As the registration is decoupled, it becomes a global program property, and is potentially more difficult to reason about.
+  - Complicates testing.
+
+Example for adding a custom summary extraction
+----------------------------------------------
+
+.. code-block:: c++
+
+  //--- MyAnalysis.cpp
+  class MyAnalysis : public TUSummaryExtractor {
+    using TUSummaryExtractor::TUSummaryExtractor;
+    // Implementation...
+  };
+
+  static TUSummaryExtractorRegistry::Add<MyAnalysis>
+    RegisterExtractor("MyAwesomeAnalysis", "The analysis produces some awesome results");
+
+Details of ``BeginInvocation()``
+================================
+
+#. Process the different fields populated from the command line. Ensure that mandatory flags are set, etc.
+#. For each requested analysis, check if we have a matching ``TUSummaryExtractorInfo`` in the static registry, and diagnose if not.
+#. Parse the format name, and check if we have a matching ``FormatInfo`` in the format registry.
+#. Lastly, forward the ``BeginInvocation`` call to the wrapped ``FrontendAction``.
+
+
+Details of ``CreateASTConsumer()``
+==================================
+
+#. Create the wrapped ``FrontendAction`` consumers by calling ``CreateASTConsumer()`` on it.
+#. Call ``ssaf::makeTUSummaryExtractor()`` on each requested analysis name.
+
+  #. Look up in the *summary registry* the relevant *Info* object and call the ``Factory`` function pointer to create the relevant ``ASTConsumer``.
+  #. Remember, we pass a mutable ``TUSummaryBuilder`` reference to the constructor, so the analysis can create ``EntityID`` objects and map them to ``TUSummaryData`` objects in their implementation. Their custom metadata needs to inherit from ``TUSummaryData`` to achieve this.
+
+#. Lastly, add all of these ``ASTConsumers`` to the ``MultiplexConsumer`` and return that.
+
+
+Details of ``EndSourceFile()``
+==============================
+
+#. Call the virtual ``writeTUSummary()`` on the serialization format, leading to the desired format handler (such as JSON, binary, or something custom provided by a plugin).
+
+  #. Create the directory structure for the enabled analyses.
+  #. Serialize ``entities``, ``entity_linkage``, etc. This is achieved by calling the matching virtual functions, which dispatch to the concrete implementation.
+  #. Likewise, for each enabled analysis, serialize the ``EntityID`` to ``TUSummaryData`` mapping using the analysis-provided ``Serialize`` function pointer.

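The registry mechanism described in SummaryExtraction.rst (a name, a description and a constructor pointer per entry, registered from wherever the analysis lives) could be sketched roughly as follows. This is a minimal standalone illustration that does not use ``llvm::Registry`` itself; ``Extractor``, ``ExtractorRegistry``, ``lookup`` and ``RegisterMyAnalysis`` are hypothetical names chosen for the sketch, not SSAF or LLVM APIs.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an extractor base class; not the real SSAF type.
struct Extractor {
  virtual ~Extractor() = default;
  virtual std::string run() const = 0;
};

// Simplified registry: each entry holds a name, a description and a
// pointer to a constructor function.
class ExtractorRegistry {
public:
  using Factory = std::unique_ptr<Extractor> (*)();
  struct Entry {
    std::string Name;
    std::string Desc;
    Factory Ctor;
  };

  // A function-local static sidesteps initialization-order problems
  // between registrations that live in different translation units.
  static std::vector<Entry> &entries() {
    static std::vector<Entry> Entries;
    return Entries;
  }

  // Constructing an Add<T> object appends an entry that can build a T.
  template <typename T> struct Add {
    Add(std::string Name, std::string Desc) {
      entries().push_back(
          {std::move(Name), std::move(Desc),
           []() -> std::unique_ptr<Extractor> { return std::make_unique<T>(); }});
    }
  };

  // Returns nullptr when no entry matches, so the caller can diagnose.
  static const Entry *lookup(const std::string &Name) {
    for (const Entry &E : entries())
      if (E.Name == Name)
        return &E;
    return nullptr;
  }
};

// A downstream analysis registers itself without touching the registry code.
struct MyAnalysis : Extractor {
  std::string run() const override { return "awesome"; }
};

static ExtractorRegistry::Add<MyAnalysis>
    RegisterMyAnalysis("MyAwesomeAnalysis", "Produces some awesome results");
```

Because the registration is a static object, it runs before ``main`` in every program that links the object file, which is the source of both the decentralization benefit and the small startup cost listed in the document.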
diff  --git a/clang/docs/index.rst b/clang/docs/index.rst
index 70c8737a2fe0d..a0d0401ed1c86 100644
--- a/clang/docs/index.rst
+++ b/clang/docs/index.rst
@@ -27,6 +27,7 @@ Using Clang as a Compiler
    ClangStaticAnalyzer
    ThreadSafetyAnalysis
    SafeBuffers
+   ScalableStaticAnalysisFramework/Framework
    DataFlowAnalysisIntro
    FunctionEffectAnalysis
    AddressSanitizer

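The three handlers documented in SummaryExtraction.rst (validate, create consumers, serialize) can be pictured with the standalone sketch below. It is only an approximation under stated assumptions: it does not use Clang's ``FrontendAction``, and every name in it (``ExtractionAction``, ``Consumer``, ``Format``, ``LineFormat``, ``CountingConsumer``, ``runExtraction``, the ``"count"`` and ``"lines"`` keys) is hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// All names here are hypothetical; they mirror roles, not the SSAF API.
struct Summary { std::map<std::string, std::string> Data; };

struct Consumer {
  virtual ~Consumer() = default;
  virtual void handleTU(Summary &S) = 0;
};

struct Format {
  virtual ~Format() = default;
  virtual std::string write(const Summary &S) const = 0;
};

// Toy serialization format: one "key=value" line per summary entry.
struct LineFormat : Format {
  std::string write(const Summary &S) const override {
    std::string Out;
    for (const auto &KV : S.Data)
      Out += KV.first + "=" + KV.second + "\n";
    return Out;
  }
};

// Toy analysis that records a fixed fact about the translation unit.
struct CountingConsumer : Consumer {
  void handleTU(Summary &S) override { S.Data["functions"] = "2"; }
};

class ExtractionAction {
  std::map<std::string, std::function<std::unique_ptr<Consumer>()>> Extractors;
  std::map<std::string, std::function<std::unique_ptr<Format>()>> Formats;
  std::vector<std::string> Requested;
  std::unique_ptr<Format> Writer;
  Summary Result;

public:
  ExtractionAction() {
    Extractors["count"] = [] { return std::make_unique<CountingConsumer>(); };
    Formats["lines"] = [] { return std::make_unique<LineFormat>(); };
  }

  // Step 1: validate the requested analyses and the format name.
  bool beginInvocation(std::vector<std::string> Names, const std::string &Fmt) {
    for (const std::string &N : Names)
      if (!Extractors.count(N))
        return false; // a real tool would emit a diagnostic here
    auto It = Formats.find(Fmt);
    if (It == Formats.end())
      return false;
    Writer = It->second();
    Requested = std::move(Names);
    return true;
  }

  // Step 2: create one consumer per requested analysis.
  std::vector<std::unique_ptr<Consumer>> createConsumers() const {
    std::vector<std::unique_ptr<Consumer>> Consumers;
    for (const std::string &N : Requested)
      Consumers.push_back(Extractors.at(N)());
    return Consumers;
  }

  // Step 3: serialize whatever the consumers recorded.
  std::string endSourceFile() const { return Writer->write(Result); }

  Summary &summary() { return Result; }
};

// Drives the three steps for one translation unit.
// Returns an empty string if validation fails.
std::string runExtraction(const std::vector<std::string> &Names,
                          const std::string &Fmt) {
  ExtractionAction Action;
  if (!Action.beginInvocation(Names, Fmt))
    return "";
  for (auto &C : Action.createConsumers())
    C->handleTU(Action.summary());
  return Action.endSourceFile();
}
```

Calling ``runExtraction({"count"}, "lines")`` walks through all three steps, while an unknown analysis or format name stops at the validation step, matching the order in which the document says the handlers are invoked.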

        
_______________________________________________
cfe-commits mailing list
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
