- Review feedback.

CHANGES SINCE LAST UPDATE
  http://reviews.llvm.org/D3402?vs=8757&id=8758

BRANCH
  sample-profile-doc

http://reviews.llvm.org/D3402

Files:
  docs/UsersManual.rst

Index: docs/UsersManual.rst
===================================================================
--- docs/UsersManual.rst
+++ docs/UsersManual.rst
@@ -1065,6 +1065,166 @@
    only. This only applies to the AArch64 architecture.
 
 
+Using Sampling Profilers for Optimization
+-----------------------------------------
+
+Sampling profilers are used to collect runtime information, such as
+hardware counters, while your application executes. They are typically
+very efficient and do not incur a large runtime overhead. The
+sample data collected by the profiler can be used during compilation
+to determine which areas of the code are executed most frequently.
+
+In particular, sample profilers can provide execution counts for all
+instructions in the code, along with information about which branches
+are taken and which functions are called. The compiler can use this
+information in its optimization
+cost models. For example, knowing that a branch is taken very
+frequently helps the compiler make better decisions when ordering
+basic blocks. Knowing that a function ``foo`` is called more
+frequently than another function ``bar`` helps the inliner.
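+
+As a hypothetical illustration, consider the loop below. The names
+``work``, ``foo`` and ``bar`` are made up for this sketch and are not
+part of any API. The branch in the loop is taken on nine out of every
+ten iterations, so a sample profile of ``work`` would attribute far more
+samples to the call to ``foo`` than to the call to ``bar``. Whether the
+optimizers actually reorder the blocks or inline ``foo`` depends on
+their cost models. The sketch only shows the kind of skew that a
+profile can reveal.
+
+.. code-block:: c++
+
+    #include <cstdint>
+
+    // Hypothetical counters, present only to make call frequencies visible.
+    static uint64_t foo_calls = 0;
+    static uint64_t bar_calls = 0;
+
+    int64_t foo(int64_t x) { ++foo_calls; return x * 2; }  // hot callee
+    int64_t bar(int64_t x) { ++bar_calls; return x - 1; }  // cold callee
+
+    int64_t work(int n) {
+      int64_t sum = 0;
+      for (int i = 0; i < n; ++i) {
+        if (i % 10 != 0)  // true for 9 out of every 10 iterations
+          sum += foo(i);
+        else
+          sum += bar(i);
+      }
+      return sum;
+    }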
+
+Using the data from a sample profiler requires some changes in the way
+a program is built. Before the compiler can use profiling information,
+the code needs to execute under the profiler. The following is the
+usual build cycle when using sample profilers for optimization:
+
+1. Build the code with source line table information. You can use all the
+   usual build flags that you always build your application with. The only
+   requirement is that you add ``-gline-tables-only`` or ``-g`` to the
+   command line. This is important for the profiler to be able to map
+   instructions back to source line locations.
+
+   .. code-block:: console
+
+     $ clang++ -O2 -gline-tables-only code.cc -o code
+
+2. Run the executable under a sampling profiler. The specific profiler
+   you use does not really matter, as long as its output can be converted
+   into the format that the LLVM optimizer understands. Currently, there
+   exists a conversion tool for the Linux Perf profiler
+   (https://perf.wiki.kernel.org/), so these examples assume that you
+   are using Linux Perf to profile your code.
+
+   .. code-block:: console
+
+     $ perf record -b ./code
+
+   Note the use of the ``-b`` flag. This tells Perf to use the Last Branch
+   Record (LBR) to record call chains. While this is not strictly required,
+   it provides better call information, which improves the accuracy of
+   the profile data.
+
+3. Convert the collected profile data to LLVM's sample profile format.
+   This is currently supported via the AutoFDO converter ``create_llvm_prof``,
+   which is available at http://github.com/google/autofdo. Once it is built
+   and installed, you can convert the ``perf.data`` file to LLVM's format
+   with the command:
+
+   .. code-block:: console
+
+     $ create_llvm_prof --binary=./code --out=code.prof
+
+   This will read ``perf.data`` and the binary file ``./code`` and emit
+   the profile data in ``code.prof``. Note that if you ran ``perf``
+   without the ``-b`` flag, you need to use ``--use_lbr=false`` when
+   calling ``create_llvm_prof``.
+
+4. Build the code again using the collected profile. This step feeds
+   the profile back to the optimizers. This should result in a binary
+   that executes faster than the original one. Note that you are not
+   required to build the code with the exact same arguments that you
+   used in the first step. The only requirement is that you build the code
+   with ``-gline-tables-only`` and ``-fprofile-sample-use``.
+
+   .. code-block:: console
+
+     $ clang++ -O2 -gline-tables-only -fprofile-sample-use=code.prof code.cc -o code
+
+
+Sample Profile Format
+^^^^^^^^^^^^^^^^^^^^^
+
+If you are not using Linux Perf to collect profiles, you will need to
+write a conversion tool from your profiler to LLVM's format. This section
+explains the file format expected by the backend.
+
+Sample profiles are written as ASCII text. The file is divided into
+sections, one for each function executed at runtime. Each
+section has the following format (taken from
+https://github.com/google/autofdo/blob/master/profile_writer.h):
+
+.. code-block:: console
+
+    function1:total_samples:total_head_samples
+    offset1[.discriminator]: number_of_samples [fn1:num fn2:num ... ]
+    offset2[.discriminator]: number_of_samples [fn3:num fn4:num ... ]
+    ...
+    offsetN[.discriminator]: number_of_samples [fn5:num fn6:num ... ]
+
+The file may contain blank lines between sections and within a
+section. However, the spacing within a single line is fixed. Additional
+spaces will result in an error while reading the file.
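+
+If you write your own converter, the following sketch shows one way to
+emit a single section in this layout. It is illustrative only. The
+``CallTarget``, ``SampleLine`` and ``FunctionProfile`` types and the
+``writeSection`` function are hypothetical. The sketch also assumes that
+a discriminator of 0 means "no discriminator", so it omits that field in
+that case. It places exactly one space between items, because extra
+spaces are rejected when the file is read.
+
+.. code-block:: c++
+
+    #include <cstdint>
+    #include <sstream>
+    #include <string>
+    #include <vector>
+
+    // Hypothetical in-memory records built by a custom converter.
+    struct CallTarget {
+      std::string name;  // mangled callee name
+      uint64_t samples;  // samples attributed to this target
+    };
+
+    struct SampleLine {
+      uint32_t offset;         // line offset from the function header
+      uint32_t discriminator;  // assumed: 0 means "no discriminator"
+      uint64_t samples;        // samples collected at this location
+      std::vector<CallTarget> calls;
+    };
+
+    struct FunctionProfile {
+      std::string name;        // mangled function name
+      uint64_t total_samples;  // samples anywhere in the function
+      uint64_t head_samples;   // samples in the function prologue
+      std::vector<SampleLine> lines;
+    };
+
+    // Writes one section, with exactly one space between items on a line.
+    std::string writeSection(const FunctionProfile &fn) {
+      std::ostringstream out;
+      out << fn.name << ':' << fn.total_samples << ':' << fn.head_samples << '\n';
+      for (const SampleLine &line : fn.lines) {
+        out << line.offset;
+        if (line.discriminator != 0)
+          out << '.' << line.discriminator;
+        out << ": " << line.samples;
+        for (const CallTarget &call : line.calls)
+          out << ' ' << call.name << ':' << call.samples;
+        out << '\n';
+      }
+      return out.str();
+    }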
+
+Function names must be mangled in order for the profile loader to
+match them in the current translation unit. The two numbers in the
+function header specify how many total samples were accumulated in the
+function (first number), and the total number of samples accumulated
+in the prologue of the function (second number). This head sample
+count provides an indicator of how frequently the function is invoked.
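+
+A tool that reads or validates these files has to split the header into
+its three fields. The following is a minimal sketch of that step. It is
+not LLVM's reader: ``FunctionHeader``, ``parseCount`` and ``parseHeader``
+are hypothetical names. The sketch takes the two counts from the last two
+``:`` separators and rejects anything that is not a plain decimal number.
+
+.. code-block:: c++
+
+    #include <cstddef>
+    #include <cstdint>
+    #include <optional>
+    #include <string>
+
+    // Hypothetical representation of a section header line.
+    struct FunctionHeader {
+      std::string name;        // mangled function name
+      uint64_t total_samples;  // samples accumulated in the whole function
+      uint64_t head_samples;   // samples in the prologue (~ invocation count)
+    };
+
+    // Accepts only a non-empty run of decimal digits (no overflow check).
+    static std::optional<uint64_t> parseCount(const std::string &s) {
+      if (s.empty()) return std::nullopt;
+      uint64_t value = 0;
+      for (char c : s) {
+        if (c < '0' || c > '9') return std::nullopt;
+        value = value * 10 + static_cast<uint64_t>(c - '0');
+      }
+      return value;
+    }
+
+    // Parses "function:total_samples:total_head_samples", taking the counts
+    // from the last two ':' separators.
+    std::optional<FunctionHeader> parseHeader(const std::string &line) {
+      std::size_t second = line.rfind(':');
+      if (second == std::string::npos || second == 0) return std::nullopt;
+      std::size_t first = line.rfind(':', second - 1);
+      if (first == std::string::npos || first == 0) return std::nullopt;
+      std::optional<uint64_t> total =
+          parseCount(line.substr(first + 1, second - first - 1));
+      std::optional<uint64_t> head = parseCount(line.substr(second + 1));
+      if (!total || !head) return std::nullopt;
+      return FunctionHeader{line.substr(0, first), *total, *head};
+    }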
+
+Each sampled line may contain several items. Some are optional (marked
+below):
+
+a. Source line offset. This number represents the line number
+   in the function where the sample was collected. The line number is
+   always relative to the line where the symbol of the function is
+   defined. So, if the function has its header at line 280, the offset
+   13 is at line 293 in the file.
+
+   Note that this offset should never be a negative number. A negative
+   offset can arise with macros, because the debug machinery registers
+   the line number at the point of macro expansion. So, if the macro was
+   expanded on a line before the start of the function, the profile
+   converter should emit 0 as the offset (this means that the optimizers
+   will not be able to associate a meaningful weight with the instructions
+   in the macro).
+
+b. [OPTIONAL] Discriminator. This is used if the sampled program
+   was compiled with DWARF discriminator support
+   (http://wiki.dwarfstd.org/index.php?title=Path_Discriminators).
+   DWARF discriminators are unsigned integer values that allow the
+   compiler to distinguish between multiple execution paths on the
+   same source line location.
+
+   For example, consider the line of code ``if (cond) foo(); else bar();``.
+   If the predicate ``cond`` is true 80% of the time, then the edge
+   into function ``foo`` should be considered to be taken most of the
+   time. But both calls to ``foo`` and ``bar`` are at the same source
+   line, so a sample count at that line is not sufficient. The
+   compiler needs to know which part of that line is taken more
+   frequently.
+
+   This is what discriminators provide. In this case, the calls to
+   ``foo`` and ``bar`` will be at the same line, but will have
+   different discriminator values. This allows the compiler to correctly
+   set edge weights into ``foo`` and ``bar``.
+
+c. Number of samples. This is an integer quantity representing the
+   number of samples collected by the profiler at this source
+   location.
+
+d. [OPTIONAL] Potential call targets and samples. If present, this
+   line contains a call instruction. This models both direct and
+   indirect calls. Each called target is listed together with its
+   number of samples. For example,
+
+   .. code-block:: console
+
+     130: 7 foo:3 bar:2 baz:7
+
+   The above means that at relative line offset 130 there is a call
+   instruction that calls one of ``foo()``, ``bar()`` and ``baz()``,
+   with ``baz()`` being the most frequently called target.
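+
+To make items a through d concrete, here is a minimal sketch that parses
+one sample line and maps its offset back to a source line. It is not
+LLVM's reader. ``CallSample``, ``BodyLine``, ``parseBodyLine``,
+``absoluteLine`` and ``offsetToEmit`` are hypothetical names, and the
+sketch assumes that an absent discriminator can be represented as 0.
+Following the spacing rule described earlier, it rejects lines that
+contain extra spaces. On the converter side, it clamps would-be negative
+offsets to 0, as item a requires.
+
+.. code-block:: c++
+
+    #include <cstddef>
+    #include <cstdint>
+    #include <optional>
+    #include <string>
+    #include <vector>
+
+    // Hypothetical representation of one sample line (items a to d).
+    struct CallSample {
+      std::string target;  // callee name
+      uint64_t samples;    // samples for this call target
+    };
+
+    struct BodyLine {
+      uint32_t offset = 0;            // a. offset from the function header line
+      uint32_t discriminator = 0;     // b. assumed 0 when absent
+      uint64_t samples = 0;           // c. samples at this location
+      std::vector<CallSample> calls;  // d. optional call targets
+    };
+
+    // Accepts only a non-empty run of decimal digits (no overflow check).
+    static std::optional<uint64_t> parseCount(const std::string &s) {
+      if (s.empty()) return std::nullopt;
+      uint64_t value = 0;
+      for (char c : s) {
+        if (c < '0' || c > '9') return std::nullopt;
+        value = value * 10 + static_cast<uint64_t>(c - '0');
+      }
+      return value;
+    }
+
+    // Parses "offset[.discriminator]: samples [fn:num ...]". Items must be
+    // separated by exactly one space; empty items (extra spaces) are rejected.
+    std::optional<BodyLine> parseBodyLine(const std::string &line) {
+      std::size_t colon = line.find(": ");
+      if (colon == std::string::npos) return std::nullopt;
+      BodyLine result;
+
+      std::string location = line.substr(0, colon);
+      std::size_t dot = location.find('.');
+      std::optional<uint64_t> offset = parseCount(location.substr(0, dot));
+      if (!offset) return std::nullopt;
+      result.offset = static_cast<uint32_t>(*offset);
+      if (dot != std::string::npos) {
+        std::optional<uint64_t> disc = parseCount(location.substr(dot + 1));
+        if (!disc) return std::nullopt;
+        result.discriminator = static_cast<uint32_t>(*disc);
+      }
+
+      // Split the rest on single spaces: the sample count, then call targets.
+      std::vector<std::string> items;
+      for (std::size_t start = colon + 2;;) {
+        std::size_t space = line.find(' ', start);
+        items.push_back(line.substr(
+            start, space == std::string::npos ? std::string::npos : space - start));
+        if (space == std::string::npos) break;
+        start = space + 1;
+      }
+
+      std::optional<uint64_t> samples = parseCount(items[0]);
+      if (!samples) return std::nullopt;
+      result.samples = *samples;
+      for (std::size_t i = 1; i < items.size(); ++i) {
+        std::size_t sep = items[i].rfind(':');
+        if (sep == std::string::npos || sep == 0) return std::nullopt;
+        std::optional<uint64_t> count = parseCount(items[i].substr(sep + 1));
+        if (!count) return std::nullopt;
+        result.calls.push_back(CallSample{items[i].substr(0, sep), *count});
+      }
+      return result;
+    }
+
+    // Maps an offset back to a source line: header 280 + offset 13 = line 293.
+    uint32_t absoluteLine(uint32_t header_line, uint32_t offset) {
+      return header_line + offset;
+    }
+
+    // Converter side: a sample whose line precedes the function header (e.g.
+    // from a macro expansion) gets offset 0 instead of a negative number.
+    uint32_t offsetToEmit(int64_t sample_line, int64_t header_line) {
+      if (sample_line < header_line) return 0;
+      return static_cast<uint32_t>(sample_line - header_line);
+    }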
+
+
 Controlling Size of Debug Information
 -------------------------------------
 
_______________________________________________
cfe-commits mailing list
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
