================
@@ -1,119 +1,542 @@
-=============================================
-SYCL Compiler and Runtime architecture design
-=============================================
+============
+SYCL Support
+============
.. contents::
:local:
Introduction
============
+The `SYCL 2020 specification <SYCL-2020_>`_ defines a single-source programming
+model and C++ run-time library interface for writing portable programs that
+support heterogeneous devices including GPUs, CPUs, and accelerators.
+The specification is intended to allow for a wide range of implementation
+possibilities, examples of which include:
+
+- A SYCL run-time library written in standard C++ that executes kernels on a
+ homogeneous set of host and device processors, each of which can execute
+ common compiled code from shared memory.
+- A SYCL run-time library that executes kernels on a heterogeneous set of
+ device processors for which each kernel is pre-compiled for each supported
+ device processor (Ahead-Of-Time (AOT) compilation) or for a family of device
+ processors (Just-In-Time (JIT) compilation).
+
+Since Clang is a conforming implementation of the C++ standard, no additional
+features are required for support of the first implementation strategy.
+This document details the core language features Clang provides for use by
+SYCL run-time libraries that use the second implementation strategy.
+
+.. _SYCL-2020:
+ https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html
+
+
+Example Usage
+=============
+SYCL is designed as an extension of C++ rather than as a distinct programming
+language.
+SYCL support is enabled with the `-fsycl <opt-fsycl_>`_ option.
+
+.. code-block:: sh
+
+ clang++ -c -fsycl source-file.cpp
+
+The choice of which target devices will be supported is made at compile time.
+By default, SYCL source files will be compiled with support for a host target
+dependent set of target devices.
+For example, when compiling for a ``x86_64-unknown-linux-gnu`` host target,
+target support will be enabled for ``spirv64-unknown-unknown`` devices.
+The set of supported target devices can be specified via a comma separated list
+of target triples with the `--offload-targets= <opt-offload-targets_>`_ option.
+The following Clang invocation enables support for AMD, NVIDIA, and Intel GPU
+targets.
+
+.. code-block:: sh
+
+ clang++ -c -fsycl \
+
--offload-targets=amdgcn-amd-amdhsa,nvptx64-nvidia-cuda,spirv64-unknown-unknown
\
+ source-file.cpp
+
+Object files built with the `-fsycl <opt-fsycl_>`_ option contain device
+images that require additional processing at link time.
+Programs linked with such object files must also be linked using the
+``clang++`` driver and the `-fsycl <opt-fsycl_>`_ option.
+
+.. code-block:: sh
+
+ clang++ -fsycl example.o source-file.o -o example
+
+.. _opt-fsycl:
+
https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-fsycl
+.. _opt-offload-targets:
+
https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-offload-targets
+
+
+Compilation Model
+=================
+`SYCL 2020 section 5.1, "Offline compilation of SYCL source files"
<SYCL-2020-5.1_>`_
+acknowledges two compilation models.
+
+- Single-source Multiple Compiler Pass (`SMCP`_) describes a compilation model
+ in which source code is separately parsed and analyzed for the host target
+ and each device target.
+
+- Single-source Single Compiler Pass (`SSCP`_) describes a compilation model
+ in which source code is parsed and analyzed once with code generation
+ performed separately for the host target and each device target.
+
+Clang only supports the `SMCP`_ compilation model currently, but the SYCL
+language support features have been designed to allow for support of the
+`SSCP`_ compilation model to be added in the future.
+
+By default, SYCL source files are compiled for the host target and for each
+device target.
+In some cases, it is useful to restrict compilation to just the host target or
+just the device targets; the `-fsycl-host-only <opt-fsycl-host-only_>`_ and
+`-fsycl-device-only <opt-fsycl-device-only_>`_ options are available for these
+purposes.
+
+.. _SMCP:
+
https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#subsec:smcp
+.. _SSCP:
+
https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#subsec:sscp
+.. _SYCL-2020-5.1:
+
https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#_offline_compilation_of_sycl_source_files
+.. _opt-fsycl-host-only:
+
https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-offload-host-only
+.. _opt-fsycl-device-only:
+
https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-offload-device-only
+
+
+Supported Targets
+=================
+Support for SYCL is still in the implementation phase, but all targets
+supported by the `--offload-targets= <opt-offload-targets_>`_ option
+are intended to eventually be supported.
-This document describes the architecture of the SYCL compiler and runtime
-library. More details are provided in
-`external document
<https://github.com/intel/llvm/blob/sycl/sycl/doc/design/CompilerAndRuntimeDesign.md>`_\
,
-which are going to be added to clang documentation in the future.
-
-Address space handling
-======================
-
-The SYCL specification represents pointers to disjoint memory regions using C++
-wrapper classes on an accelerator to enable compilation with a standard C++
-toolchain and a SYCL compiler toolchain. Section 3.8.2 of SYCL 2020
-specification defines
-`memory model
<https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_sycl_device_memory_model>`_\
,
-section 4.7.7 - `address space classes
<https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_address_space_classes>`_
-and section 5.9 covers `address space deduction
<https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_address_space_deduction>`_.
-The SYCL specification allows two modes of address space deduction: "generic as
-default address space" (see section 5.9.3) and "inferred address space" (see
-section 5.9.4). Current implementation supports only "generic as default
address
-space" mode.
-
-SYCL borrows its memory model from OpenCL however SYCL doesn't perform
-the address space qualifier inference as detailed in
-`OpenCL C v3.0 6.7.8
<https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html#addr-spaces-inference>`_.
-
-The default address space is "generic-memory", which is a virtual address space
-that overlaps the global, local, and private address spaces. SYCL mode enables
-following conversions:
-
-- explicit conversions to/from the default address space from/to the address
- space-attributed type
-- implicit conversions from the address space-attributed type to the default
- address space
-- explicit conversions to/from the global address space from/to the
- ``__attribute__((opencl_global_device))`` or
- ``__attribute__((opencl_global_host))`` address space-attributed type
-- implicit conversions from the ``__attribute__((opencl_global_device))`` or
- ``__attribute__((opencl_global_host))`` address space-attributed type to the
- global address space
-
-All named address spaces are disjoint and sub-sets of default address space.
-
-The SPIR target allocates SYCL namespace scope variables in the global address
-space.
-
-Pointers to default address space should get lowered into a pointer to a
generic
-address space (or flat to reuse more general terminology). But depending on the
-allocation context, the default address space of a non-pointer type is assigned
-to a specific address space. This is described in
-`common address space deduction rules
<https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#subsec:commonAddressSpace>`_
-section.
-
-This is also in line with the behaviour of CUDA (`small example
-<https://godbolt.org/z/veqTfo9PK>`_).
-
-``multi_ptr`` class implementation example:
+
+Predefined Macros
+=================
+`SYCL 2020 section 5.6, "Preprocessor directives and macros" <SYCL-2020-5.6_>`_
+specifies macros that a SYCL implementation is required to provide.
+Most such macros are defined by the SYCL run-time library and require inclusion
+of the ``<sycl/sycl.hpp>`` header file.
+The following macros are conditionally predefined by the compiler.
+
+.. list-table::
+ :header-rows: 1
+
+ * - Macro
+ - Description
+ * - ``__SYCL_DEVICE_ONLY__``
+ - Predefined by a `SMCP`_ implementation during device compilation (but
not
+ during host compilation).
+ * - ``__SYCL_SINGLE_SOURCE__``
+ - Predefined by a `SSCP`_ implementation during (host and device)
+ compilation.
+
+Since Clang only supports the `SMCP`_ compilation model currently, the
+``__SYCL_SINGLE_SOURCE__`` macro is never predefined.
+
+.. _SYCL-2020-5.6:
+
https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#_preprocessor_directives_and_macros
+
+
+Language Support
+================
+`SYCL 2020 section 3.12.3, "Library-only implementation" <SYCL-2020-3.12.13_>`_
+notes the intent that the SYCL specification be implementable as a C++ library
+with no requirements beyond a compiler that conforms to the C++17 standard.
+The SYCL specification therefore does not specify extensions to the C++ core
+language and a library-only implementation will work with Clang without any
+core language extensions.
+Clang provides the features described in this section to facilitate
capabilities
+that are not possible with a library-only SYCL implementation.
+
+.. _SYCL-2020-3.12.13:
+
https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#_library_only_implementation
+
+
+.. _sect-sycl_kernel_entry_point:
+
+The ``[[clang::sycl_kernel_entry_point]]`` Attribute
+----------------------------------------------------
+This attribute is intended for use in the implementation of SYCL run-time
+libraries and should not be used directly by programmers.
+
+The `sycl_kernel_entry_point <attr-sycl_kernel_entry_point_>`_ attribute
+coordinates interaction between Clang and the SYCL run-time library to
+facilitate code generation and the execution of a SYCL kernel on a device
+that potentially uses an instruction set architecture different from the host.
+Consider the following call to the ``sycl::handler::single_task()`` SYCL
+kernel invocation function.
.. code-block:: C++
- // check that SYCL mode is ON and we can use non-standard decorations
- #if defined(__SYCL_DEVICE_ONLY__)
- // GPU/accelerator implementation
- template <typename T, address_space AS> class multi_ptr {
- // DecoratedType applies corresponding address space attribute to the
type T
- // DecoratedType<T, global_space>::type ==
"__attribute__((opencl_global)) T"
- // See sycl/include/CL/sycl/access/access.hpp for more details
- using pointer_t = typename DecoratedType<T, AS>::type *;
+ struct KN;
+ void f(sycl::handler &h, sycl::stream &sout, int i) {
+ h.single_task<KN>([=] {
+ sout << "The value of i is " << i << "\n";
+ });
+ }
+
+The SYCL kernel is defined by the lambda expression passed to the
+``single_task()`` function and is identified by the ``KN`` type passed as the
+first template argument.
+See
+`SYCL 2020 section 4.9.4.2, "SYCL functions for invoking kernels"
<SYCL-2020-4.9.4.2_>`_
+and
+`SYCL 2020 section 5.2, "Naming of kernels" <SYCL-2020-5.2_>`_
+for more details.
+
+The `SMCP`_ and `SSCP`_ compilation models require that code generation be
+performed for each SYCL kernel for each target device.
+In order for Clang to perform that code generation, it needs to be informed
+that a SYCL kernel invocation is present.
+The `sycl_kernel_entry_point <attr-sycl_kernel_entry_point_>`_ attribute
+provides the means for the SYCL run-time library, which provides the
+definition of all SYCL kernel invocation functions, to inform Clang of a SYCL
+kernel invocation.
+This is accomplished by, in the definition of a SYCL kernel invocation
function,
+including a call to a function declared with the attribute.
+For example:
+
+.. code-block:: C++
- pointer_t m_Pointer;
+ namespace sycl {
+ class handler {
+ template <typename KernelName, typename KernelType>
+ [[clang::sycl_kernel_entry_point(KernelName)]]
+ void kernel_entry_point(KernelType kernelFunc) {
+ kernelFunc();
+ }
public:
- pointer_t get() { return m_Pointer; }
- T& operator* () { return *reinterpret_cast<T*>(m_Pointer); }
+ template <typename KernelName, typename KernelType>
+ void single_task(const KernelType &kernelFunc) {
+ kernel_entry_point<KernelName>(kernelFunc);
+ }
+ };
}
- #else
- // CPU/host implementation
- template <typename T, address_space AS> class multi_ptr {
- T *m_Pointer; // regular undecorated pointer
- public:
- T *get() { return m_Pointer; }
- T& operator* () { return *m_Pointer; }
+
+The arguments of the call to ``kernel_entry_point()`` constitute the parameters
+of a SYCL kernel.
+The body of the ``kernel_entry_point()`` function contains the statements
+required to execute the SYCL kernel (e.g., an invocation of the call operator
of
+the SYCL kernel object as in this example).
+The call to ``kernel_entry_point()`` in ``single_task()`` establishes a common
+point of SYCL kernel invocation for Clang and the SYCL run-time library.
+With that point established, the tasks required to actually execute a SYCL
+kernel are delegated according to the following division of responsibilities.
+
+Clang is responsible for:
+
+- Validating that all kernel argument types (e.g., the deduced parameter types
+ of the ``kernel_entry_point()`` function above) satisfy the requirements
+ specified in
+ `SYCL 2020 section 4.12.4, "Rules for parameter passing to kernels"
<SYCL-2020-4.12.4_>`_.
+- Informing the SYCL run-time library of the presence of subobjects of SYCL
+ types that require special handling within kernel arguments.
+- Generating an offload kernel entry point function for each SYCL kernel for
+ each target device, generating a name for it derived from the SYCL kernel
+ name, and informing the SYCL run-time library of the generated name.
+
+The SYCL run-time library is responsible for:
+
+- Selecting a device on which to execute the kernel.
+- Copying the SYCL kernel object and any other kernel arguments to the device.
+- Informing Clang of additional parameters required for the offload kernel
+ entry point based on the presence of subobjects of SYCL types that require
+ special handling within kernel arguments.
+- Scheduling execution of the offload kernel entry point function on the
+ selected device.
+
+The SYCL run-time library tasks are expected to be performed in conjunction
+with an offload backend such as liboffload, OpenCL, CUDA, Hip, or Level Zero;
+their details are out of scope for this document.
+
+The above division of responsibilities requires coordination.
+The call to a function declared with the
+`sycl_kernel_entry_point <attr-sycl_kernel_entry_point_>`_ attribute causes
+two primary side effects:
+
+- The generation of an offload kernel entry point function.
+- An implicit call to a SYCL run-time library provided template named
+ ``sycl_kernel_launch`` (which may be a function template or a variable
+ template of a type with a member call operator).
+
+The offload kernel entry point function is generated with a target dependent
+calling convention for each device target.
+The function parameters and function body are initially copied from the
function
+declared with the `sycl_kernel_entry_point <attr-sycl_kernel_entry_point_>`_
+attribute, but may be augmented by information provided by the SYCL run-time
+library as described below.
+The function name is an implementation detail subject to change, but
+incorporates the SYCL kernel name in order to ensure that a unique name is
+deterministically generated for each SYCL kernel.
+
+The call to the ``sycl_kernel_launch`` template effectively replaces the call
+to the `sycl_kernel_entry_point <attr-sycl_kernel_entry_point_>`_ attributed
+function.
+This implicit call serves several purposes:
+
+- It informs the SYCL run-time library of the name of the offload kernel entry
+ point function to be used to execute the kernel on the selected device.
+- It informs the SYCL run-time library of the presence of subobjects of the
+ kernel arguments that require special handling.
+
+See the
+:ref:`sycl_special_kernel_parameter <sect-sycl_special_kernel_parameter>`
+section regarding SYCL types that require special handling in kernel arguments.
+
+The call to the ``sycl_kernel_launch`` template passes the name of the
generated
+offload kernel entry point function, the kernel arguments, and, for each
+subobject of a kernel argument that requires special handling, a reference to
+that subobject.
+For reasons explained further below, the ``sycl_kernel_launch`` template needs
+to know which of its arguments correspond to direct kernel arguments and which
+correspond to references to special subobjects within the direct kernel
+arguments.
+Because there may be multiple kernel arguments with multiple subobjects that
+require special handling, and because C++17 does not support function templates
+with multiple function parameter packs, an idiom is used to pass the kernel
+arguments and special subobjects separately.
+This idiom is best explained by way of an example.
+
+Consider the earlier example of a call to ``single_task()`` that passes a
+lambda that captures variables of type ``int`` and ``std::stream``.
+``std::stream`` is an example of a SYCL type that requires special handling in
+kernel arguments.
+The call to ``kernel_entry_point<KernelName>(kernelFunc)`` in the
implementation
+of ``single_task()`` results in an implicit call to ``sycl_kernel_launch`` that
+looks similar to the following (the access to the captured copy of ``sout`` via
+``kernelFunc.sout`` is not valid C++ syntax, but the compiler can generate such
+accesses).
+
+.. code-block:: C++
+
+ sycl_kernel_launch<KernelName>("kernel-entry-point",
kernelFunc)(kernelFunc.sout)
+
+The SYCL kernel name type, ``KernelName``, is passed as an explicit template
+type argument for convenient use by the SYCL run-time library if desired.
+The first function argument is the name of the offload kernel entry point
+function generated for the SYCL kernel denoted by ``KernelName``.
----------------
tahonermann wrote:
The direction I have in mind is to use the mangled name as known to the host
and to emit a table with the device that maps the host name to the name for
that device. This is future work. For now, we're relying on the same mangled
name being generated for the host and all devices. I'll resolve this comment
for now, but feel free to reopen it.
https://github.com/llvm/llvm-project/pull/170602
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits