Author: dgregor
Date: Fri Mar 22 02:05:07 2013
New Revision: 177706
URL: http://llvm.org/viewvc/llvm-project?rev=177706&view=rev
Log:
More modules documentation, including the straw-man import declaration syntax
and "how to modularize a platform".
Modified:
cfe/trunk/docs/Modules.rst
Modified: cfe/trunk/docs/Modules.rst
URL:
http://llvm.org/viewvc/llvm-project/cfe/trunk/docs/Modules.rst?rev=177706&r1=177705&r2=177706&view=diff
==============================================================================
--- cfe/trunk/docs/Modules.rst (original)
+++ cfe/trunk/docs/Modules.rst Fri Mar 22 02:05:07 2013
@@ -21,7 +21,7 @@ The implementation is handled separately
Modules provide an alternative, simpler way to use software libraries that
provides better compile-time scalability and eliminates many of the problems
inherent to using the C preprocessor to access the API of a library.
-Problems with the Current Model
+Problems with the current model
-------------------------------
The ``#include`` mechanism provided by the C preprocessor is a very poor way
to access the API of a library, for a number of reasons:
@@ -73,7 +73,7 @@ The ``#include`` mechanism provided by t
API, and what declarations are present only because they had to be
written as part of the header file?
-Semantic Import
+Semantic import
---------------
Modules improve access to the API of software libraries by replacing the
textual preprocessor inclusion model with a more robust, more efficient
semantic model. From the user's perspective, the code looks only slightly
different, because one uses an ``import`` declaration rather than a
``#include`` preprocessor directive:
@@ -90,7 +90,7 @@ This semantic import model addresses man
* **Tool confusion**: Modules describe the API of software libraries, and
tools can reason about and present a module as a representation of that API.
Because modules can only be built standalone, tools can rely on the module
definition to ensure that they get the complete API for the library. Moreover,
modules can specify which languages they work with, so, e.g., one can not
accidentally attempt to load a C++ module into a C program.
-Problems Modules Do Not Solve
+Problems modules do not solve
-----------------------------
Many programming languages have a module or package system, and because of the
variety of features provided by these languages it is important to define what
modules do *not* do. In particular, all of the following are considered
out-of-scope for modules:
@@ -104,11 +104,30 @@ Many programming languages have a module
Using Modules
=============
-To enable modules, pass the command-line flag ``-fmodules`` [#]_. This will
make any modules-enabled software libraries available as modules as well as
introducing any modules-specific syntax. Additional command-line parameters are
described later.
+To enable modules, pass the command-line flag ``-fmodules`` [#]_. This will
make any modules-enabled software libraries available as modules as well as
introducing any modules-specific syntax. Additional `command-line parameters`_
are described in a separate section later.
-Includes as Imports
+Import declaration
+------------------
+The most direct way to import a module is with an *import declaration*, which
imports the named module:
+
+.. parsed-literal::
+
+ import std;
+
+The import declaration above imports the entire contents of the ``std`` module
(which would contain, e.g., the entire C or C++ standard library) and make its
API available within the current translation unit. To import only part of a
module, one may use dot syntax to specific a particular submodule, e.g.,
+
+.. parsed-literal::
+
+ import std.io;
+
+Redundant import declarations are ignored, and one is free to import modules
at any point within the translation unit, so long as the import declaration is
at global scope.
+
+.. warning::
+ The import declaration syntax described here does not actually exist.
Rather, it is a straw man proposal that may very well change when modules are
discussed in the C and C++ committees. See the section `Includes as imports`_
to see how modules get imported today.
+
+Includes as imports
-------------------
-The primary user-level feature of modules is the import operation, which
provides access to the API of software libraries. However, Clang does not
provide a specific syntax for importing modules within the language itself
[#]_. Instead, Clang translates ``#include`` directives into the corresponding
module import. For example, the include directive
+The primary user-level feature of modules is the import operation, which
provides access to the API of software libraries. However, today's programs
make extensive use of ``#include``, and it is unrealistic to assume that all of
this code will change overnight. Instead, modules automatically translate
``#include`` directives into the corresponding module import. For example, the
include directive
.. code-block:: c
@@ -116,15 +135,23 @@ The primary user-level feature of module
will be automatically mapped to an import of the module ``std.io``. Even with
specific ``import`` syntax in the language, this particular feature is
important for both adoption and backward compatibility: automatic translation
of ``#include`` to ``import`` allows an application to get the benefits of
modules (for all modules-enabled libraries) without any changes to the
application itself. Thus, users can easily use modules with one compiler while
falling back to the preprocessor-inclusion mechanism with other compilers.
-Module Maps
+.. note::
+
+ The automatic mapping of ``#include`` to ``import`` also solves an
implementation problem: importing a module with a definition of some entity
(say, a ``struct Point``) and then parsing a header containing another
definition of ``struct Point`` would cause a redefinition error, even if it is
the same ``struct Point``. By mapping ``#include`` to ``import``, the compiler
can guarantee that it always sees just the already-parsed definition from the
module.
+
+Module maps
-----------
The crucial link between modules and headers is described by a *module map*,
which describes how a collection of existing headers maps on to the (logical)
structure of a module. For example, one could imagine a module ``std`` covering
the C standard library. Each of the C standard library headers (``<stdio.h>``,
``<stdlib.h>``, ``<math.h>``, etc.) would contribute to the ``std`` module, by
placing their respective APIs into the corresponding submodule (``std.io``,
``std.lib``, ``std.math``, etc.). Having a list of the headers that are part of
the ``std`` module allows the compiler to build the ``std`` module as a
standalone entity, and having the mapping from header names to (sub)modules
allows the automatic translation of ``#include`` directives to module imports.
-Module maps are specified as separate files (each named ``module.map``)
alongside the headers they describe, which allows them to be added to existing
software libraries without having to change the library headers themselves (in
most cases [#]_). The actual `Module Map Language`_ is described in a later
section.
+Module maps are specified as separate files (each named ``module.map``)
alongside the headers they describe, which allows them to be added to existing
software libraries without having to change the library headers themselves (in
most cases [#]_). The actual `Module map language`_ is described in a later
section.
+
+.. note::
+
+ To actually see any benefits from modules, one first has to introduce module
maps for the underlying C standard library and the libraries and headers on
which it depends. The section `Modularizing a Platform`_ describes the steps
one must take to write these module maps.
-Compilation Model
+Compilation model
-----------------
-The binary representation of modules is automatically generated by the
compiler on an as-needed basis. When a module is imported (e.g., by an
``#include`` of one of the module's headers), the compiler will spawn a second
instance of itself, with a fresh preprocessing context [#]_, to parse just the
headers in that module. The resulting Abstract Syntax Tree (AST) is then
persisted into the binary representation of the module that is then loaded into
translation unit where the module import was encountered.
+The binary representation of modules is automatically generated by the
compiler on an as-needed basis. When a module is imported (e.g., by an
``#include`` of one of the module's headers), the compiler will spawn a second
instance of itself [#]_, with a fresh preprocessing context [#]_, to parse just
the headers in that module. The resulting Abstract Syntax Tree (AST) is then
persisted into the binary representation of the module that is then loaded into
translation unit where the module import was encountered.
The binary representation of modules is persisted in the *module cache*.
Imports of a module will first query the module cache and, if a binary
representation of the required module is already available, will load that
representation directly. Thus, a module's headers will only be parsed once per
language configuration, rather than once per translation unit that uses the
module.
@@ -187,7 +214,7 @@ As an example, the module map file for t
Here, the top-level module ``std`` encompasses the whole C standard library.
It has a number of submodules containing different parts of the standard
library: ``complex`` for complex numbers, ``ctype`` for character types, etc.
Each submodule lists one of more headers that provide the contents for that
submodule. Finally, the ``export *`` command specifies that anything included
by that submodule will be automatically re-exported.
-Lexical Structure
+Lexical structure
-----------------
Module map files use a simplified form of the C99 lexer, with the same rules
for identifiers, tokens, string literals, ``/* */`` and ``//`` comments. The
module map language has the following reserved words; all other C identifiers
are valid identifiers.
@@ -198,8 +225,8 @@ Module map files use a simplified form o
``exclude`` ``header`` ``umbrella``
``explicit`` ``link``
-Module Map Files
-----------------
+Module map file
+---------------
A module map file consists of a series of module declarations:
.. parsed-literal::
@@ -214,8 +241,8 @@ Within a module map file, modules are re
*module-id*:
*identifier* (',' *identifier*)*
-Module Declarations
--------------------
+Module declaration
+------------------
A module declaration describes a module, including the headers that contribute
to that module, its submodules, and other aspects of the module.
.. parsed-literal::
@@ -254,7 +281,7 @@ Modules can have a number of different k
*config-macros-declaration*
*conflict-declaration*
-Requires Declaration
+Requires declaration
~~~~~~~~~~~~~~~~~~~~
A *requires-declaration* specifies the requirements that an importing
translation unit must satisfy to use the module.
@@ -316,7 +343,7 @@ tls
}
}
-Header Declaration
+Header declaration
~~~~~~~~~~~~~~~~~~
A header declaration specifies that a particular header is associated with the
enclosing module.
@@ -348,7 +375,7 @@ A header with the ``exclude`` specifier
A given header shall not be referenced by more than one *header-declaration*.
-Umbrella Directory Declaration
+Umbrella directory declaration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
An umbrella directory declaration specifies that all of the headers in the
specified directory should be included within the module.
@@ -366,7 +393,7 @@ An *umbrella-dir-declaration* shall not
Umbrella directories are useful for libraries that have a large number of
headers but do not have an umbrella header.
-Submodule Declaration
+Submodule declaration
~~~~~~~~~~~~~~~~~~~~~
Submodule declarations describe modules that are nested within their enclosing
module.
@@ -427,7 +454,7 @@ is equivalent to the (more verbose) modu
}
}
-Export Declaration
+Export declaration
~~~~~~~~~~~~~~~~~~
An *export-declaration* specifies which imported modules will automatically be
re-exported as part of a given module's API.
@@ -485,7 +512,7 @@ Note that, if ``Derived.h`` includes ``B
compatibility for programs that rely on transitive inclusion (i.e.,
all of them).
-Link Declaration
+Link declaration
~~~~~~~~~~~~~~~~
A *link-declaration* specifies a library or framework against which a program
should be linked if the enclosing module is imported in any translation unit in
that program.
@@ -505,8 +532,8 @@ A *link-declaration* with the ``framewor
format and the linker. The notion is similar to Microsoft Visual
Studio's ``#pragma comment(lib...)``.
-Configation Macros Declaration
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Configuration macros declaration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The *config-macros-declaration* specifies the set of configuration macros that
have an effect on the the API of the enclosing module.
.. parsed-literal::
@@ -551,7 +578,7 @@ A translation unit shall not import the
config_macros [exhaustive] NDEBUG
}
-Conflict Declarations
+Conflict declarations
~~~~~~~~~~~~~~~~~~~~~
A *conflict-declaration* describes a case where the presence of two different
modules in the same translation unit is likely to cause a problem. For example,
two modules may provide similar-but-incompatible functionality.
@@ -599,6 +626,35 @@ Attributes are used in a number of place
Any *identifier* can be used as an attribute, and each declaration specifies
what attributes can be applied to it.
+Modularizing a Platform
+=======================
+To get any benefit out of modules, one needs to introduce module maps for
software libraries starting at the bottom of the stack. This typically means
introducing a module map covering the operating system's headers and the C
standard library headers (in ``/usr/include``, for a Unix system).
+
+The module maps will be written using the `module map language`_, which
provides the tools necessary to describe the mapping between headers and
modules. Because the set of headers differs from one system to the next, the
module map will likely have to be somewhat customized for, e.g., a particular
distribution and version of the operating system. Moreover, the system headers
themselves may require some modification, if they exhibit any anti-patterns
that break modules. Such common patterns are described below.
+
+**Macro-guarded copy-and-pasted definitions**
+ System headers vend core types such as ``size_t`` for users. These types are
often needed in a number of system headers, and are almost trivial to write.
Hence, it is fairly common to see a definition such as the following
copy-and-pasted throughout the headers:
+
+ .. parsed-literal::
+
+ #ifndef _SIZE_T
+ #define _SIZE_T
+ typedef __SIZE_TYPE__ size_t;
+ #endif
+
+ Unfortunately, when modules compiles all of the C library headers together
into a single module, only the first actual type definition of ``size_t`` will
be visible, and then only in the submodule corresponding to the lucky first
header. Any other headers that have copy-and-pasted versions of this pattern
will *not* have a definition of ``size_t``. Importing the submodule
corresponding to one of those headers will therefore not yield ``size_t`` as
part of the API, because it wasn't there when the header was parsed. The fix
for this problem is either to pull the copied declarations into a common header
that gets included everywhere ``size_t`` is part of the API, or to eliminate
the ``#ifndef`` and redefine the ``size_t`` type. The latter works for C++
headers and C11, but will cause an error for non-modules C90/C99, where
redefinition of ``typedefs`` is not permitted.
+
+**Conflicting definitions**
+ Different system headers may provide conflicting definitions for various
macros, functions, or types. These conflicting definitions don't tend to cause
problems in a pre-modules world unless someone happens to include both headers
in one translation unit. Since the fix is often simply "don't do that", such
problems persist. Modules requires that the conflicting definitions be
eliminated or that they be placed in separate modules (the former is generally
the better answer).
+
+**Missing includes**
+ Headers are often missing ``#include`` directives for headers that they
actually depend on. As with the problem of conflicting definitions, this only
affects unlucky users who don't happen to include headers in the right order.
With modules, the headers of a particular module will be parsed in isolation,
so the module may fail to build if there are missing includes.
+
+**Headers that vend multiple APIs at different times**
+ Some systems have headers that contain a number of different kinds of API
definitions, only some of which are made available with a given include. For
example, the header may vend ``size_t`` only when the macro ``__need_size_t``
is defined before that header is included, and also vend ``wchar_t`` only when
the macro ``__need_wchar_t`` is defined. Such headers are often included many
times in a single translation unit, and will have no include guards. There is
no sane way to map this header to a submodule. One can either eliminate the
header (e.g., by splitting it into separate headers, one per actual API) or
simply ``exclude`` it in the module map.
+
+To detect and help address some of these problems, the ``clang-tools-extra``
repository contains a ``modularize`` tool that parses a set of given headers
and attempts to detect these problems and produce a report. See the tool's
in-source documentation for information on how to check your system or library
headers.
+
Where To Learn More About Modules
=================================
The Clang source code provides additional information about modules:
@@ -609,18 +665,24 @@ The Clang source code provides additiona
``clang/test/Modules/``
Tests specifically related to modules functionality.
+``clang/include/clang/Basic/Module.h``
+ The ``Module`` class in this header describes a module, and is used
throughout the compiler to implement modules.
+
+``clang/include/clang/Lex/ModuleMap.h``
+ The ``ModuleMap`` class in this header describes the full module map,
consisting of all of the module map files that have been parsed, and providing
facilities for looking up module maps and mapping between modules and headers
(in both directions).
+
PCHInternals_
- Information about the serialized AST format used for precompiled headers and
modules.
+ Information about the serialized AST format used for precompiled headers and
modules. The actual implementation is in the ``clangSerialization`` library.
.. [#] Automatic linking against the libraries of modules requires specific
linker support, which is not widely available.
.. [#] Modules are only available in C and Objective-C; a separate flag
``-fcxx-modules`` enables modules support for C++, which is even more
experimental and broken.
-.. [#] The ``import modulename;`` syntax described earlier in the document is
a straw man proposal. Actual syntax will be pursued within the C++ committee
and implemented in Clang.
+.. [#] There are certain anti-patterns that occur in headers, particularly
system headers, that cause problems for modules. The section `Modularizing a
Platform`_ describes some of them.
-.. [#] There are certain anti-patterns that occur in headers, particularly
system headers, that cause problems for modules.
+.. [#] The second instance is actually a new thread within the current
process, not a separate process. However, the original compiler instance is
blocked on the execution of this thread.
-.. [#] The preprocessing context in which the modules are parsed is actually
dependent on the command-line options provided to the compiler, including the
language dialect and any ``-D`` options. However, the compiled modules for
different command-line options are kept distinct, and any preprocessor
directives that occur within the translation unit are ignored.
+.. [#] The preprocessing context in which the modules are parsed is actually
dependent on the command-line options provided to the compiler, including the
language dialect and any ``-D`` options. However, the compiled modules for
different command-line options are kept distinct, and any preprocessor
directives that occur within the translation unit are ignored. See the section
on the `Configuration macros declaration`_ for more information.
.. _PCHInternals: PCHInternals.html
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits