I don't think Joris is saying we should do this instead of PCHs; I think
he's just saying that a sizable performance improvement can be made
simply by organizing Stout like a normal, non-header-only library.

Taking stock briefly, based on what you've said here and elsewhere, it
seems that the main advantages of PCHs are:

1. Should speed up even "totally clean", non-ccached builds.
2. Removes the cost of parsing and compiling header files, and replaces
   it with the cost of accessing the cached version.
3. Works natively cross-platform, with no external tooling needed.

Is this correct? Have I left anything out?

If that is correct, then it seems that the answer to Neil's question is that it
largely depends on how much time is spent parsing and compiling various
headers, no? If Stout were to be reorganized into a non-header-only
library, then we might reduce the cost of parsing, but if it is still
non-trivial, then it seems we would expect PCHs to improve the compile
times anyway.

If that's all accurate, then it seems difficult to answer Neil's question
directly.


__
Transcribed by my voice-enabled refrigerator, please pardon chilly messages.

On Wed, 15 Feb 2017, Jeff Coffler wrote:

Yes, but this should be dramatically sped up with precompiled headers. Yes, 
scanning the headers takes (a lot) of time. But if you only scan them once due 
to precompiled headers, it no longer matters.

I don't care if it takes 10 or even 15 or 20 seconds to scan all the headers. 
If you only do it once, the compile time is sped up dramatically.

If you're not doing PCH work, then I guess it could make sense to simplify the 
headers to the greatest extent possible. But you can only do that so much; with 
C++ template use for example, sometimes the implementation MUST be in header 
files.
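
For example (a minimal sketch, not actual stout code, just to illustrate the
point): the compiler needs a template's definition at every point of
instantiation, so the body can't live in a separate .cpp file unless you
explicitly instantiate it for every type used.

// option.h -- illustrative only, loosely modeled on a stout-style wrapper.
template <typename T>
class Option
{
public:
  Option() : isSome(false), value() {}
  explicit Option(const T& t) : isSome(true), value(t) {}

  // Defined in the header: callers may instantiate Option<T> for any T,
  // so the definition must be visible wherever it is used.
  T getOrElse(const T& fallback) const
  {
    return isSome ? value : fallback;
  }

private:
  bool isSome;
  T value;  // assumes T is default-constructible, for brevity
};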

/Jeff

-----Original Message-----
From: Joris Van Remoortere [mailto:jo...@mesosphere.io]
Sent: Wednesday, February 15, 2017 9:46 AM
To: dev@mesos.apache.org
Subject: Re: Proposal for Mesos Build Improvements


> However, the non-header-only work won't do anything in a "clean build"
> scenario.

I don't think this is true.

If you look at how many independent .o files we build that scan those headers 
each time, it should be clear that reducing the complexity of the header files 
reduces the compile time.
A good example of heavy .o files is the mesos tests, which scan close to all of 
stout / libprocess for each test file.
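
For instance, a hypothetical test translation unit (illustrative only, not an 
actual file from the tree) looks something like this, and every such .o pays 
the full header-parsing cost again:

// example_tests.cpp -- hypothetical, for illustration only.
#include <gtest/gtest.h>

#include <process/future.hpp>
#include <process/gtest.hpp>

#include <stout/option.hpp>
#include <stout/try.hpp>

// The test body is tiny; nearly all of the compile time for this .o
// is spent parsing the headers pulled in above.
TEST(ExampleTest, Trivial)
{
  EXPECT_EQ(2, 1 + 1);
}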

—
*Joris Van Remoortere*
Mesosphere

On Tue, Feb 14, 2017 at 4:49 PM, Jeff Coffler < 
jeff.coff...@microsoft.com.invalid> wrote:

Hi Neil,

This was discussed in the CXX Mesos Slack channel yesterday.

Basically, the two are separate and independent. Regardless of stout
work, I anticipate that PCH work will dramatically speed up the
Windows build (and Linux too, although I have less experience in that
area). I'm going to run some benchmarks on a subset of the code to give a good 
"before/after"
idea of the speedup and report to the list.

If the stout non-header-only library work is done, this will do a fair
amount to speed up incremental builds (i.e. you just update the
implementation of a stout method, and only the related C file is
rebuilt). However, the non-header-only work won't do anything in a
"clean build" scenario. And, of course, if you change the interface of
a stout method, all bets are off and you get to rebuild virtually the world.

PCH, on the other hand, will speed up all compiles across the board
(both files that use stout and files that don't). Now, that said, if a
stout change is made (assuming stout is still header-only), you will
still rebuild everything, but the builds will go much faster. That *may*
be fast enough to take the sting out of significant stout changes, but
the non-header-only stout work would still help the incremental build
cases regardless.

Hope that clarifies,

/Jeff

-----Original Message-----
From: Neil Conway [mailto:neil.con...@gmail.com]
Sent: Tuesday, February 14, 2017 11:45 AM
To: dev <dev@mesos.apache.org>
Subject: Re: Proposal for Mesos Build Improvements

I'm curious to hear more about how using PCH compares with making
stout a non-header-only library. Is PCH easier to implement, or is it
expected to offer a more dramatic improvement in compile times? Would
making both changes eventually make sense?

Neil

On Tue, Feb 14, 2017 at 11:28 AM, Jeff Coffler
<jeff.coff...@microsoft.com .invalid> wrote:
Proposal For Build Improvements

The Mesos build process is in dire need of some build infrastructure
improvements. These improvements will speed up and ease work within
individual components, and dramatically improve overall build time,
especially in the Windows environment, but likely in the Linux
environment as well.


Background:

It is currently recommended to use the ccache project with the Mesos
build process. This makes the Linux build process more tolerable in
terms of speed, but unfortunately such software is not available on Windows.
Ultimately, though, the caching software is covering up two
fundamental flaws in the overall build process:

1. Lack of use of libraries
2. Lack of precompiled headers

Because libraries are not used, the overall build process is often
much longer than it needs to be, particularly when a lot of work is
being done in a particular component. With libraries, if work is being
done in a particular component, only that component's library need be
rebuilt (and then the overall image relinked). Currently, since there
is no such modularization, all source files must be considered at
build time. Interestingly enough, such modularization already exists
in the source code layout; it just isn't utilized at the compiler level.

Precompiled headers exist on both Windows and Linux. For Linux, you can
refer to https://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html.
Straight from the GNU CC documentation: "The time
the compiler takes to process these header files over and over again
can account for nearly all of the time required to build the project."

In my prior use of precompiled headers, each C or C++ file generally
took about 4 seconds to compile. After switching to precompiled
headers, the precompiled header creation took about 4 seconds, but
each C/C++ file now took about 200 milliseconds to compile. The
overall build time was thus dramatically reduced.
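
To put illustrative numbers on that (assuming, say, 500 source files and the
per-file timings above): without precompiled headers, 500 files x 4 s is about
2,000 s of compilation; with them, it is roughly 4 s for the precompiled header
plus 500 x 0.2 s = 100 s, i.e. close to a 20x reduction in compile time before
linking.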


Scope of Changes:

These changes are only being proposed for the CMake system. Going
forward, the CMake system is the easiest way to maintain some level of
portability between the Linux and Windows platforms.


Details for Modularization:

For the modularization, the intent is simply to compile each source
directory, where functionally separate, into an
archive (.a) file. These archive files will then be linked together to
form the actual executables. These changes will primarily be in the
CMake system, and should have limited effect on any actual source code.

At a later date, if it makes sense, we can look at building shared
library (.so) files. However, this only makes sense if the
code is truly shared between different executable files. If that's not
the case, then it likely makes sense just to stick with .a files.
Regardless, generation of .so files is out of scope for this change.


Details for Precompiled Header Changes:

Precompiled headers will make the use of stout (a very large header-only
library) essentially "free" from a compile-time overhead point of view.
Basically, precompiled headers will take a list of header files
(including very long header files, like "windows.h"), and generate the
compiler memory structures for their representation.

During precompiled header generation, these memory structures are
flushed to disk. Then, when components are built, the memory
structures are reloaded from disk, which is dramatically faster than
actually parsing the tens of thousands of lines of header files and
building the memory structures.

For precompiled headers to be useful, a relatively "consistent" set of
headers must be included by all of the C/C++ files. So, for example,
consider the following C file:

#if defined(_WIN32)
#include <windows.h>
#endif

#include <header-a>
#include <header-b>
#include <header-c>

< - Remainder of module - >

To make a precompiled header for this module, all of the #include files
would be included in a new file, mesos_common.h. The C file would then
be changed as follows:

#include "mesos_common.h"

< - Remainder of module - >
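
For completeness, the new mesos_common.h in this example would simply
aggregate the shared includes (a sketch using the placeholder headers from
above):

// mesos_common.h -- sketch only, using the placeholder names from the example.
#ifndef MESOS_COMMON_H
#define MESOS_COMMON_H

#if defined(_WIN32)
#include <windows.h>
#endif

#include <header-a>
#include <header-b>
#include <header-c>

#endif // MESOS_COMMON_H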

Structurally, the code is identical, and need not be built with
precompiled headers. However, use of precompiled headers will make
file compilation dramatically faster.

Note that other include files can be included after the precompiled
header if appropriate. For example, the following is valid:

#include "mesos_common.h"
#include <header-d>

< - Remainder of module - >

For efficiency purposes, if a header file is included by 50% or more of
the source files, it should be included in the precompiled header. If
a header is included in fewer than 50% of the source files, then it
can be separately included (and thus would not benefit from precompiled 
headers).
Note that this is a guideline; even if a header is used by less than
50% of source files, if it's very large, we still may decide to throw
it in the precompiled header.

Note that, for use of precompiled headers, there will be a great
deal of code churn (almost exclusively in the #include list of
source files). This will mean that there will be a lot of code
merges, but ultimately no "code logic" changes. If merges are not
done in a timely fashion, this can easily result in needless hand merging of 
changes.
Due to these issues, we will need a dedicated shepherd who will
integrate the patches quickly. This kind of work is easily
invalidated when the include list is changed by another developer,
requiring us to redo the patch. [Note that Joseph has stepped up
to the plate for this, thanks Joseph!]


This is the end of my proposal; feedback would be appreciated.
