Scott Meyers has blogged a few times about his experience publishing technical 
books to ebook formats, and a number of times the subject of formatting code 
for e-readers has come up. The obvious solution is automatic code formatting, 
and several commenters immediately suggested clang-format. It sounded like a 
fun evening project, so that's what I did tonight, and I thought I'd share what 
it took to get an initial working example.

I started with my existing LLVM build environment, which already has LLVM, 
compiler-rt, libcxx, lld, clang, and the clang tools including clang-format set 
up appropriately for building from source. I use CMake/Ninja and have a 
buildbot set up with OS X and Windows slaves to automate daily builds and test 
runs. From there I grabbed the latest release of Emscripten, the C++ to 
JavaScript compiler (which coincidentally uses LLVM as a backend), and followed 
the instructions to set up the 'portable' install for OS X. After wondering for 
a bit why Emscripten is so adamant that the python executable be named 
'python2', I finished the setup and was able to build a hello-world.cpp program 
and run it in a browser.
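
As a rough sketch of that smoke test: the contents of hello-world.cpp and the 
output file name below are assumptions, not the exact program used, and the 
build step is skipped if emcc is not on the PATH.

```shell
# Write a minimal test program (contents are an assumption for illustration).
cat > hello-world.cpp <<'EOF'
#include <cstdio>
int main() { std::puts("hello, world"); return 0; }
EOF

# Build it into an html page only when emcc is actually installed.
if command -v emcc >/dev/null 2>&1; then
  emcc hello-world.cpp -o hello-world.html
else
  echo "emcc not found on PATH; skipping build" >&2
fi
```

Opening the generated hello-world.html in a browser should then run the 
program on page load.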

After that I set up a new CMake build directory for the Emscripten build of 
clang-format to go in. It took a few tries, but the magic incantation to 
produce a functioning build involved using Emscripten's binaries for the C++ 
compiler, C compiler, ar, and ranlib. Overriding the default linker was not 
needed and in fact stopped the CMake configure step from working. I was also 
required to set C++11 mode using CMAKE_CXX_FLAGS, and I disabled a warning 
there as well to cut down on the noise. I chose to configure a release build. The complete 
CMake invocation I used was:

cmake -DCMAKE_CXX_FLAGS="-std=c++11 -Wno-warn-absolute-paths" \
  -DCMAKE_CXX_COMPILER=<emscripten_binary_path>/emcc \
  -DCMAKE_C_COMPILER=<emscripten_binary_path>/emcc \
  -DCMAKE_AR=<emscripten_binary_path>/emar \
  -DCMAKE_RANLIB=<emscripten_binary_path>/emranlib \
  -DCMAKE_BUILD_TYPE=release \
  -G Ninja <path_to_my_existing_llvm_source_tree>

I didn't bother with this, but adding -DCMAKE_C_FLAGS="-Wno-warn-absolute-paths" 
might also help, to cut out the last few warnings.

Additionally I had to make one change to the CMakeLists.txt file in 
compiler-rt, where it was complaining about requiring a pointer size of 4 or 8 
bytes. I simply commented out the error line in the CMakeLists.txt.

At this point CMake was successfully configuring a build directory, and I was 
able to kick off a build with 'ninja clang-format'.

The next issue was that LLVM's build process involves producing executables 
that then actually have to be run as part of the build. This was easy enough to 
get around by the simple expedient of using the executables from my normal, 
non-Emscripten build. After a build step requiring an executable would fail, 
causing the build to stop, I would copy the appropriate executable from my 
regular build area into the Emscripten build area. I also needed to set execute 
permissions on the copied executables. After that I would restart the build 
with another 'ninja clang-format' invocation. There were only two restarts 
required, and the two executables needed were llvm-tblgen and clang-tblgen.
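
The copy step above could be sketched as a small shell function. The 
function name and the bin/ subdirectory layout are assumptions about a typical 
CMake build tree, not something taken from the build described here.

```shell
# copy_host_tools NATIVE_BUILD EM_BUILD
# Copies the natively built tblgen executables into the Emscripten build
# tree and marks them executable. Assumes both trees keep tools in bin/.
copy_host_tools() {
  mkdir -p "$2/bin" || return 1
  for tool in llvm-tblgen clang-tblgen; do
    cp "$1/bin/$tool" "$2/bin/$tool" || return 1
    chmod +x "$2/bin/$tool" || return 1
  done
}
```

With hypothetical directory names, usage would look like 
`copy_host_tools ~/llvm/build-native ~/llvm/build-emscripten`, followed by 
another 'ninja clang-format' in the Emscripten build directory.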

The build then completed, producing a file 'clang-format' containing LLVM 
bitcode. Emscripten's compiler, emcc, relies on the file extension to decide 
what kind of input it has and what to do with it, so I renamed the file to 
'clang-format.o'. emcc also uses the extension of the output file to decide 
what to produce. If you ask emcc to produce an html file, it creates a web page 
from a template, and the page is set to automatically load and run the final 
JavaScript program.

I found that trying to use stdin in the final program produces an endless 
series of dialogs asking for input in the web browser (so be sure not to load 
such an html page in a browser like Safari, which lacks a handy "Prevent this 
web page from spawning more dialogs" button). To avoid stdin, I used emcc's 
preload-file feature to put files into a virtual filesystem available to the 
running JavaScript program. The final emcc invocation looked like this:

emcc clang-format.o -o blah.html --preload-file main.cpp

emcc produced a few 'unresolved symbol' warnings, but still generated runnable 
JavaScript.
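
The rename and link steps can be collected into one helper, sketched below. 
The function name build_page is hypothetical, and it copies the bitcode file 
rather than renaming it so the original build output is left in place.

```shell
# build_page BITCODE PAGE INPUT
# Gives the bitcode file a .o extension so emcc recognises it, then links
# it into PAGE with INPUT preloaded into the virtual filesystem.
build_page() {
  bitcode=$1
  page=$2
  input=$3
  cp "$bitcode" "$bitcode.o" || return 1
  emcc "$bitcode.o" -o "$page" --preload-file "$input"
}
```

Calling `build_page clang-format blah.html main.cpp` runs the same emcc 
command line as the invocation shown above.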

To get clang-format to actually look at the loaded file, I had to modify the 
generated html file to pass command line arguments to clang-format. This 
involved finding the var 'Module' and adding an 'arguments' member. I added it 
between the preRun and postRun members:

      var Module = {
        preRun: [],
        arguments: ['main.cpp'], // <--- added this line
        postRun: [],
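
The same edit could be scripted instead of made by hand. This is only a 
sketch: the function name add_module_arguments is hypothetical, and it assumes 
the generated page contains a line reading "preRun: []," exactly as in the 
excerpt above, which may not hold for other Emscripten versions.

```shell
# add_module_arguments PAGE ARG
# Prints PAGE with an "arguments: ['ARG']," line added after the first
# "preRun: []," line, reusing that line's indentation.
add_module_arguments() {
  awk -v arg="$2" '
    { print }
    !done && /preRun: \[\],/ {
      match($0, /^[ \t]*/)
      printf "%sarguments: [\047%s\047],\n", substr($0, 1, RLENGTH), arg
      done = 1
    }
  ' "$1"
}
```

For example, `add_module_arguments blah.html main.cpp > blah.patched.html` 
writes a patched copy and leaves the generated page untouched.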


And the final result:

http://i.imgur.com/x3xgpK9.png

All in all it took about 3 hours, I think, and the experience of getting 
Emscripten to build the necessary parts of LLVM using CMake was pretty smooth. 
The resulting JavaScript file is ~20MB, which seems a bit heavy to include in 
an ebook, but I think it still indicates that this could be a realistic 
solution to the problem of publishing code samples in a dynamic format.

- Seth
_______________________________________________
cfe-users mailing list
cfe-users@cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-users
