After further tinkering, I have some additional comments on building FiPy+Trilinos on Linux. Some are reflected in the revised build guide, attached; some apply to running FiPy on the command line.
First, two corrections to the original build guide. While PyTrilinos is compiled when you build Trilinos, it is not installed with the rest; an additional instruction now takes care of that. Also, the original guide left off gist, because it was difficult to find the right one: there are many software packages by that name. The correct link is now included.

Next, while executing FiPy in parallel with Trilinos, numerous threads were being launched. I therefore recompiled Trilinos with OpenMP support explicitly enabled, and updated the build guide with the new flags. While the original guide kept the command on one line for easy copying, the revision splits it up for readability. Note that several detailed Trilinos build guides for HPC are out there; this one (https://redmine.scorec.rpi.edu/projects/albany-rpi/wiki/Installing_Trilinos) exposes many flags by example, which I found quite useful. I should state here that, while the Trilinos build command has grown, it may or may not have improved: FiPy uses only a subset of Trilinos' capabilities, so building the entire toolkit is excessive. It will be worth optimizing which modules are built after studying how FiPy interacts with Trilinos in greater detail.

Finally, I solved the baffling mystery of the speedup problem. The culprit appears to be a bad interaction between Trilinos' greed for threads and Python's Global Interpreter Lock (GIL). By default on my test computer, every MPI rank in the run spawns as many threads as there are cores available, which is unexpected. It could be that Trilinos is designed to have one MPI rank per node of a cluster, so each of the child threads would help with computation but would not compete for I/O resources during ghost cell exchanges and file I/O. However, Python's GIL binds all the child threads to the same core as their parent! So instead of improving performance, each core suffers a heavy overhead from managing those idling threads.
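The oversubscription is easy to observe from inside Python. Below is a small diagnostic sketch of my own (not part of FiPy or the build guide) that reads the kernel's thread count for the current process on Linux; launching it under mpirun before and after setting OMP_NUM_THREADS shows how many native threads each rank actually holds:

```python
import os

def native_thread_count():
    """Return the number of OS-level threads in this process (Linux only).

    Unlike threading.active_count(), this counts native threads too,
    including any spawned by an OpenMP runtime behind Python's back.
    """
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith("Threads:"):
                return int(line.split()[1])
    return None

if __name__ == "__main__":
    print("pid %d holds %d thread(s)" % (os.getpid(), native_thread_count()))
```

Run as, e.g., `mpirun -np 4 python threadcount.py` (hypothetical filename) with and without OMP_NUM_THREADS=1 exported.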
The fix I've used is to quash the threads using the OpenMP environment variable: before launching a FiPy script in parallel, export OMP_NUM_THREADS=1. The effect is quite dramatic (results for a 6-core Intel with 2x hyperthreading):

$ echo "solver gridsz avg time"
$ echo -n "serial"; python mpitest.py
$ echo -n "trilin"; python mpitest.py --trilinos
$ for n in {1..12}; do echo -n "np = ${n}"; mpirun -np $n python mpitest.py --trilinos; done

solver    gridsz   avg time
serial    27900    0.130390
trilin    27900    0.941082
np = 1    27900    1.080579
np = 2    14973    1.717842
np = 3    10230    4.968959
np = 4     7958    5.462615
np = 5     6542    7.546303
np = 6     5463    8.831044
np = 7     4881    8.612932
np = 8     4292    8.786715
np = 9     4101   10.820657
np = 10    3714   12.011446
np = 11    3284   13.163548
np = 12    3012   14.257576

$ export OMP_NUM_THREADS=1
$ # ... same commands as above ...

solver    gridsz   avg time
serial    27900    0.139227
trilin    27900    0.284342
np = 1    27900    0.268329
np = 2    14973    0.157163
np = 3    10230    0.124317
np = 4     7958    0.091335
np = 5     6542    0.080261
np = 6     5463    0.067957
np = 7     4881    0.100654
np = 8     4292    0.085887
np = 9     4101    0.089979
np = 10    3714    0.083326
np = 11    3284    0.081005
np = 12    3012    0.076399

Now, that is more like it: runtimes decrease monotonically toward np=6, jump, then decrease (with some jitter) toward np=12. The revised Python script, which reports average runtimes for easier analysis and plotting, is available at https://gist.github.com/tkphd/7f62afc064448ca80025.

Trevor Keller, Ph.D.
National Institute of Standards and Technology
100 Bureau Dr., MS 8550; Gaithersburg, MD 20899
Office: 223/A131 or (301) 975-2889

________________________________
From: [email protected] <[email protected]> on behalf of Warren, James A.
<[email protected]> Sent: Thursday, July 30, 2015 10:59 AM To: FIPY Subject: Re: Notes on FiPy+Trilinos installation in Anaconda Trevor, this very valuable. Many thanks. Dr. James A Warren Director, Materials Genome Program Material Measurement Laboratory National Institute of Standards and Technology (301)-975-5708<tel:(301)-975-5708> ________________________________ On: 29 July 2015 12:52, "Keller, Trevor" <[email protected]> wrote: Hello fellow FiPyers, There are several bits and pieces mentioning FiPy and Trilinos here, in the user's manual, and around the Internet, but I haven't found a detailed guide to install FiPy with Trilinos from source. I recently gave it a try on Debian -- and it wasn't too bad. The whole process, which involved a couple false starts, took me a day and a half. It is my hope in publishing this bare-bones guide that fellow travelers can get it running on Linux in just a few hours. If you'd like to follow this guide, the standard disclaimer applies: you are downloading source code from the Internet and executing it on your machine. Proceed with the utmost caution. My machine runs Debian GNU/Linux 7.8 "wheezy" on a 4-core Intel processor. I am not a sudoer, so a virtual environment was used: conda, from the Anaconda Python distribution. The various source codes were extracted and compiled within the ~/Downloads directory, then installed into the conda environment. Many of the Python programs are available directly through Anaconda, and should probably be installed through that package manager rather than from source. I chose to compile everything to (a) see if I could and (b) ensure compiler and library consistency. The attached build notes indicate the order in which packages were installed, provide the home- or download-page URL for the software project, and specify the commands I used to compile and install the code. 
Installation of Anaconda is excluded, since I did not have to do it; commands to download, extract, and cd into the source code are excluded since the latest versions change almost daily.

After installation, I executed Daniel Wheeler's test script (https://gist.github.com/wd15/8717979) to confirm everything worked. This turned up a couple of interesting things. First, FiPy as-is compares versions as strings and believes gmsh 2.10.0 is less than 2.5.0. I revised my local copy to compare versions as '.'-delimited tuples, which allowed the code to execute in parallel; I'm working on a patch. Second, the runtimes increased through np=4, then dropped at np=5 and beyond to values similar to Dr. Wheeler's (http://wd15.github.io/2014/01/30/fipy-trilinos-anaconda/). The machine has only four cores and is not hyperthreaded, so this was a surprise. It could be that timing differences for heavier workloads will follow expected scaling laws, but I am baffled by the result.

Results table (run type, MPI rank, grid size, time; modified script at https://gist.github.com/anonymous/43afb6c567ebfd6d3b2f):

$ echo -n "serial"; python mpitest.py
$ echo -n "trilin"; python mpitest.py --trilinos
$ for n in {1..6}; do echo -n "np = ${n}"; mpirun -np $n python mpitest.py --trilinos; done

serial  0  27900  0.134404
trilin  0  27900  0.275607
np = 1  0  27900  0.664498
np = 2  0  14973  2.783230
        1  14973  2.797370
np = 3  0  10230  3.167559
        1  10480  3.174004
        2  10538  3.196746
np = 4  0   7958  5.139595
        1   7971  5.139695
        2   8106  5.139710
        3   7955  5.139925
np = 5  0   6542  0.499731
        1   6538  0.500179
        2   6534  0.500313
        3   6655  0.498517
        4   6689  0.500652
np = 6  0   5463  0.447847
        1   5584  0.448344
        2   5594  0.445663
        3   5512  0.449028
        4   5570  0.446170
        5   5571  0.449486

Thoughts on this anomalous behavior would be greatly appreciated!

Good luck,
Trevor
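For the gmsh version problem, the string-versus-tuple comparison is easy to demonstrate. The helper below is my own illustration of the '.'-delimited tuple approach, not the exact code from my local FiPy patch:

```python
def version_tuple(version):
    """Split a dotted version string into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split("."))

# Lexicographic string comparison gets it wrong: '1' < '5' character-wise,
# so "2.10.0" sorts before "2.5.0".
assert "2.10.0" < "2.5.0"

# Tuple comparison is numeric per component: (2, 10, 0) > (2, 5, 0).
assert version_tuple("2.10.0") > version_tuple("2.5.0")
```

This sketch assumes purely numeric version components; suffixes like "2.5.0rc1" would need extra handling.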
Recommended build order for dependencies and FiPy in a clean Anaconda environment with OpenMPI and Trilinos support.
Questions/comments to [email protected] (Trevor Keller)

$ conda create -n fipy python=2.7               # fipy is an alias, choose any name you wish
$ source activate fipy                          # use the alias you chose
$ export ANACONDA=/abs/path/to/conda/envs/fipy  # append this to your .bashrc

Notes:
  make:      -j4 issued due to 4-way parallel CPU; adjust for your hardware
  cmake:     assumes mkdir build; cd build in every instance;
             execute ccmake to interactively set flags
  configure: issue ./configure --help for options
  Make sure DEBUG is disabled. Build shared libraries whenever possible.
  Trilinos has OpenMP support, which is thwarted by the Python GIL.
  For best performance, limit your environment to one thread per core
  before launching in parallel, e.g.
    $ export OMP_NUM_THREADS=1; mpirun -np 4 python mpitest.py

Build order:

cmake  http://www.cmake.org/download/
  ./bootstrap --prefix=$ANACONDA --parallel=4; gmake -j4 && gmake install
  export CMAKE_ROOT=$ANACONDA/share/cmake-3.3

Flex  http://sourceforge.net/projects/flex/files/
  ./configure --prefix=$ANACONDA; make -j4 && make install

Bison  http://www.gnu.org/software/bison/
  ./configure --prefix=$ANACONDA; make -j4 && make install

Pth  http://www.gnu.org/software/pth/
  ./configure --prefix=$ANACONDA; make && make install  # make -j4 failed

doxygen  http://www.stack.nl/~dimitri/doxygen/download.html#srcbin
  cmake -DCMAKE_INSTALL_PREFIX=$ANACONDA -DCMAKE_BUILD_TYPE=Release ..
  make -j4 && make install

openmpi  http://www.open-mpi.org/software/ompi/v1.8/
  ./configure --prefix=$ANACONDA; make -j4 && make install

bzip2  http://www.bzip.org/downloads.html
  make -f Makefile-libbz2_so  # build shared library
  make PREFIX=$ANACONDA install

libboost  http://www.boost.org/users/download/
  ./bootstrap.sh --prefix=$ANACONDA
  # Add "using mpi ;" to project-config.jam (I put it under "using gcc ;")
  ZLIB_LIBPATH=$ANACONDA/lib BZIP2_INCLUDE=$ANACONDA/include \
    ./b2 --prefix=$ANACONDA --with-mpi= variant=release link=shared install

glm  http://glm.g-truc.net/0.9.4/
  cmake -DCMAKE_INSTALL_PREFIX=$ANACONDA -DCMAKE_BUILD_TYPE=Release ..
  make -j4 && make install

cython  http://cython.org/#download
  python setup.py install

HDF5  https://www.hdfgroup.org/HDF5/release/cmakebuild.html
  # Edit HDF518config.cmake to enable shared libraries
  # (search for BUILD_SHARED_LIBS:BOOL=OFF)
  ./build-hdf518-unix.sh
  ./HDF5-1.8.15-patch1-Linux.sh  # accept default path
  cd HDF5-1.8.15-patch1-Linux; for d in */; do cp -r $d $ANACONDA/; done

OpenBLAS  https://github.com/xianyi/OpenBLAS/wiki/Installation-Guide
  make -j4 PREFIX=$ANACONDA FC=gfortran && make PREFIX=$ANACONDA install

numpy  http://sourceforge.net/projects/numpy/files/
  python setup.py install

scipy  http://sourceforge.net/projects/scipy/files/
  python setup.py install

pysparse  http://pysparse.sourceforge.net/
  python setup.py install

pyamg  https://github.com/pyamg/pyamg
  python setup.py install

mpi4py  https://bitbucket.org/mpi4py
  python setup.py install

netcdf  https://www.unidata.ucar.edu/downloads/netcdf/index.jsp
  # Use the stable C distribution. HDF5 support requires a parallel build.
  CC=mpicc ./configure --prefix=$ANACONDA --with-zlib=$ANACONDA; make check install && make install
  # For some reason, make install neglects this file; Trilinos needs it:
  cp build/include/netcdf_par.h $ANACONDA/include/
swig  http://www.swig.org/download.html
  ./configure --prefix=$ANACONDA; make -j4 && make install

fltk  http://www.fltk.org/software.php
  cmake -DCMAKE_INSTALL_PREFIX=$ANACONDA -DCMAKE_BUILD_TYPE=Release ..; make -j4 && make install

mmg3d  http://www.math.u-bordeaux1.fr/~dobrzyns/logiciels/download.php
  cmake -DCMAKE_INSTALL_PREFIX=$ANACONDA -DCMAKE_BUILD_TYPE=Release ..; make -j4 && make install

tetgen  http://wias-berlin.de/software/tetgen/#Download
  cmake -DCMAKE_INSTALL_PREFIX=$ANACONDA -DCMAKE_BUILD_TYPE=Release ..; make -j4 && make install

gmsh  http://geuz.org/gmsh/
  for d in */; do cp -r $d $ANACONDA/; done  # binary distribution

PETSc  http://www.mcs.anl.gov/petsc/download/
  # The configure script for PETSc will walk you through installation,
  # with flags tailored to your architecture. Follow its lead.
  ./configure --prefix=$ANACONDA --with-mpi=1 --with-debugging=0 \
    COPTFLAGS='-O3 -march=native -mtune=native' \
    CXXOPTFLAGS='-O3 -march=native -mtune=native' \
    FOPTFLAGS='-O3 -march=native -mtune=native'
  make PETSC_DIR=. PETSC_ARCH=arch-linux2-c-opt all
  make PETSC_DIR=. PETSC_ARCH=arch-linux2-c-opt install

petsc4py  https://bitbucket.org/petsc/petsc4py
  # PETSC_DIR and PETSC_ARCH should match your PETSc make command
  PETSC_DIR=$ANACONDA PETSC_ARCH=arch-linux2-c-opt python setup.py install

trilinos  https://trilinos.org/download/ or GitHub
  CMAKE_PREFIX_PATH=$ANACONDA cmake -DCMAKE_BUILD_TYPE:STRING=RELEASE \
    -DTrilinos_ENABLE_PyTrilinos:BOOL=ON -DTrilinos_ENABLE_OpenMP:BOOL=ON \
    -DKokkos_ENABLE_OpenMP:BOOL=ON -DKokkos_ENABLE_Serial:BOOL=OFF \
    -DTrilinos_ENABLE_STK:BOOL=OFF -DBUILD_SHARED_LIBS:BOOL=ON \
    -DTPL_ENABLE_MPI:BOOL=ON -DMPI_BASE_DIR:PATH=$ANACONDA \
    -DTPL_ENABLE_Pthread:BOOL=ON -DDART_TESTING_TIMEOUT:STRING=600 \
    -DTrilinos_ENABLE_TESTS:BOOL=ON -DCMAKE_INSTALL_PREFIX:PATH=$ANACONDA \
    -DTrilinos_ENABLE_ALL_OPTIONAL_PACKAGES:BOOL=ON \
    -DTrilinos_ASSERT_MISSING_PACKAGES=OFF \
    -DPyTrilinos_INSTALL_PREFIX:PATH=$ANACONDA ..
  make -j4 && make install
  cd pytrilinos; python setup.py install

LSMLIB  http://ktchu.serendipityresearch.org/software/lsmlib/index.html
  ./configure --prefix=$ANACONDA; make -j4 && make install

scikit-fmm  https://github.com/scikit-fmm/scikit-fmm
  python setup.py install

gist  https://bitbucket.org/dpgrote/pygist
  python setup.py config && python setup.py install

fipy  https://github.com/usnistgov/fipy
  python setup.py install
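As a belt-and-suspenders alternative to exporting OMP_NUM_THREADS in the shell, the limit can be applied from inside the script itself, provided it happens before the OpenMP runtime first initializes, i.e., before the solver modules are imported. A sketch under that assumption; the commented import is a placeholder for whatever your script loads first:

```python
import os

# Must run before any OpenMP runtime starts, i.e. before importing
# PyTrilinos/FiPy solver modules; once threads are spawned, setting
# this variable has no effect on them.
os.environ["OMP_NUM_THREADS"] = "1"

# ...now import the solver stack, e.g.:
# from fipy import Grid1D, CellVariable
```

Exporting the variable in the shell before mpirun remains the more robust option, since it reaches every rank regardless of import order.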
_______________________________________________
fipy mailing list
[email protected]
http://www.ctcms.nist.gov/fipy
[ NIST internal ONLY: https://email.nist.gov/mailman/listinfo/fipy ]
