After further tinkering, I have some additional comments on building FiPy+Trilinos on Linux. Some are reflected in the revised build guide, attached; some apply to running FiPy on the command line.
First, two corrections to the original build guide. While PyTrilinos is compiled when you build Trilinos, it is not installed with the rest; an additional instruction now takes care of that. Also, the original guide left off gist, because it was difficult to find the right one: there are many software packages by that name. The correct link is now included.

Next, while executing FiPy in parallel with Trilinos, numerous threads were being launched. I therefore recompiled Trilinos with OpenMP support explicitly enabled, and updated the build guide with the new flags. While the original guide kept the command on one line for easy copying, the revision splits it up for readability. Note that several detailed Trilinos build guides for HPC are out there; this one (https://redmine.scorec.rpi.edu/projects/albany-rpi/wiki/Installing_Trilinos) exposes many flags by example, which I found quite useful. I should state here that, while the Trilinos build command has grown, it may or may not have improved: FiPy uses only a subset of Trilinos' capabilities, so building the entire toolkit is excessive. It will be worth optimizing which modules are built after studying how FiPy interacts with Trilinos in greater detail.

Finally, I solved the baffling mystery of the speedup problem. The culprit appears to be a bad interaction between Trilinos' greed for threads and Python's Global Interpreter Lock (GIL). By default on my test computer, every MPI rank in the run spawns as many threads as there are cores available, which is unexpected. It could be that Trilinos is designed to have one MPI rank per node of a cluster, so each of the child threads would help with computation but would not compete for I/O resources during ghost cell exchanges and file I/O. However, Python's GIL binds all the child threads to the same core as their parent! So instead of improving performance, each core suffers a heavy overhead from managing those idling threads.
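The oversubscription is easy to observe from inside Python. Below is a small diagnostic sketch of my own (not part of FiPy or the build guide) that reads the kernel's thread count for the current process on Linux; launching it under mpirun before and after setting OMP_NUM_THREADS shows how many native threads each rank actually holds:

```python
import os

def native_thread_count():
    """Return the number of OS-level threads in this process (Linux only).

    Unlike threading.active_count(), this counts native threads too,
    including any spawned by an OpenMP runtime behind Python's back.
    """
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith("Threads:"):
                return int(line.split()[1])
    return None

if __name__ == "__main__":
    print("pid %d holds %d thread(s)" % (os.getpid(), native_thread_count()))
```

Run as, e.g., `mpirun -np 4 python threadcount.py` (hypothetical filename) with and without OMP_NUM_THREADS=1 exported.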
The fix I've used is to quash the threads using the OpenMP environment variable: before launching a FiPy script in parallel, export OMP_NUM_THREADS=1. The effect is quite dramatic (results for a 6-core Intel with 2x hyperthreading):

$ echo "solver gridsz avg time"
$ echo -n "serial"; python mpitest.py
$ echo -n "trilin"; python mpitest.py --trilinos
$ for n in {1..12}; do echo -n "np = ${n}"; mpirun -np $n python mpitest.py --trilinos; done

solver    gridsz   avg time
serial    27900    0.130390
trilin    27900    0.941082
np = 1    27900    1.080579
np = 2    14973    1.717842
np = 3    10230    4.968959
np = 4     7958    5.462615
np = 5     6542    7.546303
np = 6     5463    8.831044
np = 7     4881    8.612932
np = 8     4292    8.786715
np = 9     4101   10.820657
np = 10    3714   12.011446
np = 11    3284   13.163548
np = 12    3012   14.257576

$ export OMP_NUM_THREADS=1
$ # ... same commands as above ...

solver    gridsz   avg time
serial    27900    0.139227
trilin    27900    0.284342
np = 1    27900    0.268329
np = 2    14973    0.157163
np = 3    10230    0.124317
np = 4     7958    0.091335
np = 5     6542    0.080261
np = 6     5463    0.067957
np = 7     4881    0.100654
np = 8     4292    0.085887
np = 9     4101    0.089979
np = 10    3714    0.083326
np = 11    3284    0.081005
np = 12    3012    0.076399

Now, that is more like it: runtimes decrease monotonically toward np=6, jump, then decrease (with some jitter) toward np=12. The revised Python script, which reports average runtimes for easier analysis and plotting, is available at https://gist.github.com/tkphd/7f62afc064448ca80025.

Trevor Keller, Ph.D.
National Institute of Standards and Technology
100 Bureau Dr., MS 8550; Gaithersburg, MD 20899
Office: 223/A131 or (301) 975-2889

________________________________
From: [email protected] <[email protected]> on behalf of Warren, James A.
<[email protected]> Sent: Thursday, July 30, 2015 10:59 AM To: FIPY Subject: Re: Notes on FiPy+Trilinos installation in Anaconda Trevor, this very valuable. Many thanks. Dr. James A Warren Director, Materials Genome Program Material Measurement Laboratory National Institute of Standards and Technology (301)-975-5708<tel:(301)-975-5708> ________________________________ On: 29 July 2015 12:52, "Keller, Trevor" <[email protected]> wrote: Hello fellow FiPyers, There are several bits and pieces mentioning FiPy and Trilinos here, in the user's manual, and around the Internet, but I haven't found a detailed guide to install FiPy with Trilinos from source. I recently gave it a try on Debian -- and it wasn't too bad. The whole process, which involved a couple false starts, took me a day and a half. It is my hope in publishing this bare-bones guide that fellow travelers can get it running on Linux in just a few hours. If you'd like to follow this guide, the standard disclaimer applies: you are downloading source code from the Internet and executing it on your machine. Proceed with the utmost caution. My machine runs Debian GNU/Linux 7.8 "wheezy" on a 4-core Intel processor. I am not a sudoer, so a virtual environment was used: conda, from the Anaconda Python distribution. The various source codes were extracted and compiled within the ~/Downloads directory, then installed into the conda environment. Many of the Python programs are available directly through Anaconda, and should probably be installed through that package manager rather than from source. I chose to compile everything to (a) see if I could and (b) ensure compiler and library consistency. The attached build notes indicate the order in which packages were installed, provide the home- or download-page URL for the software project, and specify the commands I used to compile and install the code. 
Installation of Anaconda is excluded, since I did not have to do it; commands to download, extract, and cd into the source code are excluded since the latest versions change almost daily.

After installation, I executed Daniel Wheeler's test script (https://gist.github.com/wd15/8717979) to confirm everything worked. This turned up a couple of interesting things. First, FiPy as-is compares versions as strings and believes gmsh 2.10.0 is less than 2.5.0. I revised my local copy to compare versions as '.'-delimited tuples, which allowed the code to execute in parallel; I'm working on a patch. Second, the runtimes increased through np=4, then dropped at np=5 and beyond to values similar to Dr. Wheeler's (http://wd15.github.io/2014/01/30/fipy-trilinos-anaconda/). The machine has only four cores and is not hyperthreaded, so this was a surprise. It could be that timing differences for heavier workloads will follow expected scaling laws, but I am baffled by the result.

Results table (run type, MPI rank, grid size, time; modified script at https://gist.github.com/anonymous/43afb6c567ebfd6d3b2f):

$ echo -n "serial"; python mpitest.py
$ echo -n "trilin"; python mpitest.py --trilinos
$ for n in {1..6}; do echo -n "np = ${n}"; mpirun -np $n python mpitest.py --trilinos; done

serial  0  27900  0.134404
trilin  0  27900  0.275607
np = 1  0  27900  0.664498
np = 2  0  14973  2.783230
        1  14973  2.797370
np = 3  0  10230  3.167559
        1  10480  3.174004
        2  10538  3.196746
np = 4  0   7958  5.139595
        1   7971  5.139695
        2   8106  5.139710
        3   7955  5.139925
np = 5  0   6542  0.499731
        1   6538  0.500179
        2   6534  0.500313
        3   6655  0.498517
        4   6689  0.500652
np = 6  0   5463  0.447847
        1   5584  0.448344
        2   5594  0.445663
        3   5512  0.449028
        4   5570  0.446170
        5   5571  0.449486

Thoughts on this anomalous behavior would be greatly appreciated!

Good luck,
Trevor
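For the gmsh version problem, the string-versus-tuple comparison is easy to demonstrate. The helper below is my own illustration of the '.'-delimited tuple approach, not the exact code from my local FiPy patch:

```python
def version_tuple(version):
    """Split a dotted version string into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split("."))

# Lexicographic string comparison gets it wrong: '1' < '5' character-wise,
# so "2.10.0" sorts before "2.5.0".
assert "2.10.0" < "2.5.0"

# Tuple comparison is numeric per component: (2, 10, 0) > (2, 5, 0).
assert version_tuple("2.10.0") > version_tuple("2.5.0")
```

This sketch assumes purely numeric version components; suffixes like "2.5.0rc1" would need extra handling.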
Recommended build order for dependencies and FiPy in a clean Anaconda environment with OpenMPI and Trilinos support.
Questions/comments to [email protected] (Trevor Keller)

$ conda create -n fipy python=2.7               # fipy is an alias, choose any name you wish
$ source activate fipy                          # use the alias you chose
$ export ANACONDA=/abs/path/to/conda/envs/fipy  # append this to your .bashrc

Notes:
  make:      -j4 issued due to 4-way parallel CPU; adjust for your hardware
  cmake:     assumes mkdir build; cd build in every instance;
             execute ccmake to interactively set flags
  configure: issue ./configure --help for options
  Make sure DEBUG is disabled. Build shared libraries whenever possible.
  Trilinos has OpenMP support, which is thwarted by the Python GIL.
  For best performance, limit your environment to one thread per core
  before launching in parallel, e.g.
    $ export OMP_NUM_THREADS=1; mpirun -np 4 python mpitest.py

Build order:

cmake  http://www.cmake.org/download/
  ./bootstrap --prefix=$ANACONDA --parallel=4; gmake -j4 && gmake install
  export CMAKE_ROOT=$ANACONDA/share/cmake-3.3

Flex  http://sourceforge.net/projects/flex/files/
  ./configure --prefix=$ANACONDA; make -j4 && make install

Bison  http://www.gnu.org/software/bison/
  ./configure --prefix=$ANACONDA; make -j4 && make install

Pth  http://www.gnu.org/software/pth/
  ./configure --prefix=$ANACONDA; make && make install  # make -j4 failed

doxygen  http://www.stack.nl/~dimitri/doxygen/download.html#srcbin
  cmake -DCMAKE_INSTALL_PREFIX=$ANACONDA -DCMAKE_BUILD_TYPE=Release ..
  make -j4 && make install

openmpi  http://www.open-mpi.org/software/ompi/v1.8/
  ./configure --prefix=$ANACONDA; make -j4 && make install

bzip2  http://www.bzip.org/downloads.html
  make -f Makefile-libbz2_so  # build shared library
  make PREFIX=$ANACONDA install

libboost  http://www.boost.org/users/download/
  ./bootstrap.sh --prefix=$ANACONDA
  # Add "using mpi ;" to project-config.jam (I put it under "using gcc ;")
  ZLIB_LIBPATH=$ANACONDA/lib BZIP2_INCLUDE=$ANACONDA/include \
    ./b2 --prefix=$ANACONDA --with-mpi= variant=release link=shared install

glm  http://glm.g-truc.net/0.9.4/
  cmake -DCMAKE_INSTALL_PREFIX=$ANACONDA -DCMAKE_BUILD_TYPE=Release ..
  make -j4 && make install

cython  http://cython.org/#download
  python setup.py install

HDF5  https://www.hdfgroup.org/HDF5/release/cmakebuild.html
  # Edit HDF518config.cmake to enable shared libraries
  # (search for BUILD_SHARED_LIBS:BOOL=OFF)
  ./build-hdf518-unix.sh
  ./HDF5-1.8.15-patch1-Linux.sh  # accept default path
  cd HDF5-1.8.15-patch1-Linux; for d in */; do cp -r $d $ANACONDA/; done

OpenBLAS  https://github.com/xianyi/OpenBLAS/wiki/Installation-Guide
  make -j4 PREFIX=$ANACONDA FC=gfortran && make PREFIX=$ANACONDA install

numpy  http://sourceforge.net/projects/numpy/files/
  python setup.py install

scipy  http://sourceforge.net/projects/scipy/files/
  python setup.py install

pysparse  http://pysparse.sourceforge.net/
  python setup.py install

pyamg  https://github.com/pyamg/pyamg
  python setup.py install

mpi4py  https://bitbucket.org/mpi4py
  python setup.py install

netcdf  https://www.unidata.ucar.edu/downloads/netcdf/index.jsp
  # Use the stable C distribution. HDF5 support requires a parallel build.
  CC=mpicc ./configure --prefix=$ANACONDA --with-zlib=$ANACONDA; make check install && make install
  # For some reason, make install neglects this file; Trilinos needs it:
  cp build/include/netcdf_par.h $ANACONDA/include/
swig  http://www.swig.org/download.html
  ./configure --prefix=$ANACONDA; make -j4 && make install

fltk  http://www.fltk.org/software.php
  cmake -DCMAKE_INSTALL_PREFIX=$ANACONDA -DCMAKE_BUILD_TYPE=Release ..; make -j4 && make install

mmg3d  http://www.math.u-bordeaux1.fr/~dobrzyns/logiciels/download.php
  cmake -DCMAKE_INSTALL_PREFIX=$ANACONDA -DCMAKE_BUILD_TYPE=Release ..; make -j4 && make install

tetgen  http://wias-berlin.de/software/tetgen/#Download
  cmake -DCMAKE_INSTALL_PREFIX=$ANACONDA -DCMAKE_BUILD_TYPE=Release ..; make -j4 && make install

gmsh  http://geuz.org/gmsh/
  for d in */; do cp -r $d $ANACONDA/; done  # binary distribution

PETSc  http://www.mcs.anl.gov/petsc/download/
  # The configure script for PETSc will walk you through installation,
  # with flags tailored to your architecture. Follow its lead.
  ./configure --prefix=$ANACONDA --with-mpi=1 --with-debugging=0 \
    COPTFLAGS='-O3 -march=native -mtune=native' \
    CXXOPTFLAGS='-O3 -march=native -mtune=native' \
    FOPTFLAGS='-O3 -march=native -mtune=native'
  make PETSC_DIR=. PETSC_ARCH=arch-linux2-c-opt all
  make PETSC_DIR=. PETSC_ARCH=arch-linux2-c-opt install

petsc4py  https://bitbucket.org/petsc/petsc4py
  # PETSC_DIR and PETSC_ARCH should match your PETSc make command
  PETSC_DIR=$ANACONDA PETSC_ARCH=arch-linux2-c-opt python setup.py install

trilinos  https://trilinos.org/download/ or GitHub
  CMAKE_PREFIX_PATH=$ANACONDA cmake -DCMAKE_BUILD_TYPE:STRING=RELEASE \
    -DTrilinos_ENABLE_PyTrilinos:BOOL=ON -DTrilinos_ENABLE_OpenMP:BOOL=ON \
    -DKokkos_ENABLE_OpenMP:BOOL=ON -DKokkos_ENABLE_Serial:BOOL=OFF \
    -DTrilinos_ENABLE_STK:BOOL=OFF -DBUILD_SHARED_LIBS:BOOL=ON \
    -DTPL_ENABLE_MPI:BOOL=ON -DMPI_BASE_DIR:PATH=$ANACONDA \
    -DTPL_ENABLE_Pthread:BOOL=ON -DDART_TESTING_TIMEOUT:STRING=600 \
    -DTrilinos_ENABLE_TESTS:BOOL=ON -DCMAKE_INSTALL_PREFIX:PATH=$ANACONDA \
    -DTrilinos_ENABLE_ALL_OPTIONAL_PACKAGES:BOOL=ON \
    -DTrilinos_ASSERT_MISSING_PACKAGES=OFF \
    -DPyTrilinos_INSTALL_PREFIX:PATH=$ANACONDA ..
  make -j4 && make install
  cd pytrilinos; python setup.py install

LSMLIB  http://ktchu.serendipityresearch.org/software/lsmlib/index.html
  ./configure --prefix=$ANACONDA; make -j4 && make install

scikit-fmm  https://github.com/scikit-fmm/scikit-fmm
  python setup.py install

gist  https://bitbucket.org/dpgrote/pygist
  python setup.py config && python setup.py install

fipy  https://github.com/usnistgov/fipy
  python setup.py install
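As a belt-and-suspenders alternative to exporting OMP_NUM_THREADS in the shell, the limit can be applied from inside the script itself, provided it happens before the OpenMP runtime first initializes, i.e., before the solver modules are imported. A sketch under that assumption; the commented import is a placeholder for whatever your script loads first:

```python
import os

# Must run before any OpenMP runtime starts, i.e. before importing
# PyTrilinos/FiPy solver modules; once threads are spawned, setting
# this variable has no effect on them.
os.environ["OMP_NUM_THREADS"] = "1"

# ...now import the solver stack, e.g.:
# from fipy import Grid1D, CellVariable
```

Exporting the variable in the shell before mpirun remains the more robust option, since it reaches every rank regardless of import order.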
_______________________________________________
fipy mailing list
[email protected]
http://www.ctcms.nist.gov/fipy
[ NIST internal ONLY: https://email.nist.gov/mailman/listinfo/fipy ]
