On Mon, Mar 30, 2015 at 1:34 PM, Anders Logg <[email protected]> wrote:

See this question on the QA forum:

http://fenicsproject.org/qa/6875/ubuntu-compile-from-source-which-provide-better-performance

The Cahn-Hilliard demo takes 40 seconds with the 1.3 Ubuntu packages and 52 seconds with 1.5+ built from source. Are these performance regressions, or is Johannes that much better at building Debian packages than I am at building FEniCS (with HashDist)?

PS: Looking at the benchbot, there seem to have been some regressions in the timing facilities with the recent changes:

http://fenicsproject.org/benchbot/

-- Anders

On Mon, Mar 30, 2015 at 5:05 PM, Garth N. Wells <[email protected]> wrote:

With the 1.4 Ubuntu package (Ubuntu 14.10) I get 42 s. With my build of the dev version (I don't use HashDist) I get 34 s.

Garth
On Mon, Mar 30, 2015 at 8:37 PM, Anders Logg <[email protected]> wrote:

Could you or someone else build FEniCS with fenics-install.sh (takes time, but is presumably automatic) and compare? The alternative would be for me to build FEniCS manually, but that takes a lot of manual effort and it's not clear I can make a "good" build. It would be good to get a number, not only to check for a possible regression but also to test whether something is suboptimal in the HashDist build.

Johannes, is the HashDist build with optimization?

-- Anders

On Mon, Mar 30, 2015 at 11:21 PM, Johannes Ring <[email protected]> wrote:

I got 53 s with the Debian packages and 1 m 5 s with the HashDist-based installation. DOLFIN is built with CMAKE_BUILD_TYPE=Release. The flags for building PETSc are listed below.

PETSc flags for the Debian package:

    PETSC_DIR=/tmp/src/petsc-3.4.2.dfsg1 PETSC_ARCH=linux-gnu-c-opt \
    ./config/configure.py --with-shared-libraries --with-debugging=0 \
        --useThreads 0 --with-clanguage=C++ --with-c-support \
        --with-fortran-interfaces=1 \
        --with-mpi-dir=/usr/lib/openmpi --with-mpi-shared=1 \
        --with-blas-lib=-lblas --with-lapack-lib=-llapack \
        --with-blacs=1 --with-blacs-include=/usr/include \
        --with-blacs-lib=[/usr/lib/libblacsCinit-openmpi.so,/usr/lib/libblacs-openmpi.so] \
        --with-scalapack=1 --with-scalapack-include=/usr/include \
        --with-scalapack-lib=/usr/lib/libscalapack-openmpi.so \
        --with-mumps=1 --with-mumps-include=/usr/include \
        --with-mumps-lib=[/usr/lib/libdmumps.so,/usr/lib/libzmumps.so,/usr/lib/libsmumps.so,/usr/lib/libcmumps.so,/usr/lib/libmumps_common.so,/usr/lib/libpord.so] \
        --with-umfpack=1 --with-umfpack-include=/usr/include/suitesparse \
        --with-umfpack-lib=[/usr/lib/libumfpack.so,/usr/lib/libamd.so] \
        --with-cholmod=1 --with-cholmod-include=/usr/include/suitesparse \
        --with-cholmod-lib=/usr/lib/libcholmod.so \
        --with-spooles=1 --with-spooles-include=/usr/include/spooles \
        --with-spooles-lib=/usr/lib/libspooles.so \
        --with-hypre=1 --with-hypre-dir=/usr \
        --with-ptscotch=1 --with-ptscotch-include=/usr/include/scotch \
        --with-ptscotch-lib=[/usr/lib/libptesmumps.so,/usr/lib/libptscotch.so,/usr/lib/libptscotcherr.so] \
        --with-fftw=1 --with-fftw-include=/usr/include \
        --with-fftw-lib=[/usr/lib/x86_64-linux-gnu/libfftw3.so,/usr/lib/x86_64-linux-gnu/libfftw3_mpi.so] \
        --with-hdf5=1 --with-hdf5-dir=/usr/lib/x86_64-linux-gnu/hdf5/openmpi \
        --CXX_LINKER_FLAGS="-Wl,--no-as-needed"

PETSc flags for the HashDist-based build:

    mkdir ${PWD}/_tmp && TMPDIR=${PWD}/_tmp \
    ./configure --prefix="${ARTIFACT}" \
        COPTFLAGS=-O2 \
        --with-shared-libraries=1 \
        --with-debugging=0 \
        --with-ssl=0 \
        --with-blas-lapack-lib=${OPENBLAS_DIR}/lib/libopenblas.so \
        --with-metis-dir=$PARMETIS_DIR \
        --with-parmetis-dir=$PARMETIS_DIR \
        --with-scotch-dir=${SCOTCH_DIR} \
        --with-ptscotch-dir=${SCOTCH_DIR} \
        --with-suitesparse=1 \
        --with-suitesparse-include=${SUITESPARSE_DIR}/include/suitesparse \
        --with-suitesparse-lib=[${SUITESPARSE_DIR}/lib/libumfpack.a,libklu.a,libcholmod.a,libbtf.a,libccolamd.a,libcolamd.a,libcamd.a,libamd.a,libsuitesparseconfig.a] \
        --with-hypre=1 \
        --with-hypre-include=${HYPRE_DIR}/include \
        --with-hypre-lib=${HYPRE_DIR}/lib/libHYPRE.so \
        --with-mpi-compilers \
        CC=$MPICC \
        CXX=$MPICXX \
        F77=$MPIF77 \
        F90=$MPIF90 \
        FC=$MPIF90 \
        --with-patchelf-dir=$PATCHELF_DIR \
        --with-python-dir=$PYTHON_DIR \
        --with-superlu_dist-dir=$SUPERLU_DIST_DIR \
        --download-mumps=1 \
        --download-scalapack=1 \
        --download-blacs=1 \
        --download-ml=1

Johannes
On Mon, Mar 30, 2015 at 11:37 PM, Jan Blechta <[email protected]> wrote:

Could you guys run it with

    list_timings()

to get a detailed breakdown of where the time is spent?

Jan
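(For reference, a minimal sketch of how to do that from the command line. The wrapper below is not from the thread; it assumes the demo script is in the current directory and that dolfin.list_timings() from the Python API of that era prints the accumulated timers.)

    # Run the demo without plotting, then print DOLFIN's timing table
    # from the same interpreter so the timers are still populated.
    DOLFIN_NOPLOT=1 python -c "
    import dolfin
    exec(open('demo_cahn-hilliard.py').read())
    dolfin.list_timings()
    "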
On Tue, Mar 31, 2015 at 9:46 AM, Garth N. Wells <[email protected]> wrote:

FEniCS 1.4 package (Ubuntu 14.10):

    Summary of timings                                       | Average time | Total time | Reps
    ---------------------------------------------------------+--------------+------------+-----
    Apply (PETScMatrix)                                      |   0.00033009 |   0.079882 |  242
    Apply (PETScVector)                                      |   6.9951e-06 |   0.005806 |  830
    Assemble cells                                           |     0.017927 |     9.5731 |  534
    Boost Cuthill-McKee graph ordering (from dolfin::Graph)  |   9.5844e-05 | 9.5844e-05 |    1
    Build Boost CSR graph                                    |   7.7009e-05 | 7.7009e-05 |    1
    Build mesh number mesh entities                          |            0 |          0 |    2
    Build sparsity                                           |    0.0041105 |  0.0082209 |    2
    Delete sparsity                                          |   1.0729e-06 | 2.1458e-06 |    2
    Init MPI                                                 |     0.055825 |   0.055825 |    1
    Init PETSc                                               |     0.056171 |   0.056171 |    1
    Init dof vector                                          |   0.00018656 | 0.00037313 |    2
    Init dofmap                                              |    0.0064399 |  0.0064399 |    1
    Init dofmap from UFC dofmap                              |    0.0017549 |  0.0035098 |    2
    Init tensor                                              |    0.0002135 | 0.00042701 |    2
    LU solver                                                |      0.11543 |     27.933 |  242
    PETSc LU solver                                          |       0.1154 |     27.926 |  242

FEniCS dev (my build, using PETSc dev):

    [MPI_AVG] Summary of timings    | Reps |   Wall avg |   Wall tot
    --------------------------------+------+------------+-----------
    Apply (PETScMatrix)             |  242 | 0.00020009 |   0.048421
    Apply (PETScVector)             |  830 | 8.5487e-06 |  0.0070954
    Assemble cells                  |  534 |   0.017001 |     9.0787
    Build mesh number mesh entities |    1 |   7.35e-07 |   7.35e-07
    Build sparsity                  |    2 |  0.0068867 |   0.013773
    Delete sparsity                 |    2 |   9.88e-07 |  1.976e-06
    Init MPI                        |    1 |  0.0023164 |  0.0023164
    Init PETSc                      |    1 |   0.002519 |   0.002519
    Init dof vector                 |    2 | 0.00016088 | 0.00032177
    Init dofmap                     |    1 |    0.04457 |    0.04457
    Init dofmap from UFC dofmap     |    1 |  0.0035997 |  0.0035997
    Init tensor                     |    2 | 0.00034076 | 0.00068153
    LU solver                       |  242 |   0.097293 |     23.545
    PETSc LU solver                 |  242 |   0.097255 |     23.536
    SCOTCH graph ordering           |    1 |  0.0005598 |  0.0005598
    compute connectivity 1 - 2      |    1 | 0.00088592 | 0.00088592
    compute entities dim = 1        |    1 |   0.028021 |   0.028021

Garth

On Tue, Mar 31, 2015 at 10:25 AM, Johannes Ring <[email protected]> wrote:

Here are my numbers (see attachment).

Johannes
On Tue, Mar 31, 2015 at 3:09 PM, Anders Logg <[email protected]> wrote:

Hmm... So what conclusions should one draw?

- The major difference lies in the PETSc LU solver.
- 1.6dev looks faster than 1.5 for Johannes.
- Assemble cells is twice as fast in the Debian package as in the HashDist build.
- Apply (PETScVector) happens a lot more often than it used to.
- Init tensor, Build sparsity and Delete sparsity happen a lot less often.

Important questions:

- Are the Debian packages built with more optimization than the HashDist build uses? (Indicated by the faster Assemble cells for the Debian version.)
- How can the PETSc LU solver timings change? Are different PETSc versions being used, or is PETSc built differently?

-- Anders

On Tue, Mar 31, 2015 at 5:47 PM, Jan Blechta <[email protected]> wrote:

Together with Jaroslav Hron, we have found here that a HashDist build of 1.5 (about two months old) spawns processes/threads and steals all the (hyper-threading) cores on the machine. Can you confirm this, Anders?

I'm not sure which piece of software is doing this. Any guess? I'd like to know which software is so cheeky. Running

    time OMP_NUM_THREADS=1 DOLFIN_NOPLOT=1 python demo_cahn-hilliard.py

produces much more satisfactory timings.

Jan
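(A quick way to check whether the process really is spawning extra threads, using standard Linux tooling; this is not from the thread, and the sleep duration is arbitrary.)

    # Start the demo in the background, give it time to reach the solver,
    # then count the threads (NLWP) in the Python process.
    DOLFIN_NOPLOT=1 python demo_cahn-hilliard.py &
    sleep 10
    ps -o nlwp= -p $!   # a value > 1 means extra threads have been spawned
    wait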
On Tue, Mar 31, 2015 at 7:03 PM, Jan Blechta <[email protected]> wrote:

It is OpenBLAS. Citing from https://github.com/xianyi/OpenBLAS/wiki/faq#multi-threaded:

"If your application is already multi-threaded, it will conflict with OpenBLAS multi-threading."

We must be cautious of this. (Are the PETSc devs aware of it? CCing Jed Brown.) It should be taken into account in HashDist, and possibly we could check for it during DOLFIN configure. It is also possible that it has something to do with issues #326 and #491.

Jan

On Tue, Mar 31, 2015 at 8:22 PM, Johannes Ring <[email protected]> wrote:

Good find! With OPENBLAS_NUM_THREADS=1 (or OMP_NUM_THREADS=1) set, the Cahn-Hilliard demo takes 35 s with 1.6.0dev, while it took 1 m 7 s without this environment variable.

Johannes

On Tue, Mar 31, 2015 at 8:35 PM, Anders Logg <[email protected]> wrote:

Great! Can you push this fix? Then I can try here as well.

-- Anders

On Tue, Mar 31, 2015 at 8:41 PM, Johannes Ring <[email protected]> wrote:

This is a runtime variable. We could add it to the config file, or should we build the single-threaded version of OpenBLAS (using make USE_THREAD=0)?

Johannes
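(The two options side by side, as a sketch. The commands are taken from this thread; the install prefix is illustrative.)

    # Runtime option: cap OpenBLAS at one thread for a single run.
    OPENBLAS_NUM_THREADS=1 DOLFIN_NOPLOT=1 python demo_cahn-hilliard.py

    # Build-time option: compile OpenBLAS without threading support,
    # so no environment variable is needed at run time.
    make USE_THREAD=0
    make USE_THREAD=0 PREFIX=/opt/openblas-serial install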
On Tue, Mar 31, 2015 at 8:44 PM, Anders Logg <[email protected]> wrote:

OK, I thought it was a compile-time option. I can't say which option is best, but if I interpret Jed correctly, we should build OpenBLAS single-threaded.

-- Anders

On Tue, Mar 31, 2015 at 10:37 PM, Johannes Ring <[email protected]> wrote:

I have now updated the script to build a single-threaded OpenBLAS.

Johannes

Jan Blechta <[email protected]> wrote:

Johannes, wouldn't it be beneficial for everyone to push the change upstream to hashstack as a default?

Jed, don't you build BLAS in PETSc configure with something like --with-openblas=1? If so, I suppose you build it without threads.

Jan
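(For completeness, a hedged sketch of wiring the two together: point PETSc's configure at the single-threaded OpenBLAS so that no runtime variable is needed. The flags are the ones already used in the HashDist build above; the paths are illustrative, and this is not necessarily how the PETSc developers do it.)

    # Configure PETSc against a pre-built single-threaded OpenBLAS.
    ./configure --prefix=/opt/petsc \
        --with-shared-libraries=1 \
        --with-debugging=0 \
        --with-blas-lapack-lib=/opt/openblas-serial/lib/libopenblas.so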
