It is OpenBLAS. Quoting from https://github.com/xianyi/OpenBLAS/wiki/faq#multi-threaded:
    If your application is already multi-threaded, it will conflict with
    OpenBLAS multi-threading.

We must be cautious of this. (Are the PETSc devs? CCing Jed Brown.) This
should be taken into account in HashDist, and possibly we could check for it
during DOLFIN configure. It is also possible that it has something to do with
issues #326 and #491.
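Until that happens, a stop-gap is to pin OpenBLAS to a single thread
explicitly in the user script. Below is an untested sketch; the
OMP_NUM_THREADS/OPENBLAS_NUM_THREADS variables and openblas_set_num_threads()
are the documented OpenBLAS knobs, the rest is only illustration:

    # Untested sketch: keep OpenBLAS from spawning its own thread pool.
    # The environment variables must be set before the library embedding
    # OpenBLAS (PETSc via dolfin/petsc4py, numpy, ...) is first loaded.
    import os
    os.environ.setdefault("OPENBLAS_NUM_THREADS", "1")
    os.environ.setdefault("OMP_NUM_THREADS", "1")

    # If OpenBLAS is already loaded, its thread count can still be capped
    # at runtime through its C API.
    import ctypes
    import ctypes.util

    libname = ctypes.util.find_library("openblas")
    if libname is not None:
        try:
            ctypes.CDLL(libname).openblas_set_num_threads(1)
        except (OSError, AttributeError):
            pass  # not an OpenBLAS build, or the symbol is not exported

This is essentially what the OMP_NUM_THREADS=1 run quoted below does from the
shell.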
Jan

On Tue, 31 Mar 2015 17:47:15 +0200 Jan Blechta <[email protected]> wrote:
> We have found, together with Jaroslav Hron here, that a HashDist build
> (about 2 months old) of 1.5 spawns processes/threads and steals all the
> (hyper-threading) cores on the machine. Can you confirm it, Anders?
>
> I'm not sure which piece of software is doing this. Any guess here? I'd
> like to know which software is so cheeky.
>
>     time OMP_NUM_THREADS=1 DOLFIN_NOPLOT=1 python demo_cahn-hilliard.py
>
> produces much more satisfactory timings.
>
> Jan
>
> On Tue, 31 Mar 2015 15:09:24 +0000 Anders Logg <[email protected]> wrote:
> > Hmm... So what conclusions should one make?
> >
> > - Major difference lies in the PETSc LU solver
> > - 1.6dev looks faster than 1.5 for Johannes
> > - assemble_cells is twice as fast in the Debian package as in the
> >   HashDist build
> > - Apply (PETScVector) happens a lot more than it used to
> > - Init tensor, build sparsity, delete sparsity happen a lot less
> >
> > Important questions:
> >
> > - Are the Debian packages built with more optimization than the HashDist
> >   build uses? (indicated by faster assemble_cells for the Debian version)
> > - How can the PETSc LU solve timings change? Are different PETSc versions
> >   being used, or is PETSc built differently?
> >
> > --
> > Anders
> >
> > On Tue, 31 Mar 2015 at 10:25, Johannes Ring <[email protected]> wrote:
> > > Here are my numbers (see attachment).
> > >
> > > Johannes
> > >
> > > On Tue, Mar 31, 2015 at 9:46 AM, Garth N. Wells <[email protected]> wrote:
> > > > FEniCS 1.4 package (Ubuntu 14.10)
> > > >
> > > > Summary of timings                                        |  Average time   Total time  Reps
> > > > --------------------------------------------------------------------------------------------
> > > > Apply (PETScMatrix)                                       |    0.00033009     0.079882   242
> > > > Apply (PETScVector)                                       |    6.9951e-06     0.005806   830
> > > > Assemble cells                                            |      0.017927       9.5731   534
> > > > Boost Cuthill-McKee graph ordering (from dolfin::Graph)   |    9.5844e-05   9.5844e-05     1
> > > > Build Boost CSR graph                                     |    7.7009e-05   7.7009e-05     1
> > > > Build mesh number mesh entities                           |             0            0     2
> > > > Build sparsity                                            |     0.0041105    0.0082209     2
> > > > Delete sparsity                                           |    1.0729e-06   2.1458e-06     2
> > > > Init MPI                                                  |      0.055825     0.055825     1
> > > > Init PETSc                                                |      0.056171     0.056171     1
> > > > Init dof vector                                           |    0.00018656   0.00037313     2
> > > > Init dofmap                                               |     0.0064399    0.0064399     1
> > > > Init dofmap from UFC dofmap                               |     0.0017549    0.0035098     2
> > > > Init tensor                                               |     0.0002135   0.00042701     2
> > > > LU solver                                                 |       0.11543       27.933   242
> > > > PETSc LU solver                                           |        0.1154       27.926   242
> > > >
> > > > FEniCS dev (my build, using PETSc dev)
> > > >
> > > > [MPI_AVG] Summary of timings      |  reps    wall avg    wall tot
> > > > -----------------------------------------------------------------
> > > > Apply (PETScMatrix)               |   242  0.00020009    0.048421
> > > > Apply (PETScVector)               |   830  8.5487e-06   0.0070954
> > > > Assemble cells                    |   534    0.017001      9.0787
> > > > Build mesh number mesh entities   |     1    7.35e-07    7.35e-07
> > > > Build sparsity                    |     2   0.0068867    0.013773
> > > > Delete sparsity                   |     2    9.88e-07   1.976e-06
> > > > Init MPI                          |     1   0.0023164   0.0023164
> > > > Init PETSc                        |     1    0.002519    0.002519
> > > > Init dof vector                   |     2  0.00016088  0.00032177
> > > > Init dofmap                       |     1     0.04457     0.04457
> > > > Init dofmap from UFC dofmap       |     1   0.0035997   0.0035997
> > > > Init tensor                       |     2  0.00034076  0.00068153
> > > > LU solver                         |   242    0.097293      23.545
> > > > PETSc LU solver                   |   242    0.097255      23.536
> > > > SCOTCH graph ordering             |     1   0.0005598   0.0005598
> > > > compute connectivity 1 - 2        |     1  0.00088592  0.00088592
> > > > compute entities dim = 1          |     1    0.028021    0.028021
> > > >
> > > > Garth
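These summaries are DOLFIN's built-in timing report. For reference, here is a
minimal way to produce one at the end of a script such as
demo_cahn-hilliard.py (a sketch, assuming the 1.5-era Python API where
list_timings() takes no arguments):

    # Sketch: print DOLFIN's timing summary for a script (DOLFIN 1.5-era API).
    from dolfin import Timer, list_timings

    t = Timer("Custom: my solve")  # optional: time a user-chosen section
    # ... assembly / solve goes here ...
    t.stop()

    list_timings()  # prints a "Summary of timings" table like the ones above

Custom Timer sections appear in the same table alongside DOLFIN's internal
timers.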
> > > > On Mon, Mar 30, 2015 at 11:37 PM, Jan Blechta <[email protected]> wrote:
> > > >> Could you, guys, run it with
> > > >>
> > > >>     list_timings()
> > > >>
> > > >> to get a detailed breakdown of where the time is spent?
> > > >>
> > > >> Jan
> > > >>
> > > >> On Mon, 30 Mar 2015 23:21:41 +0200 Johannes Ring <[email protected]> wrote:
> > > >>> On Mon, Mar 30, 2015 at 8:37 PM, Anders Logg <[email protected]> wrote:
> > > >>> > Could you or someone else build FEniCS with fenics-install.sh
> > > >>> > (takes time but is presumably automatic) and compare?
> > > >>>
> > > >>> I got 53s with the Debian packages and 1m5s with the HashDist based
> > > >>> installation.
> > > >>>
> > > >>> > The alternative would be for me to build FEniCS manually, but that
> > > >>> > takes a lot of manual effort and it's not clear I can make a "good"
> > > >>> > build. It would be good to get a number, not only to check for a
> > > >>> > possible regression but also to test whether something is
> > > >>> > suboptimal in the HashDist build.
> > > >>> >
> > > >>> > Johannes, is the HashDist build with optimization?
> > > >>>
> > > >>> DOLFIN is built with CMAKE_BUILD_TYPE=Release. The flags for building
> > > >>> PETSc are listed below.
> > > >>>
> > > >>> Johannes
> > > >>>
> > > >>> PETSc flags for Debian package:
> > > >>>
> > > >>> PETSC_DIR=/tmp/src/petsc-3.4.2.dfsg1 PETSC_ARCH=linux-gnu-c-opt \
> > > >>>   ./config/configure.py --with-shared-libraries --with-debugging=0 \
> > > >>>   --useThreads 0 --with-clanguage=C++ --with-c-support \
> > > >>>   --with-fortran-interfaces=1 \
> > > >>>   --with-mpi-dir=/usr/lib/openmpi --with-mpi-shared=1 \
> > > >>>   --with-blas-lib=-lblas --with-lapack-lib=-llapack \
> > > >>>   --with-blacs=1 --with-blacs-include=/usr/include \
> > > >>>   --with-blacs-lib=[/usr/lib/libblacsCinit-openmpi.so,/usr/lib/libblacs-openmpi.so] \
> > > >>>   --with-scalapack=1 --with-scalapack-include=/usr/include \
> > > >>>   --with-scalapack-lib=/usr/lib/libscalapack-openmpi.so \
> > > >>>   --with-mumps=1 --with-mumps-include=/usr/include \
> > > >>>   --with-mumps-lib=[/usr/lib/libdmumps.so,/usr/lib/libzmumps.so,/usr/lib/libsmumps.so,/usr/lib/libcmumps.so,/usr/lib/libmumps_common.so,/usr/lib/libpord.so] \
> > > >>>   --with-umfpack=1 --with-umfpack-include=/usr/include/suitesparse \
> > > >>>   --with-umfpack-lib=[/usr/lib/libumfpack.so,/usr/lib/libamd.so] \
> > > >>>   --with-cholmod=1 --with-cholmod-include=/usr/include/suitesparse \
> > > >>>   --with-cholmod-lib=/usr/lib/libcholmod.so \
> > > >>>   --with-spooles=1 --with-spooles-include=/usr/include/spooles \
> > > >>>   --with-spooles-lib=/usr/lib/libspooles.so \
> > > >>>   --with-hypre=1 --with-hypre-dir=/usr \
> > > >>>   --with-ptscotch=1 --with-ptscotch-include=/usr/include/scotch \
> > > >>>   --with-ptscotch-lib=[/usr/lib/libptesmumps.so,/usr/lib/libptscotch.so,/usr/lib/libptscotcherr.so] \
> > > >>>   --with-fftw=1 --with-fftw-include=/usr/include \
> > > >>>   --with-fftw-lib=[/usr/lib/x86_64-linux-gnu/libfftw3.so,/usr/lib/x86_64-linux-gnu/libfftw3_mpi.so] \
> > > >>>   --with-hdf5=1 --with-hdf5-dir=/usr/lib/x86_64-linux-gnu/hdf5/openmpi \
> > > >>>   --CXX_LINKER_FLAGS="-Wl,--no-as-needed"
> > > >>>
> > > >>> PETSc flags for HashDist based build:
> > > >>>
> > > >>> mkdir ${PWD}/_tmp && TMPDIR=${PWD}/_tmp \
> > > >>>   ./configure --prefix="${ARTIFACT}" \
> > > >>>   COPTFLAGS=-O2 \
> > > >>>   --with-shared-libraries=1 \
> > > >>>   --with-debugging=0 \
> > > >>>   --with-ssl=0 \
> > > >>>   --with-blas-lapack-lib=${OPENBLAS_DIR}/lib/libopenblas.so \
> > > >>>   --with-metis-dir=$PARMETIS_DIR \
> > > >>>   --with-parmetis-dir=$PARMETIS_DIR \
> > > >>>   --with-scotch-dir=${SCOTCH_DIR} \
> > > >>>   --with-ptscotch-dir=${SCOTCH_DIR} \
> > > >>>   --with-suitesparse=1 \
> > > >>>   --with-suitesparse-include=${SUITESPARSE_DIR}/include/suitesparse \
> > > >>>   --with-suitesparse-lib=[${SUITESPARSE_DIR}/lib/libumfpack.a,libklu.a,libcholmod.a,libbtf.a,libccolamd.a,libcolamd.a,libcamd.a,libamd.a,libsuitesparseconfig.a] \
> > > >>>   --with-hypre=1 \
> > > >>>   --with-hypre-include=${HYPRE_DIR}/include \
> > > >>>   --with-hypre-lib=${HYPRE_DIR}/lib/libHYPRE.so \
> > > >>>   --with-mpi-compilers \
> > > >>>   CC=$MPICC \
> > > >>>   CXX=$MPICXX \
> > > >>>   F77=$MPIF77 \
> > > >>>   F90=$MPIF90 \
> > > >>>   FC=$MPIF90 \
> > > >>>   --with-patchelf-dir=$PATCHELF_DIR \
> > > >>>   --with-python-dir=$PYTHON_DIR \
> > > >>>   --with-superlu_dist-dir=$SUPERLU_DIST_DIR \
> > > >>>   --download-mumps=1 \
> > > >>>   --download-scalapack=1 \
> > > >>>   --download-blacs=1 \
> > > >>>   --download-ml=1
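The relevant difference for the threading issue above: the Debian configure
line links the reference BLAS/LAPACK (--with-blas-lib=-lblas
--with-lapack-lib=-llapack, together with --useThreads 0), while the HashDist
build links ${OPENBLAS_DIR}/lib/libopenblas.so, which starts its own thread
pool by default. A quick way to check what an existing PETSc build actually
linked (a sketch; the libpetsc.so path below is hypothetical and differs per
installation):

    # Sketch: list the BLAS/LAPACK libraries a PETSc build links against.
    import subprocess

    PETSC_LIB = "/path/to/petsc/lib/libpetsc.so"  # hypothetical path, adjust
    deps = subprocess.check_output(["ldd", PETSC_LIB]).decode()
    for line in deps.splitlines():
        if "blas" in line.lower() or "lapack" in line.lower():
            print(line.strip())

A libopenblas.so line means the threaded OpenBLAS is in use; plain
libblas.so/liblapack.so resolves to the distribution's BLAS alternative (the
unthreaded reference BLAS by default on Debian/Ubuntu).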
> > > >>> > --
> > > >>> > Anders
> > > >>> >
> > > >>> > On Mon, 30 Mar 2015 at 17:05, Garth N. Wells <[email protected]> wrote:
> > > >>> >> On Mon, Mar 30, 2015 at 1:34 PM, Anders Logg <[email protected]> wrote:
> > > >>> >> > See this question on the QA forum:
> > > >>> >> >
> > > >>> >> > http://fenicsproject.org/qa/6875/ubuntu-compile-from-source-which-provide-better-performance
> > > >>> >> >
> > > >>> >> > The Cahn-Hilliard demo takes 40 seconds with the 1.3 Ubuntu
> > > >>> >> > packages and 52 seconds with 1.5+ built from source. Are these
> > > >>> >> > regressions in performance, or is Johannes that much better at
> > > >>> >> > building Debian packages than I am at building FEniCS (with
> > > >>> >> > HashDist)?
> > > >>> >>
> > > >>> >> With the 1.4 Ubuntu package (Ubuntu 14.10), I get 42s. With my
> > > >>> >> build of the dev version (I don't use HashDist) I get 34s.
> > > >>> >>
> > > >>> >> Garth
> > > >>> >>
> > > >>> >> > PS: Looking at the benchbot, there seem to have been some
> > > >>> >> > regressions in the timing facilities with the recent changes:
> > > >>> >> >
> > > >>> >> > http://fenicsproject.org/benchbot/
> > > >>> >> >
> > > >>> >> > --
> > > >>> >> > Anders

_______________________________________________
fenics mailing list
[email protected]
http://fenicsproject.org/mailman/listinfo/fenics
