Re: [deal.II] New deal.II 8.5.1 is 20% slower than deal.II 8.0.0

Martin Kronbichler Wed, 27 Dec 2017 00:50:24 -0800

In general, we strive to make deal.II faster with new releases, and for many cases that is also true, as I can confirm from my applications. I have run step-23 on release 8.0 as well as on the current development sources, and I can confirm that the new version is slower on my machine. If I disable the output of step-23, I get a run time of 4.7 seconds for version 8.0 and 5.3 seconds for the current version. After some investigation I found that while some solver-related operations did get faster (the problem with 16k DoFs is small enough to run from L3 cache in my case), we are slower in the FEValues::reinit() calls. This call appears in VectorTools::create_right_hand_side() and VectorTools::interpolate_boundary_values() in the time loop. The reason is that we nowadays call MappingQGeneric::compute_mapping_support_points() also for the bilinear mapping MappingQ1, and that function allocates and de-allocates a vector. While this is uncritical for higher-order mappings, in 2D with linear shape functions the time spent there is indeed not negligible. This is unfortunate for your use case, but I want to stress that the changes were made in the hope of making that part of the code more reliable. Furthermore, those parts of the code are not performance critical and their performance is not closely tracked. It is a rather isolated issue that got worse here, so from this single example one definitely cannot conclude that we are going in the wrong direction as a project.

While there are plenty of things I could imagine doing in the application code to make this particular case more efficient, way beyond the performance that version 8.0 provided (note that I would not write the code like that if it were performance critical), the only obvious thing is that we could try to work around the memory allocations by not returning a vector from MappingQGeneric::compute_mapping_support_points() but rather filling an existing array in MappingQGeneric::InternalData::mapping_support_points. None of us developers has this high on the priority list right now, but we would definitely appreciate it if some of our users, like you, wanted to look into that. I could guide you to the right spots.


Best regards,
Martin


On 26.12.2017 21:22, [email protected] wrote:

Yes, the two files are attached. The key lines of their diff:

$ diff detailed.log-v8.1.0 detailed.log-v8.5.1
...
< #  Compiler flags used for this build:
< # CMAKE_CXX_FLAGS: -pedantic -fpic -Wall -Wpointer-arith -Wwrite-strings -Wsynth -Wsign-compare -Wswitch -Wno-unused-local-typedefs -Wno-long-long -Wno-deprecated -Wno-deprecated-declarations -std=c++11 -Wno-parentheses -Wno-long-long
< # DEAL_II_CXX_FLAGS_RELEASE: -O2 -funroll-loops -funroll-all-loops -fstrict-aliasing -Wno-unused
---
> #  Base configuration (prior to feature configuration):
> # DEAL_II_CXX_FLAGS: -pedantic -fPIC -Wall -Wextra -Wpointer-arith -Wwrite-strings -Wsynth -Wsign-compare -Wswitch -Woverloaded-virtual -Wno-long-long -Wno-deprecated-declarations -Wno-literal-suffix -std=c++11
> # DEAL_II_CXX_FLAGS_RELEASE: -O2 -funroll-loops -funroll-all-loops -fstrict-aliasing -Wno-unused-local-typedefs
18c19
< # DEAL_II_LINKER_FLAGS: -Wl,--as-needed -rdynamic -pthread
---
> # DEAL_II_LINKER_FLAGS: -Wl,--as-needed -rdynamic -fuse-ld=gold

...
> #            BOOST_CXX_FLAGS = -Wno-unused-local-typedefs
...
> #      ( DEAL_II_WITH_BZIP2 = OFF )
> #        DEAL_II_WITH_CXX11 = ON
> #      ( DEAL_II_WITH_CXX14 = OFF )
> #      ( DEAL_II_WITH_GSL = OFF )
...
> #            THREADS_CXX_FLAGS = -Wno-parentheses
> #            THREADS_LINKER_FLAGS = -pthread


On Tuesday, December 26, 2017 at 10:10:44 PM UTC+3, Matthias Maier wrote:

    Would you mind sending us the "detailed.log" files?

    Best,
    Matthias


    On Tue, Dec 26, 2017, at 12:35 CST, [email protected] wrote:

    > Thanks. This is strange, as I still consistently get 15-20% better
    > results in favor of the older versions, on three different machines
    > so far. Two more studies on other systems are attached below.
    >
    > TEST: Step-23 (integration time modified from 5 to 150, output suppressed)
    > CMAKE_BUILD_TYPE: "Release".
    >
    > MACHINE 1: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
    > $ cat /etc/redhat-release
    > CentOS Linux release 7.4.1708 (Core)
    > $ gcc --version
    > gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
    >
    > deal v8.1.0 (built and installed from source):
    >
    > $ time ./step-23
    > real    1m23.768s
    > user    5m46.080s
    > sys    0m4.079s
    >
    > deal v8.5.1 (built and installed from source):
    >
    > $ time ./step-23
    > real    1m42.416s
    > user    5m37.018s
    > sys    0m4.340s
    >
    > MACHINE 2: Intel(R) Xeon(R) CPU X5690  @ 3.47GHz
    > $ lsb_release -a
    > Description:    Ubuntu 14.04.5 LTS
    > $ gcc --version
    > gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
    >
    > deal v8.1.0 (built and installed from source):
    >
    > $ time ./step-23
    > real    2m49.114s
    > user    11m41.429s
    > sys    0m48.882s
    >
    > deal v8.5.1 (built and installed from source):
    > $ time ./step-23
    > real    3m20.583s
    > user    10m54.850s
    > sys    2m18.989s
    >
    >
    > On Tuesday, December 26, 2017 at 6:08:17 PM UTC+3, Matthias Maier wrote:
    >>
    >> Hi,
    >>
    >> I get relatively comparable results for both versions:
    >>
    >> dev: ./step-23  55.55s user 1.64s system 131% cpu 43.637 total
    >> 8.0: ./step-23  55.85s user 1.48s system 129% cpu 44.130 total
    >>
    >> Is this the unmodified step-23 tutorial program?
    >>
    >> For measuring performance regressions, a total runtime of less than 5
    >> seconds doesn't say that much. Newer versions allocate and precompute
    >> quite a bunch of stuff upfront, which might result in a small (problem
    >> independent) fixed runtime overhead (of a second or less).
    >>
    >> Best,
    >> Matthias
    >>
    >>
    >>
    >> On Tue, Dec 26, 2017, at 05:52 CST, [email protected] wrote:
    >>
    >> > Deal.II 8.5.1 seems to be 20% slower than 8.0.0. This is the timing I
    >> > get when running the Step-23 tutorial (output to screen and VTK is
    >> > suppressed):
    >> >
    >> > deal.II version 8.0.0:
    >> >
    >> > $ time ./step-23
    >> > Number of active cells: 16384
    >> > Number of degrees of freedom: 16641
    >> >
    >> > real    0m3.432s
    >> > user    0m6.320s
    >> > sys    0m0.612s
    >> >
    >> > deal.II version 8.5.1:
    >> >
    >> > $ time ./step-23
    >> > Number of active cells: 16384
    >> > Number of degrees of freedom: 16641
    >> >
    >> > real    0m4.430s
    >> > user    0m7.080s
    >> > sys    0m0.492s
    >> >
    >> > In general, I get about a 20% slowdown for my own code when upgrading
    >> > from 8.0.0 to 8.5.1. What is the reason for such a slowdown? Is deal.II
    >> > heading in the right direction if new versions become gradually slower?!
    >>

--
The deal.II project is located at http://www.dealii.org/

For mailing list/forum options, see https://groups.google.com/d/forum/dealii?hl=en

---

You received this message because you are subscribed to the Google Groups "deal.II User Group" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].

For more options, visit https://groups.google.com/d/optout.

