Dear Dmitry,

Thanks for the info. This sounds interesting. May I ask some more details: Are you using Q1 elements in your 2D model (this is the element used in step-23)? I have not looked into your history model, but I guess it can be adapted. Since your code is limited by FEValues::get_function_values, I suggest you take a look at the step-48 tutorial program. That code does indeed have performance-tuned components. It requires a somewhat different approach to write the loops, but it should pay off for time-dependent cases, I expect a speedup of a factor 4-6 in 2D for linear elements for the part corresponding to FEValues::get_function_values and FEValues::reinit. I would be happy to guide you more closely if this sounds applicable.

Best,
Martin


On 27.12.2017 14:16, [email protected] wrote:
Thank you!

Some guidance how I could optimize the code would be appreciated. I am using deal.II for solving a time-dependent nonlinear 2D problem (sort of sine-Gordon, but a more advanced model which includes a history dependence, https://github.com/drgulevich/mitmojco). Most of the time the deal.II code spends in:

1. fe_values.get_function_values -- most of the wall time (70%)
2. fe_values.reinit -- less often
3. CG solver -- even less often

Kind regards,
Dmitry

On Wednesday, December 27, 2017 at 11:49:42 AM UTC+3, Martin Kronbichler wrote:

    In general, we strive to make deal.II faster with new releases,
    and for many cases that is also true as I can confirm from my
    applications. I have ran step-23 on release 8.0 as well as the
    current development sources and I can confirm that the new version
    is slower on my machine. If I disable output of step-23, I get a
    run time of 4.7 seconds for version 8.0 and 5.3 seconds for the
    current version. After some investigations I found out that while
    some solver-related operations got faster indeed (the problem with
    16k dofs is small enough to run from L3 cache in my case), we are
    slower in the FEValues::reinit() calls. This call appears in
    VectorTools::create_right_hand_side() and the
    VectorTools::interpolate_boundary_values in the time loop. The
    reason for this is that we nowadays call
    "MappingQGeneric::compute_mapping_support_points" also for the
    bilinear mapping MappingQ1, which allocates and de-allocates a
    vector. While this is uncritical on higher order mappings, in 2D
    with linear shape functions the time spent there is indeed not
    negligible. This is indeed unfortunate for your use case, but I
    want to stress that the changes were made in the hope to make that
    part of the code more reliable. Furthermore, those parts of the
    code are not performance critical and not accurately tracked. It
    is a rather isolated issue that got worse here, so from this
    single example one definitely not say that we are going the wrong
    direction as a project.

    While there are plenty of things I could imagine to make this
    particular case more efficient in the application code, way beyond
    the performance of what the version 8.0 provided - note that I
    would not write the code like that if it were performance critical
    - the only obvious thing is that we could try to work around the
    memory allocations by not returning a vector in
    MappingQGeneric::compute_mapping_support_points but rather fill an
    existing array in
    MappingQGeneric::InternalData::mapping_support_points. Nobody of
    us developers has this high on the priority list right now, but we
    would definitely appreciate if some of our users, like you, wants
    to look into that. I could guide you to the right spots.

    Best regards,
    Martin


    On 26.12.2017 21:22, [email protected] <javascript:> wrote:
    Yes, the two are attached. The key lines from their diff result:

    $ diff detailed.log-v8.1.0 detailed.log-v8.5.1
    ...
    < #  Compiler flags used for this build:
    < #        CMAKE_CXX_FLAGS:              -pedantic -fpic -Wall
    -Wpointer-arith -Wwrite-strings -Wsynth -Wsign-compare -Wswitch
    -Wno-unused-local-typedefs -Wno-long-long -Wno-deprecated
    -Wno-deprecated-declarations -std=c++11 -Wno-parentheses
    -Wno-long-long
    < #        DEAL_II_CXX_FLAGS_RELEASE:    -O2 -funroll-loops
    -funroll-all-loops -fstrict-aliasing -Wno-unused
    ---
    > #  Base configuration (prior to feature configuration):
    > #        DEAL_II_CXX_FLAGS:            -pedantic -fPIC -Wall
    -Wextra -Wpointer-arith -Wwrite-strings -Wsynth -Wsign-compare
    -Wswitch -Woverloaded-virtual -Wno-long-long
    -Wno-deprecated-declarations -Wno-literal-suffix -std=c++11
    > #        DEAL_II_CXX_FLAGS_RELEASE:    -O2 -funroll-loops
    -funroll-all-loops -fstrict-aliasing -Wno-unused-local-typedefs
    18c19
    < #        DEAL_II_LINKER_FLAGS: -Wl,--as-needed -rdynamic -pthread
    ---
    > #        DEAL_II_LINKER_FLAGS: -Wl,--as-needed -rdynamic
    -fuse-ld=gold
    ...
    > #            BOOST_CXX_FLAGS = -Wno-unused-local-typedefs
    ...
    > #      ( DEAL_II_WITH_BZIP2 = OFF )
    > #        DEAL_II_WITH_CXX11 = ON
    > #      ( DEAL_II_WITH_CXX14 = OFF )
    > #      ( DEAL_II_WITH_GSL = OFF )
    ...
    > #            THREADS_CXX_FLAGS = -Wno-parentheses
    > #            THREADS_LINKER_FLAGS = -pthread


    On Tuesday, December 26, 2017 at 10:10:44 PM UTC+3, Matthias
    Maier wrote:

        Would you mind sending us the "detailed.log" files?

        Best,
        Matthias


        On Tue, Dec 26, 2017, at 12:35 CST, [email protected] wrote:

        > Thanks. This is strange as I still get 15-20% consistently
        better results
        > in favor of older versions
        >  on three different machines already. Two more studies on
        other systems
        > attached below.
        >
        > TEST: Step-23 (integration time modified from 5 to 150,
        output suppressed)
        > CMAKE_BUILD_TYPE: "Release".
        >
        > MACHINE 1: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
        > $ cat /etc/redhat-release
        > CentOS Linux release 7.4.1708 (Core)
        > $ gcc --version
        > gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
        >
        > deal v8.1.0 (built and installed from source):
        >
        > $ time ./step-23
        > real    1m23.768s
        > user    5m46.080s
        > sys    0m4.079s
        >
        > deal v8.5.1 (built and installed from source):
        >
        > $ time ./step-23
        > real    1m42.416s
        > user    5m37.018s
        > sys    0m4.340s
        >
        > MACHINE 2: Intel(R) Xeon(R) CPU X5690  @ 3.47GHz
        > $ lsb_release -a
        > Description:    Ubuntu 14.04.5 LTS
        > $ gcc --version
        > gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
        >
        > deal v8.1.0 (built and installed from source):
        >
        > $ time ./step-23
        > real    2m49.114s
        > user    11m41.429s
        > sys    0m48.882s
        >
        > deal v8.5.1 (built and installed from source):
        > $ time ./step-23
        > real    3m20.583s
        > user    10m54.850s
        > sys    2m18.989s
        >
        >
        > On Tuesday, December 26, 2017 at 6:08:17 PM UTC+3, Matthias
        Maier wrote:
        >>
        >> Hi,
        >>
        >> I get relatively comparable results for both versions:
        >>
        >> dev: ./step-23  55.55s user 1.64s system 131% cpu 43.637
        total
        >> 8.0: ./step-23  55.85s user 1.48s system 129% cpu 44.130
        total
        >>
        >> Is this the unmodified step-23 tutorial program?
        >>
        >> For measuring performance regressions a total runtime of
        less than 5
        >> seconds doesn't say that much. Never versions allocate and
        precompute
        >> quite a bunch of stuff upfront which might result in a
        small (problem
        >> independent) fixed runtime overhead (of a second or less).
        >>
        >> Best,
        >> Matthias
        >>
        >>
        >>
        >> On Tue, Dec 26, 2017, at 05:52 CST, [email protected]
        <javascript:>
        >> wrote:
        >>
        >> > Deal.II 8.5.1 seems to be 20% slower than 8.0.0. This is
        the timing I
        >> get
        >> > when running the Step-23 tutorial (output to screen and
        vtk is
        >> suppressed):
        >> >
        >> > deal.II version 8.0.0:
        >> >
        >> > $ time ./step-23
        >> > Number of active cells: 16384
        >> > Number of degrees of freedom: 16641
        >> >
        >> > real    0m3.432s
        >> > user    0m6.320s
        >> > sys    0m0.612s
        >> >
        >> > deal.II version 8.5.1:
        >> >
        >> > $ time ./step-23
        >> > Number of active cells: 16384
        >> > Number of degrees of freedom: 16641
        >> >
        >> > real    0m4.430s
        >> > user    0m7.080s
        >> > sys    0m0.492s
        >> >
        >> > In general, I get about 20% slow down for my own code
        when upgrading
        >> from
        >> > 8.0.0 to 8.5.1. What is the reason of such a slow down?
        Does the deal.II
        >> > follow the right direction given new versions become
        gradually slower?!
        >>

-- The deal.II project is located at http://www.dealii.org/
    For mailing list/forum options, see
    https://groups.google.com/d/forum/dealii?hl=en
    <https://groups.google.com/d/forum/dealii?hl=en>
    ---
    You received this message because you are subscribed to the
    Google Groups "deal.II User Group" group.
    To unsubscribe from this group and stop receiving emails from it,
    send an email to [email protected] <javascript:>.
    For more options, visit https://groups.google.com/d/optout
    <https://groups.google.com/d/optout>.

--
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see https://groups.google.com/d/forum/dealii?hl=en
---
You received this message because you are subscribed to the Google Groups "deal.II User Group" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected] <mailto:[email protected]>.
For more options, visit https://groups.google.com/d/optout.

--
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see 
https://groups.google.com/d/forum/dealii?hl=en
--- You received this message because you are subscribed to the Google Groups "deal.II User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Reply via email to