On 2013-12-18 14:46, Nico Schlömer wrote:
I'll be happy to try those out and report figures.
Here are some timings with the new UFC changes:
ufc.get_cell_integral: 00:00:00.007066
ufc.update: 00:00:01.092362
local-to-global dof maps: 00:00:00.008636
integral->tabulate_tensor: 00:00:03.379240
add_to_global_tensor: 00:00:01.634146
-- Well, the benefit for my application isn't that great.
Speeding up the assembly process for one call isn't necessarily the
intention here anyway; I'm more concerned about a series of similar
linear problems.
Could it be, for example, that ufc.update(...) does the same thing
over and over if you call assemble_cells(...) for the same mesh and
function spaces (i.e., set of test and trial functions)?
As for the add_to_global_tensor(...) call, I'm not sure why it takes
so long at all. All that seems to be going on is a bit of vector I/O. Hm.
Are you assembling a matrix or a vector? Matrix insertion has the cost
of searches into a sparse matrix. If you're using PETSc, make sure that
PETSc is built without debugging (--with-debugging=no).
My experience is that for the Poisson problem the cost of the CSR matrix
insertion is on par with tabulation of the element tensor.
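
As a rough way to see that comparison in practice, here is a minimal
sketch using the DOLFIN Python interface. It assumes a bilinear form a
and a linear form L are already defined; the function name is only for
illustration.

from dolfin import *
from time import time

def compare_assembly_cost(a, L):
    # Matrix assembly: element tabulation plus insertion into a sparse matrix.
    t0 = time()
    A = assemble(a)
    t1 = time()
    # Vector assembly: element tabulation plus plain indexed writes.
    b = assemble(L)
    t2 = time()
    print("matrix assembly: %.3f s" % (t1 - t0))
    print("vector assembly: %.3f s" % (t2 - t1))
    return A, b
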
Garth
--Nico
On Mon, Dec 16, 2013 at 4:09 PM, Garth N. Wells <[email protected]>
wrote:
On 2013-12-16 14:58, Nico Schlömer wrote:
How did you do the timing?
#include "boost/date_time/posix_time/posix_time.hpp"
using namespace boost::posix_time;
for () {
time_start = microsec_clock::local_time();
// ...code to time...
time_end = microsec_clock::local_time();
duration += time_end - time_start;
}
std::cout << duration << std::endl;
Some of these calls involve so little work that they are hard to
time.
Indeed, but in the case here not too small to be missed by the
microsecond timer. (The number of cells is 44976 here, so one
ufc.update call, for example, is around 25 µs: roughly 1.09 s over 44976 cells.)
Some are too small, e.g. getting a reference to a dofmap. The creation
and destruction of the timer will dominate.
There are some optimisations for UFC::update(...) in
https://bitbucket.org/fenics-project/dolfin/pull-request/73/
I'll be happy to try those out and report figures. (Right now I'm
still getting a compilation error,
<https://bitbucket.org/fenics-project/dolfin/pull-request/73/changes-for-ufc-cell-clean-up/diff>).
You need to get the UFC changes too.
Preserving the sparsity structure makes a significant difference, but I
believe that you'll see the difference when GenericMatrix::apply() is
called.
That may be; right now, the computation time for the Krylov iteration
is much smaller than for the Jacobian construction, which is why I
haven't taken a closer look yet.
There is limited scope for speeding up this step (if you want to solve
Ax=b).
If you can get away without a preconditioner, you can: (1) go
matrix-free; or (2) implement an unassembled matrix data structure that
can do mat-vec products. I'm not aware of PETSc supporting unassembled
matrices.
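
As a minimal sketch of option (1) in the DOLFIN Python interface, the
mat-vec product can be formed with UFL's action() without ever
assembling A. The bilinear form a, its trial space V, and a NumPy array
x of coefficient values are assumed to exist.

from dolfin import *

def matfree_matvec(a, V, x):
    # Load the input coefficients into a Function on the trial space.
    u = Function(V)
    u.vector()[:] = x
    # Assembling action(a, u) yields a vector holding A*x.
    return assemble(action(a, u))
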
Garth
Like I said, it's more that I'm solving a series of $A_i x = b_i$
(nonlinear time integration -- Navier-Stokes in fact).
Cheers,
Nico
On Mon, Dec 16, 2013 at 3:08 PM, Garth N. Wells <[email protected]>
wrote:
On 2013-12-16 13:23, Nico Schlömer wrote:
I did some timings here and found that whenever the Jacobian is
constructed, most of the computational time goes into the cell
iteration in Assembler::assemble_cells(...). Specifically (in units of
seconds):
Not surprising. It's where all the work is done.
ufc.get_cell_integral: 00:00:00.006977
ufc.update: 00:00:01.133166
get local-to-global dof maps: 00:00:00.009772
integral->tabulate_tensor: 00:00:03.289413
add_to_global_tensor: 00:00:01.693635
How did you do the timing? Some of these calls involve so little work
that they are hard to time.
I'm not entirely sure what the purpose of all of these calls is, but I
guess that tabulate_tensor does the actual heavy lifting, i.e., the
integration. Besides this, the ufc.update
There are some optimisations for UFC::update(...) in
https://bitbucket.org/fenics-project/dolfin/pull-request/73/
UFC::update(...) copies some data unnecessarily (e.g. topology data).
and the addition to the
global tensor take a significant amount of time.
Since I'm solving a series of linear problems (in fact, a time series
of nonlinear problems) that are very similar in structure, I think one
or the other of these calls could be cached away. The mere caching of
the sparsity structure, as done in the Cahn-Hilliard demo, doesn't do
much.
Preserving the sparsity structure makes a significant difference, but I
believe that you'll see the difference when GenericMatrix::apply() is
called.
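
A minimal sketch of reusing the matrix (and hence its sparsity pattern)
across repeated assemblies from the Python interface; the Jacobian form
J, the boundary conditions bcs, and the step count num_steps are
assumed to exist already.

from dolfin import *

# The first assembly builds the sparsity pattern.
A = assemble(J)
for step in range(num_steps):
    # Subsequent assemblies reuse A and its pattern.
    assemble(J, tensor=A)
    for bc in bcs:
        bc.apply(A)
    # ... update the right-hand side and solve with A ...
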
Does anyone have more insight into what might be worth exploiting
here
for speedup?
There is limited scope for speeding up this step (if you want to solve
Ax=b).
I have a plan for some mesh data reordering that can speed up insertion
through improved data locality. One reason the insertion looks
relatively costly is that the tabulate_tensor function is highly
optimised. From the numbers I've seen, it can be several times faster
than in other codes.
It is possible to assemble into different sparse matrix data
structures, but my experience is that this pushes the time cost further
down the line, i.e. to the linear solver. You can try assembling into a
dolfin::STLMatrix.
Garth
Cheers,
Nico
On Mon, Jun 3, 2013 at 8:54 PM, Garth N. Wells <[email protected]>
wrote:
On 3 June 2013 19:49, Nico Schlömer <[email protected]>
wrote:
Hi all,
when solving nonlinear problems, I simply went with
# define the residual form F
J = derivative(F, u)
solve(F == 0, u, bcs, J=J)
for now (which uses Newton's method).
I noticed, however, that the computation of the Jacobian,
nonlinear_problem.J(*_A, x);
takes by far the most time in the computation.
Is there some caching I could employ? I need to solve a similar
nonlinear system in each time step.
Use the lower-level NewtonSolver class. You then have complete control
over how J is computed/supplied/cached.
The Cahn-Hilliard demo illustrates use of the NewtonSolver class.
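
For reference, a minimal sketch of that lower-level pattern in the
Python interface, modelled on the Cahn-Hilliard demo; the residual form
F, the Jacobian form J, the boundary conditions bcs, and the unknown u
are assumed to exist.

from dolfin import *

class Problem(NonlinearProblem):
    # The J() method below is the natural place to add any caching or
    # reuse of the Jacobian.
    def __init__(self, F, J, bcs):
        NonlinearProblem.__init__(self)
        self._F, self._J, self._bcs = F, J, bcs

    def F(self, b, x):
        assemble(self._F, tensor=b)
        for bc in self._bcs:
            bc.apply(b, x)

    def J(self, A, x):
        assemble(self._J, tensor=A)
        for bc in self._bcs:
            bc.apply(A)

solver = NewtonSolver()
solver.solve(Problem(F, J, bcs), u.vector())
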
Garth
--Nico
_______________________________________________
fenics-support mailing list
[email protected]
http://fenicsproject.org/mailman/listinfo/fenics-support