Re: [FEniCS] Improvement to assembly performance with a minor regression

Martin Sandve Alnæs Mon, 12 May 2014 05:27:21 -0700

Note that this benchmark suggests we have some serious overhead in the
function restriction code...
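
To put a rough number on that overhead, here is a minimal sketch that computes how much longer form b takes than form a, using only the timings quoted below. The names TIMINGS, relative_overhead and summarize are made up for illustration and are not part of dolfin, ufl or ffc.

```python
# Timings in seconds, copied from the quoted benchmark results in this thread.
# Each pair is (form a, form b); form b carries the extra, unused coefficient g.
# TIMINGS, relative_overhead and summarize are illustrative names only.
TIMINGS = {
    "functional": {"before": (0.261495113373, 0.431048870087),
                   "after": (0.251796007156, 0.249910116196)},
    "linear": {"before": (0.302114009857, 0.478947877884),
               "after": (0.292839050293, 0.29376912117)},
    "bilinear": {"before": (0.665670156479, 0.849056959152),
                 "after": (0.648552894592, 0.685650110245)},
}


def relative_overhead(t_a, t_b):
    """Return the extra time of form b as a fraction of the time of form a."""
    return (t_b - t_a) / t_a


def summarize(timings):
    """Map each form type and stage to the relative overhead of form b."""
    return {
        form: {stage: relative_overhead(*pair) for stage, pair in stages.items()}
        for form, stages in timings.items()
    }


for form, stages in summarize(TIMINGS).items():
    for stage, overhead in stages.items():
        print("%-10s %-6s overhead of b over a: %6.1f%%"
              % (form, stage, 100.0 * overhead))
```

On the "before" numbers the unused coefficient costs roughly 65% extra for the functional, which is what points at the restriction code; a single run per configuration is of course only indicative.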


Martin


On 12 May 2014 14:14, Anders Logg <[email protected]> wrote:

> Great, thanks for checking.
>
> --
> Anders
>
>
> On Mon, May 12, 2014 at 11:06:20AM +0200, Martin Sandve Alnæs wrote:
> > The functional here is the same as in the previous email, just on a newer
> > faster laptop. On this computer I don't see any slowdown for the functional
> > either.
> >
> > Functional (a=f*dx, b=f*dx+g*dx(1)):
> > Before:
> > A: 0.261495113373
> > B: 0.431048870087
> > After:
> > A: 0.251796007156
> > B: 0.249910116196
> >
> > Linear form (a=f*v*dx, b=f*v*dx+g*v*dx(1)):
> > Before:
> > A: 0.302114009857
> > B: 0.478947877884
> > After:
> > A: 0.292839050293
> > B: 0.29376912117
> >
> > Bilinear form (a=f*v*u*dx, b=f*v*u*dx+g*v*u*dx(1)):
> > Before:
> > A: 0.665670156479
> > B: 0.849056959152
> > After:
> > A: 0.648552894592
> > B: 0.685650110245
> >
> > I'll go ahead with merging then.
> >
> > Martin
> >
> >
> >
> > On 12 May 2014 10:27, Martin Sandve Alnæs <[email protected]> wrote:
> >
> >     I'll check. It's just really painful to rebuild with ufc changes... Is it
> >     really necessary to rebuild all of dolfin after ufc changes? The dolfin
> >     build system is not really doing its job in this situation.
> >
> >     Martin
> >
> >
> >     On 9 May 2014 21:55, Anders Logg <[email protected]> wrote:
> >
> >         On Fri, May 09, 2014 at 03:27:20PM +0200, Martin Sandve Alnæs wrote:
> >         > Hi all,
> >         > I've implemented selective local evaluation of coefficient
> >         > functions in the assembler depending on which functions each
> >         > integral depends on. It's currently in branches called
> >         > martinal/topic-add-enabled-coefficients-per-integral
> >         > in ufl, ffc and dolfin (must be used together).
> >         > Note that this changes the ufc interface so everything must be
> >         > recompiled.
> >         >
> >         > To show the performance improvement, here's a simple benchmark
> >         > script, assembling two forms (called a and b) that depend on one
> >         > and two coefficients (f, and f and g, respectively) but yield the
> >         > exact same integral and assembly result when assembled without any
> >         > subdomains (the dx(1) term in form b is never executed). Each form
> >         > is assembled twice for semi-robust timing, and I ran the script
> >         > once beforehand to keep the jit out of the picture. (Performance
> >         > numbers below the code.)
> >         >
> >         >
> >         > from dolfin import *
> >         > import time
> >         >
> >         > n = 60
> >         > mesh = UnitCubeMesh(n, n, n)
> >         > V = FunctionSpace(mesh, "Lagrange", 1)
> >         > f = Function(V)
> >         > g = Function(V)
> >         >
> >         > a = f*dx()
> >         > b = f*dx() + g*dx(1)
> >         >
> >         > t1 = time.time()
> >         > A1 = assemble(a)
> >         > t2 = time.time()
> >         > A2 = assemble(a)
> >         > t3 = time.time()
> >         >
> >         > print "A1:", (t2-t1)
> >         > print "A2:", (t3-t2)
> >         >
> >         > t1 = time.time()
> >         > B1 = assemble(b)
> >         > t2 = time.time()
> >         > B2 = assemble(b)
> >         > t3 = time.time()
> >         >
> >         > print "B1:", (t2-t1)
> >         > print "B2:", (t3-t2)
> >         >
> >         >
> >         > Resulting time to assemble with current master:
> >         >
> >         > A1: 0.467525005341
> >         > A2: 0.465034008026
> >         > B1: 0.882906198502
> >         > B2: 0.830652952194
> >         >
> >         > Note how the additional coefficient in form b gives very
> >         > significant overhead for this simple functional even though it's
> >         > never used in the computations.
> >         >
> >         > The time to assemble with the new branches:
> >         >
> >         > A1: 0.531542062759
> >         > A2: 0.530611991882
> >         > B1: 0.540424108505
> >         > B2: 0.535769939423
> >         >
> >         > Note two things:
> >         > - The performance is a bit lower for the simple case. It might be
> >         >   possible to optimize this.
> >         > - The performance is the same for both cases, significantly faster
> >         >   for form b because the function g is never restricted.
> >         >
> >         >
> >         > The cases that will benefit from this feature performance-wise are
> >         > forms with two or more integrals involving different coefficients.
> >         >
> >         > The cases that will have a small regression performance-wise are
> >         > forms with only one integral, with no coefficients, or where all
> >         > integrals use the same coefficients. The relative performance
> >         > regression is most noticeable for simple forms such as mass and
> >         > stiffness matrices.
> >         >
> >         > There are multiple future features that depend on this
> >         > functionality:
> >         > - it allows for functions that cannot be evaluated everywhere to
> >         >   be called only in their valid domain (examples are functions
> >         >   only living on subdomains, a partially overlapping mesh, or the
> >         >   boundary).
> >         > - possible refactoring of preprocessing in ufl to reduce the
> >         >   amount of symbolic processing done for forms that are already in
> >         >   the jit cache.
> >         >
> >         > The functionality is obviously highly beneficial, so is it ok if I
> >         > push it now even with the performance regression for simple forms?
> >
> >         Could you first check what the performance regression is (if any)
> >         for assembling a standard right-hand side vector f*dx and Poisson
> >         stiffness matrix?
> >
> >         Perhaps this is only noticeable for functionals.
> >
> >
> >
> >
> >
>

_______________________________________________
fenics mailing list
[email protected]
http://fenicsproject.org/mailman/listinfo/fenics
