Re: [FEniCS] Improvement to assembly performance with a minor regression

Anders Logg Mon, 12 May 2014 05:16:38 -0700

Great, thanks for checking.

--
Anders



On Mon, May 12, 2014 at 11:06:20AM +0200, Martin Sandve Alnæs wrote:
> The functional here is the same as in the previous email, just run on a
> newer, faster laptop. On this computer I don't see any slowdown for the
> functional either. All times below are in seconds.
>
> Functional (a=f*dx, b=f*dx+g*dx(1)):
> Before:
> A: 0.261495113373
> B: 0.431048870087
> After:
> A: 0.251796007156
> B: 0.249910116196
>
> Linear form (a=f*v*dx, b=f*v*dx+g*v*dx(1)):
> Before:
> A: 0.302114009857
> B: 0.478947877884
> After:
> A: 0.292839050293
> B: 0.29376912117
>
> Bilinear form (a=f*v*u*dx, b=f*v*u*dx+g*v*u*dx(1)):
> Before:
> A: 0.665670156479
> B: 0.849056959152
> After:
> A: 0.648552894592
> B: 0.685650110245
>
> I'll go ahead with merging then.
>
> Martin
>
>
>
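The three tables above can be put in relative terms with a few lines of Python. This is an editorial sketch, not part of the thread. It copies the posted timings verbatim and, for each form type, computes how much slower b was than a (B/A) before and after the change, and the relative change in the time for a. The dictionary layout and the helper name `summarize` are illustrative choices.

```python
# Timings in seconds, copied from the follow-up benchmark above:
# (time for form a, time for form b).
timings = {
    "functional": {"before": (0.261495113373, 0.431048870087),
                   "after": (0.251796007156, 0.249910116196)},
    "linear": {"before": (0.302114009857, 0.478947877884),
               "after": (0.292839050293, 0.29376912117)},
    "bilinear": {"before": (0.665670156479, 0.849056959152),
                 "after": (0.648552894592, 0.685650110245)},
}


def summarize(t):
    """Return (B/A before, B/A after, relative change in the time for a)."""
    a_before, b_before = t["before"]
    a_after, b_after = t["after"]
    return b_before / a_before, b_after / a_after, (a_after - a_before) / a_before


for name, t in timings.items():
    ratio_before, ratio_after, a_change = summarize(t)
    print(f"{name:10s} B/A before {ratio_before:.2f}, after {ratio_after:.2f}, "
          f"A change {a_change:+.1%}")
```

With these numbers B/A drops from well above 1 (about 1.3 to 1.65) to close to 1 for all three form types, while the time for a does not increase, consistent with the "no slowdown" observation.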
> On 12 May 2014 10:27, Martin Sandve Alnæs <[email protected]> wrote:
>
>     I'll check. It's just really painful to rebuild with ufc changes... Is it
>     really necessary to rebuild all of dolfin after ufc changes? The dolfin
>     build system is not really doing its job in this situation.
>
>     Martin
>
>
>     On 9 May 2014 21:55, Anders Logg <[email protected]> wrote:
>
>         On Fri, May 09, 2014 at 03:27:20PM +0200, Martin Sandve Alnæs wrote:
>         > Hi all,
>         > I've implemented selective local evaluation of coefficient functions
>         > in the assembler, depending on which functions each integral depends
>         > on. It's currently in branches called
>         > martinal/topic-add-enabled-coefficients-per-integral
>         > in ufl, ffc and dolfin (they must be used together).
>         > Note that this changes the ufc interface, so everything must be
>         > recompiled.
>         >
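A toy model may help show what selective local evaluation buys. It is a sketch under stated assumptions, not the ufc/dolfin code: the per-integral flag list `enabled` and both function names are hypothetical. Each integral is assumed to carry one flag per form coefficient, and the assembler restricts (evaluates locally on the cell) only the flagged coefficients instead of every coefficient of the form.

```python
def coefficients_to_restrict(coefficients, enabled, selective):
    """Coefficients restricted to a cell before calling one integral.
    `enabled` is a hypothetical list with one flag per form coefficient,
    True if the integral actually uses that coefficient."""
    if not selective:
        # Old behaviour: every coefficient of the form is restricted.
        return list(coefficients)
    # New behaviour: only the coefficients this integral depends on.
    return [c for c, used in zip(coefficients, enabled) if used]


def count_restrictions(num_cells, coefficients, enabled, selective):
    """Total cell-wise restrictions when the same integral runs on every cell."""
    return num_cells * len(coefficients_to_restrict(coefficients, enabled, selective))


# Form b = f*dx + g*dx(1) with no subdomains marked: every cell runs only the
# f*dx integral, which uses f but not g.
coefficients = ["f", "g"]
enabled_f_dx = [True, False]
num_cells = 6 * 60**3  # UnitCubeMesh(60, 60, 60) has 6*60^3 tetrahedra

old = count_restrictions(num_cells, coefficients, enabled_f_dx, selective=False)
new = count_restrictions(num_cells, coefficients, enabled_f_dx, selective=True)
print(old, new)  # prints 2592000 1296000
```

The model only counts restrictions. Real assembly time also includes work that does not depend on the number of coefficients, so halving the restrictions does not halve the time, and checking the flags is extra work that may plausibly explain part of the small regression for forms where every integral uses every coefficient.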
>         > To show the performance improvement, here's a simple benchmark
>         > script. It assembles two forms, a and b, that depend on one and two
>         > coefficients respectively (f, and f and g) but yield exactly the same
>         > integral and assembly result when assembled without any subdomains
>         > (the dx(1) term in form b is never executed). Each form is assembled
>         > twice for semi-robust timing, and I ran the script once beforehand so
>         > that JIT compilation is kept out of the picture. (Performance numbers
>         > are below the code.)
>         >
>         >
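>         > from __future__ import print_function  # print() works in Python 2 and 3
>         > from dolfin import *
>         > import time
>         >
>         > n = 60
>         > mesh = UnitCubeMesh(n, n, n)
>         > V = FunctionSpace(mesh, "Lagrange", 1)
>         > f = Function(V)
>         > g = Function(V)
>         >
>         > # Form a depends only on f. Form b also depends on g, but its dx(1)
>         > # term is never assembled since no subdomains are marked.
>         > a = f*dx()
>         > b = f*dx() + g*dx(1)
>         >
>         > # Assemble each form twice; the JIT cache is assumed to be warm
>         > # from an earlier run of this script.
>         > t1 = time.time()
>         > A1 = assemble(a)
>         > t2 = time.time()
>         > A2 = assemble(a)
>         > t3 = time.time()
>         >
>         > print("A1:", t2 - t1)
>         > print("A2:", t3 - t2)
>         >
>         > t1 = time.time()
>         > B1 = assemble(b)
>         > t2 = time.time()
>         > B2 = assemble(b)
>         > t3 = time.time()
>         >
>         > print("B1:", t2 - t1)
>         > print("B2:", t3 - t2)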
>         >
>         >
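The script times two runs per form. A somewhat more robust variant, sketched here as an assumption rather than anything used in the thread, repeats each call several times and keeps the minimum, using `time.perf_counter` (a higher-resolution clock than `time.time`, available since Python 3.3). The standard library's `timeit.repeat` offers the same pattern; `best_time` and `workload` below are illustrative names.

```python
import time


def best_time(fn, repeats=5):
    """Call fn() `repeats` times and return the shortest wall-clock time in
    seconds. The minimum is less sensitive to background noise than one run."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def workload():
    """Stand-in so the sketch runs without dolfin; in the benchmark above one
    would pass e.g. lambda: assemble(a) instead."""
    return sum(i * i for i in range(100_000))


print("best of 5:", best_time(workload))
```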
>         > Resulting time to assemble with current master:
>         >
>         > A1: 0.467525005341
>         > A2: 0.465034008026
>         > B1: 0.882906198502
>         > B2: 0.830652952194
>         >
>         > Note how the additional coefficient in form b adds very significant
>         > overhead for this simple functional, even though it's never used in
>         > the computations.
>         >
>         > The time to assemble with the new branches:
>         >
>         > A1: 0.531542062759
>         > A2: 0.530611991882
>         > B1: 0.540424108505
>         > B2: 0.535769939423
>         >
>         > Note two things:
>         > - The performance is a bit lower for the simple case. It might be
>         >   possible to optimize this.
>         > - The performance is now the same for both forms, which makes form b
>         >   significantly faster because the function g is never restricted.
>         >
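The two observations above can be quantified from the posted numbers. This small Python sketch is an editorial addition (the variable and helper names are illustrative); it averages the two runs of each form and compares current master with the new branches.

```python
from statistics import mean

# Seconds, copied from the "current master" and "new branches" runs above.
master = {"A": [0.467525005341, 0.465034008026],
          "B": [0.882906198502, 0.830652952194]}
branch = {"A": [0.531542062759, 0.530611991882],
          "B": [0.540424108505, 0.535769939423]}


def avg(run, form):
    """Mean of the two timed assemblies of one form."""
    return mean(run[form])


overhead_master = avg(master, "B") / avg(master, "A")   # cost of the unused g on master
regression_a = avg(branch, "A") / avg(master, "A") - 1  # slowdown of the simple form
speedup_b = avg(master, "B") / avg(branch, "B")         # gain for form b

print(f"B/A on master: {overhead_master:.2f}")
print(f"A regression:  {regression_a:+.1%}")
print(f"B speedup:     {speedup_b:.2f}x")
```

With these timings the unused coefficient made b about 1.8 times slower than a on master, while the branches make a about 14% slower and b about 1.6 times faster.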
>         >
>         > The cases that will benefit performance-wise from this feature are
>         > forms with two or more integrals involving different coefficients.
>         >
>         > The cases that will see a small performance regression are forms
>         > with only one integral, with no coefficients, or where all integrals
>         > use the same coefficients. The relative regression is most noticeable
>         > for simple forms such as mass and stiffness matrices.
>         >
>         > There are multiple future features that depend on this functionality:
>         > - Functions that cannot be evaluated everywhere can be called only in
>         >   their valid domain (for example functions living only on
>         >   subdomains, on a partially overlapping mesh, or on the boundary).
>         > - Preprocessing in ufl could be refactored to reduce the amount of
>         >   symbolic processing done for forms that are already in the JIT
>         >   cache.
>         >
>         > The functionality is obviously highly beneficial, so is it ok if I
>         > push it now, even with the performance regression for simple forms?
>
>         Could you first check what the performance regression is (if any) for
>         assembling a standard right-hand side vector f*dx and Poisson
>         stiffness matrix?
>
>         Perhaps this is only noticeable for functionals.
>
>
>
>
>
_______________________________________________
fenics mailing list
[email protected]
http://fenicsproject.org/mailman/listinfo/fenics
