Note that this benchmark suggests we have some serious overhead in the function restriction code...
Martin

On 12 May 2014 14:14, Anders Logg <[email protected]> wrote:
> Great, thanks for checking.
>
> --
> Anders
>
>
> On Mon, May 12, 2014 at 11:06:20AM +0200, Martin Sandve Alnæs wrote:
> > The functional here is the same as in the previous email, just on a
> > newer faster laptop. On this computer I don't see any slowdown for the
> > functional either.
> >
> > Functional (a=f*dx, b=f*dx+g*dx(1)):
> > Before:
> > A: 0.261495113373
> > B: 0.431048870087
> > After:
> > A: 0.251796007156
> > B: 0.249910116196
> >
> > Linear form (a=f*v*dx, b=f*v*dx+g*v*dx(1)):
> > Before:
> > A: 0.302114009857
> > B: 0.478947877884
> > After:
> > A: 0.292839050293
> > B: 0.29376912117
> >
> > Bilinear form (a=f*v*u*dx, b=f*v*u*dx+g*v*u*dx(1)):
> > Before:
> > A: 0.665670156479
> > B: 0.849056959152
> > After:
> > A: 0.648552894592
> > B: 0.685650110245
> >
> > I'll go ahead with merging then.
> >
> > Martin
> >
> >
> > On 12 May 2014 10:27, Martin Sandve Alnæs <[email protected]> wrote:
> >
> > I'll check. It's just really painful to rebuild with ufc changes...
> > Is it really necessary to rebuild all of dolfin after ufc changes?
> > The dolfin build system is not really doing its job in this situation.
> >
> > Martin
> >
> >
> > On 9 May 2014 21:55, Anders Logg <[email protected]> wrote:
> >
> > On Fri, May 09, 2014 at 03:27:20PM +0200, Martin Sandve Alnæs wrote:
> > > Hi all,
> > > I've implemented selective local evaluation of coefficient functions
> > > in the assembler depending on which functions each integral depends
> > > on. It's currently in branches called
> > > martinal/topic-add-enabled-coefficients-per-integral
> > > in ufl, ffc and dolfin (must be used together).
> > > Note that this changes ufc interface so everything must be recompiled.
> > >
> > > To show the performance improvement, here's a simple benchmark
> > > script, assembling two forms (called a and b) that depend on one and
> > > two coefficients (f and (f and g) respectively) but yield the exact
> > > same integral and assembly result when assembled without any
> > > subdomains (the dx(1) term in form b is never executed). Each form is
> > > assembled twice for semi-robust timing and I first ran the script to
> > > keep the jit out of the picture. (Performance numbers below the code).
> > >
> > >
> > > from dolfin import *
> > > import time
> > >
> > > n = 60
> > > mesh = UnitCubeMesh(n, n, n)
> > > V = FunctionSpace(mesh, "Lagrange", 1)
> > > f = Function(V)
> > > g = Function(V)
> > >
> > > a = f*dx()
> > > b = f*dx() + g*dx(1)
> > >
> > > t1 = time.time()
> > > A1 = assemble(a)
> > > t2 = time.time()
> > > A2 = assemble(a)
> > > t3 = time.time()
> > >
> > > print "A1:", (t2-t1)
> > > print "A2:", (t3-t2)
> > >
> > > t1 = time.time()
> > > B1 = assemble(b)
> > > t2 = time.time()
> > > B2 = assemble(b)
> > > t3 = time.time()
> > >
> > > print "B1:", (t2-t1)
> > > print "B2:", (t3-t2)
> > >
> > >
> > > Resulting time to assemble with current master:
> > >
> > > A1: 0.467525005341
> > > A2: 0.465034008026
> > > B1: 0.882906198502
> > > B2: 0.830652952194
> > >
> > > Note how the additional coefficient in form b gives very significant
> > > overhead for this simple functional even though it's never used in the
> > > computations.
> > >
> > > The time to assemble with the new branches:
> > >
> > > A1: 0.531542062759
> > > A2: 0.530611991882
> > > B1: 0.540424108505
> > > B2: 0.535769939423
> > >
> > > Note two things:
> > > - The performance is a bit lower for the simple case. It might be
> > > possible to optimize this.
> > > - The performance is the same for both cases, significantly faster for
> > > form b because the function g is never restricted.
> > >
> > >
> > > The cases that will benefit from this feature performance wise are
> > > forms with two or more integrals involving different coefficients.
> > >
> > > The cases that will have a small regression performance wise are
> > > forms with only one integral, with no coefficients, or where all
> > > integrals use the same coefficients. The relative performance
> > > regression is most noticeable for simple forms such as mass and
> > > stiffness matrices.
> > >
> > > There are multiple future features that depend on this functionality:
> > > - it allows for functions that cannot be evaluated everywhere to be
> > > called only in their valid domain (examples are functions only living
> > > on subdomains, a partially overlapping mesh, or the boundary).
> > > - possible refactoring of preprocessing in ufl to reduce the amount
> > > of symbolic processing done for forms that are already in the jit
> > > cache.
> > >
> > > The functionality is obviously highly beneficial, so is it ok if I
> > > push it now even with the performance regression for simple forms?
> >
> > Could you first check what the performance regression is (if any) for
> > assembling a standard right-hand side vector f*dx and Poisson
> > stiffness matrix?
> >
> > Perhaps this is only noticeable for functionals.
> >
>
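
To make the comparison in the quoted numbers easier to read, here is a minimal sketch that computes how much slower form b is than form a, before and after the change. It uses only the timings reported in the thread. The names `timings`, `relative_overhead` and `summarize` are illustrative helpers made up for this sketch; they are not part of dolfin, ufl or ffc.

```python
# Timings in seconds, copied from the quoted 12 May 2014 message.
# Each pair is (form a, form b). All names here are hypothetical helpers.
timings = {
    "functional": {
        "before": (0.261495113373, 0.431048870087),
        "after": (0.251796007156, 0.249910116196),
    },
    "linear form": {
        "before": (0.302114009857, 0.478947877884),
        "after": (0.292839050293, 0.29376912117),
    },
    "bilinear form": {
        "before": (0.665670156479, 0.849056959152),
        "after": (0.648552894592, 0.685650110245),
    },
}


def relative_overhead(t_a, t_b):
    """Return the extra time of form b as a fraction of the time of form a."""
    return (t_b - t_a) / t_a


def summarize(table):
    """Map each form type to its relative overhead before and after."""
    return {
        name: {label: relative_overhead(*pair) for label, pair in runs.items()}
        for name, runs in table.items()
    }


summary = summarize(timings)
for name, row in summary.items():
    print("%-14s before: %+6.1f%%  after: %+6.1f%%"
          % (name, 100 * row["before"], 100 * row["after"]))
```

A value near zero in the "after" column means the unused coefficient g no longer costs anything measurable, which is what the thread reports for the functional and the linear form.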
_______________________________________________
fenics mailing list
[email protected]
http://fenicsproject.org/mailman/listinfo/fenics
