2009/3/23 Robert Kirby <[email protected]>:
> Hi all, some thoughts:
> 1.) In the current paradigm (build + apply), building the matrix is
> typically not dominant in the overall run-time.

Isn't this in contrast with what Kent said about matrix insertion being
the bottleneck? Are there any references for these claims?

> Optimizing the application
> in a Krylov solver is good. Optimizing matrix-free evaluation is also good.
> Optimizing construction is nice if you have to do it frequently, so it
> doesn't hurt. Just don't explode compile time.
> 2.) template metaprogramming can be very powerful (it's Turing complete, if
> only accidentally), but it can also obscure the code and make it difficult
> to maintain and modify. While ffc may be complicated inside and the code it
> generates ugly, the inputs that a user writes are quite nice. A look inside
> Thyra within Trilinos will indicate that too much metaprogramming at the
> user interface level can become cumbersome.

Like I mentioned in reply to Garth, the form DSL and FFC code generation
are independent concepts: DSL parsing does not necessarily have to be
followed by code generation. The DSL can continue its job of hiding
complexities while the FEM itself is performed using template
meta-programming.

-Ali

> 3.) LifeV does some template metaprogramming in the FEM context with some
> success.
> 4.) Before trying to optimize this or that or import new technologies, it
> makes sense to do some serious profiling on existing codes. Not just
> Poisson. Something nontrivial, both in terms of size and complexity, like a
> turbulence model or inverse problem with hard-to-solve systems, lots of
> assembly, data I/O. If you spend 90% of your time in PETSc kernels, don't
> bother with further optimizations of ffc/dolfin. I don't know what the
> numbers are. But it might make a very interesting talk for someone to give
> at FEniCS'09.
> Rob
>
> On Mon, Mar 23, 2009 at 7:12 AM, Garth N. Wells <[email protected]> wrote:
>>
>>
>> A Navaei wrote:
>> > 2009/3/23 Kent Andre <[email protected]>:
>> >> The code that FFC produces is about as fast as light. It has been
>> >> documented in a number of papers.
>> >
>> > Is there any data available comparing the FFC performance to the
>> > hardware peak?
>> >
>>
>> FFC does not operate in isolation, so it is not possible to make a
>> comparison to max flops of a CPU. Furthermore, in a typical simulation
>> with code generated by FFC, other parts of the solution process dominate
>> (such as insertion, as mentioned by Kent, and the linear solve), so
>> whether or not FFC generated code is optimal in terms of peak flops of a
>> machine is not relevant to runtime performance.
>>
>> >> I don't think you should try to beat FFC with generic meta-programming.
>> >> Or you could do it, but don't have too high expectations...
>> >>
>> >> Insertion into the matrix is currently the bottleneck. But FFC does
>> >> not have anything to do with this.
>> >
>> > While FFC doesn't have anything to do with this, dolfin does. In the
>> > case of the MTL4 backend wrapper, it is implemented badly by ignoring
>> > the meta-programming potentials.
>>
>> This is not a constructive comment. Patches are welcome.
>>
>> > For instance, sparse matrix insertion
>> > is done by forming a sparsity pattern outside of MTL4 and then
>> > assigning the pointers to MTL4 API, while loop unrolling could have
>> > been used here.
>> >
>>
>> If you look at the code, the FFC backend does not use the sparsity
>> pattern. The MTL4 inserter does have some options which we have not yet
>> taken advantage of, so again patches are welcome.
>>
>> Garth
>>
>> >
>> > -Ali
>> >
>> >> Kent
>> >>
>> >>
>> >> On ma., 2009-03-23 at 10:11 +0000, A Navaei wrote:
>> >>> The success of MTL4, based on generic meta-programming, raises the
>> >>> question of re-visiting the efficiency of code-generation
>> >>> approaches, including FFC.
>> >>> Given that FEM can particularly benefit
>> >>> from major meta-programming characteristics, namely static
>> >>> polymorphism and loop unrolling, MTL4 demonstrates that the
>> >>> code-generation part can be much more efficiently replaced by inlining
>> >>> performed at compile-time.
>> >>>
>> >>> Without having a concrete meta-programming implementation, it may be
>> >>> impossible to predict how much performance one would gain compared to
>> >>> FFC. However, MTL4 has been reported to be many times faster than
>> >>> code-generation means such as ATLAS.
>> >>>
>> >>> Based on this, are there any specific benefits in FFC code-generation
>> >>> which may not be covered by meta-programming?
>> >>>
>> >>>
>> >>> -Ali
>> >>> _______________________________________________
>> >>> DOLFIN-dev mailing list
>> >>> [email protected]
>> >>> http://www.fenics.org/mailman/listinfo/dolfin-dev
