On Wed, Aug 6, 2008 at 5:00 AM, Anders Logg <[EMAIL PROTECTED]> wrote:
> On Wed, Aug 06, 2008 at 04:24:36AM -0500, Matthew Knepley wrote:
>> ---------- Forwarded message ----------
>> From: Matthew Knepley <[EMAIL PROTECTED]>
>> Date: Wed, Aug 6, 2008 at 4:24 AM
>> Subject: Re: [DOLFIN-dev] Assembly benchmark
>> To: "Garth N. Wells" <[EMAIL PROTECTED]>
>>
>> On Wed, Aug 6, 2008 at 4:20 AM, Garth N. Wells <[EMAIL PROTECTED]> wrote:
>> >> ok, here's the page, let's see some numbers:
>> >>
>> >> http://www.fenics.org/wiki/Benchmark
>> >>
>> >
>> > I just added my results.
>> >
>> > The most obvious difference in our systems is 32/64 bit which could
>> > likely account for the differences. MTL4 seems considerably faster on
>> > the 32 bit system.
>>
>> I need to understand the categories into which the time is divided:
>>
>> 1) They do not add to the total (or even close)
>
> There are 8 tables:
>
>   0 Assemble total
>   1 Init dof map
>   2 Build sparsity
>   3 Init tensor
>   4 Delete sparsity
>   5 Assemble cells
>   6 Overhead
>
>   7 Reassemble total
>
> The first is the total and includes 1-6, so tables 1-6 should
> add up to table 0. In fact, table 6 ("Overhead") is computed as the
> difference of table 0 and tables 1-5.
>
> Then table 7 reports the total for reassembling into a matrix which
> has already been initialized with the correct sparsity pattern (and
> used before).
>
> Maybe there's a better way to order/present the tables to make this
> clear?
>
>> 2) I am not sure what is going on within each unit
>
> 1 Init dof map
>
> This one does some initialization for computing the dof map. The only
> thing that may happen here (for FFC forms) is that we may generate
> the edges and faces if those are needed. You can see the difference
> for P1, P2 and P3.

Don't understand why this is different for any of the backends.
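To make the P1/P2/P3 point concrete: for continuous Lagrange elements on
triangles the dof map needs one dof per vertex, (q-1) per edge and
(q-1)(q-2)/2 per cell interior, so P2 and P3 force the mesh to compute its
edges first, while P1 gets by with the vertices it already has. This is
plain entity counting, not DOLFIN's actual dof map code, but it shows why
the cost grows with element order and should not depend on the linear
algebra backend:

    // Rough sketch: global dof count for continuous Lagrange P_q on a
    // triangle mesh. P1 touches only vertices; P2/P3 also need the edge
    // entities, which is the extra "Init dof map" work and is the same
    // no matter which backend holds the matrix.
    int lagrange_dof_count(int q, int num_vertices, int num_edges, int num_cells)
    {
      return num_vertices                         // 1 dof per vertex
           + (q - 1) * num_edges                  // q-1 dofs per edge
           + ((q - 1) * (q - 2) / 2) * num_cells; // interior dofs per cell
    }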
> 2 Build sparsity
>
> This one computes the sparsity pattern by iterating over all cells,
> computing the local-to-global mapping on each cell and counting the
> number of nonzeros.

Same question.

> 3 Init tensor
>
> This one initializes the matrix from the sparsity pattern by looking
> at the number of nonzeros per row (calling MatSeqAIJSetPreallocation)
> in PETSc.

Okay.

> 4 Delete sparsity
>
> This one deletes the sparsity pattern. This shouldn't take any time
> but we found in some tests it actually does (due to some STL
> peculiarities).

This is nonzero for some PETSc runs, which makes no sense.

> 5 Assemble cells
>
> This one does the actual assembly loop over cells and inserts
> (MatSetValues in PETSc).

It would be nice to time calculation vs. insertion time.

> 6 Overhead
>
> Everything else not specifically accounted for.
>
>> 3) This is still much more expensive than my PETSc example (which can be
>> easily run. It's ex2 in KSP).
>
> Do we use the same mesh? In 2D it's a 256x256 unit square and in 3D
> it's a 32x32x32 unit cube.

Okay, I will switch to this.

   Matt

>> Thus it is hard for me to be convinced that something underneath is just
>> not preventing fast operation. Furthermore, this is not checked against a
>> performance model, say plotted against the number of cells.
>
> I agree that would be good to have as well, but there's also a point
> in keeping the benchmark small (so it's fast to run and compare).
>
> --
> Anders

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead. -- Norbert Wiener
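For reference, here is a minimal sketch of what tables 2 and 3 boil down to
on the PETSc side: count couplings row by row from the cell dof maps, then
hand the per-row counts to MatSeqAIJSetPreallocation. The container choice
and the cell_dofs layout are assumptions for illustration, not DOLFIN's
SparsityPattern implementation:

    #include <petscmat.h>
    #include <cstddef>
    #include <set>
    #include <vector>

    // Sketch: build per-row nonzero counts from the cell dof maps
    // (table 2, "Build sparsity") and preallocate a sequential AIJ
    // matrix from them (table 3, "Init tensor").
    Mat preallocate_matrix(PetscInt n,
                           const std::vector<std::vector<PetscInt> >& cell_dofs)
    {
      // One set of column indices per row; a cell couples all of its
      // dofs with each other.
      std::vector<std::set<PetscInt> > pattern(n);
      for (std::size_t c = 0; c < cell_dofs.size(); ++c)
        for (std::size_t i = 0; i < cell_dofs[c].size(); ++i)
          for (std::size_t j = 0; j < cell_dofs[c].size(); ++j)
            pattern[cell_dofs[c][i]].insert(cell_dofs[c][j]);

      std::vector<PetscInt> nnz(n);
      for (PetscInt row = 0; row < n; ++row)
        nnz[row] = static_cast<PetscInt>(pattern[row].size());

      // Preallocation is what makes the later MatSetValues calls cheap.
      Mat A;
      MatCreate(PETSC_COMM_SELF, &A);
      MatSetSizes(A, n, n, n, n);
      MatSetType(A, MATSEQAIJ);
      MatSeqAIJSetPreallocation(A, 0, &nnz[0]);
      return A;
    }

Freeing a set-per-row structure like "pattern" above is also the kind of
STL cleanup that could plausibly show up as the surprising nonzero
"Delete sparsity" time.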

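On the "calculation vs. insertion" split for table 5: one way to get those
two numbers is to wrap the element tensor computation and MatSetValues in
separate timers inside the assembly loop. The sketch below assumes a
hypothetical Cell type carrying the cell's global dofs and a
tabulate_tensor callback; it is not the DOLFIN assembler, just the shape
of the measurement:

    #include <petscmat.h>
    #include <cstddef>
    #include <vector>

    // Placeholder for whatever supplies per-cell data; names are hypothetical.
    struct Cell
    {
      std::vector<PetscInt> dofs;                  // global dofs of this cell
      void tabulate_tensor(PetscScalar* Ae) const  // fills the local matrix
      { /* quadrature / generated code would go here */ }
    };

    // Sketch: time the local tensor computation separately from insertion.
    void assemble(Mat A, const std::vector<Cell>& cells)
    {
      double t_compute = 0.0, t_insert = 0.0;
      std::vector<PetscScalar> Ae;

      for (std::size_t c = 0; c < cells.size(); ++c)
      {
        const PetscInt nd = static_cast<PetscInt>(cells[c].dofs.size());
        Ae.assign(nd*nd, 0.0);

        const double t0 = MPI_Wtime();
        cells[c].tabulate_tensor(&Ae[0]);            // "calculation"
        const double t1 = MPI_Wtime();
        MatSetValues(A, nd, &cells[c].dofs[0],
                        nd, &cells[c].dofs[0],
                        &Ae[0], ADD_VALUES);         // "insertion"
        t_insert  += MPI_Wtime() - t1;
        t_compute += t1 - t0;
      }

      MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
      MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
      PetscPrintf(PETSC_COMM_SELF, "compute %g s, insert %g s\n",
                  t_compute, t_insert);
    }

Per-cell timer calls add overhead of their own, so for meshes the size of
the benchmark it may be cleaner to time whole passes (one pass that only
computes element tensors, one that also inserts) rather than instrument
every cell.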