On Wed, Aug 06, 2008 at 01:44:24PM +0100, Garth N. Wells wrote:
> Anders Logg wrote:
> > On Wed, Aug 06, 2008 at 06:10:33AM -0500, Matthew Knepley wrote:
> >> On Wed, Aug 6, 2008 at 5:00 AM, Anders Logg <[EMAIL PROTECTED]> wrote:
> >>> On Wed, Aug 06, 2008 at 04:24:36AM -0500, Matthew Knepley wrote:
> >>>> ---------- Forwarded message ----------
> >>>> From: Matthew Knepley <[EMAIL PROTECTED]>
> >>>> Date: Wed, Aug 6, 2008 at 4:24 AM
> >>>> Subject: Re: [DOLFIN-dev] Assembly benchmark
> >>>> To: "Garth N. Wells" <[EMAIL PROTECTED]>
> >>>>
> >>>> On Wed, Aug 6, 2008 at 4:20 AM, Garth N. Wells <[EMAIL PROTECTED]> wrote:
> >>>>>> ok, here's the page, let's see some numbers:
> >>>>>>
> >>>>>> http://www.fenics.org/wiki/Benchmark
> >>>>>>
> >>>>> I just added my results.
> >>>>>
> >>>>> The most obvious difference between our systems is 32/64-bit, which
> >>>>> could likely account for the differences. MTL4 seems considerably
> >>>>> faster on the 32-bit system.
> >>>>
> >>>> I need to understand the categories into which the time is divided:
> >>>>
> >>>> 1) They do not add up to the total (or even close).
> >>>
> >>> There are 8 tables:
> >>>
> >>> 0 Assemble total
> >>> 1 Init dof map
> >>> 2 Build sparsity
> >>> 3 Init tensor
> >>> 4 Delete sparsity
> >>> 5 Assemble cells
> >>> 6 Overhead
> >>>
> >>> 7 Reassemble total
> >>>
> >>> The first is the total and includes 1-6, so tables 1-6 should add up
> >>> to table 0. In fact, table 6 ("Overhead") is computed as the
> >>> difference between table 0 and tables 1-5.
> >>>
> >>> Table 7 then reports the total for reassembling into a matrix which
> >>> has already been initialized with the correct sparsity pattern (and
> >>> used before).
> >>>
> >>> Maybe there's a better way to order/present the tables to make this
> >>> clear?
> >>>
> >>>> 2) I am not sure what is going on within each unit.
> >>>
> >>> 1 Init dof map
> >>>
> >>> This one does some initialization for computing the dof map. The only
> >>> thing that may happen here (for FFC forms) is that we may generate
> >>> the edges and faces if those are needed. You can see the difference
> >>> for P1, P2 and P3.
> >>
> >> Don't understand why this is different for any of the backends.
> >
> > It's the same, or should be. The benchmark just runs each test case
> > once, so there may be small "random" fluctuations in the numbers.
> >
> > The numbers in table 1 are essentially the same for all backends.
> >
> >>> 2 Build sparsity
> >>>
> >>> This one computes the sparsity pattern by iterating over all cells,
> >>> computing the local-to-global mapping on each cell and counting the
> >>> number of nonzeros.
> >>
> >> Same question.
> >
> > This should be the same for all backends except for Epetra. The DOLFIN
> > LA interface allows for overloading the handling of the sparsity
> > pattern. For Epetra, we use an Epetra_FECrsGraph to hold the sparsity
> > pattern. It seems to perform worse than the DOLFIN built-in sparsity
> > pattern (used for all other backends), which is just a simple
>
> MTL4 isn't using a sparsity pattern. A guess is just being made as to
> the number of non-zeroes per row.
>
> >   std::vector< std::set<uint> >
>
> It's now a
>
>   std::vector< std::vector<uint> >
>
> which is faster than using std::set. Only uBLAS needs the terms to be
> ordered (std::set is ordered), so I added SparsityPattern::sort() to do
> this.
>
> Garth
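A minimal sketch of the vector-of-vectors pattern Garth describes, assuming a `SparsityPattern` with `insert()` and `sort()` roughly as named in the thread. This is illustrative code, not the actual DOLFIN implementation: one row of column indices per global dof, appended to while iterating over cells and deduplicated once at the end.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

using uint = std::uint32_t;

// Hypothetical sketch (not the actual DOLFIN code) of a sparsity
// pattern stored as std::vector< std::vector<uint> >.
struct SparsityPattern
{
  std::vector<std::vector<uint>> rows;

  explicit SparsityPattern(std::size_t num_rows) : rows(num_rows) {}

  // Insert the dense local block for one cell: every pair (i, j) of
  // the cell's global dofs is a potential nonzero. Duplicates from
  // dofs shared between cells are allowed here and removed in sort().
  void insert(const std::vector<uint>& cell_dofs)
  {
    for (uint i : cell_dofs)
      for (uint j : cell_dofs)
        rows[i].push_back(j);
  }

  // Sort each row and remove duplicates. Only backends that need
  // ordered column indices (e.g. uBLAS) have to call this.
  void sort()
  {
    for (auto& row : rows)
    {
      std::sort(row.begin(), row.end());
      row.erase(std::unique(row.begin(), row.end()), row.end());
    }
  }

  // Number of nonzeros in a row (meaningful after sort()).
  std::size_t num_nonzeros(uint row) const { return rows[row].size(); }
};
```

For example, two triangles sharing the edge (1, 2) give row 1 the entries {0, 1, 2} and {1, 2, 3}, which `sort()` collapses to the four nonzeros {0, 1, 2, 3}.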
I didn't know. I'm surprised that doing a linear search is faster than
using std::set. I thought std::set was optimized for this.

--
Anders

_______________________________________________
DOLFIN-dev mailing list
[email protected]
http://www.fenics.org/mailman/listinfo/dolfin-dev
