Anders Logg wrote:
> On Wed, Aug 06, 2008 at 01:44:24PM +0100, Garth N. Wells wrote:
>>
>> Anders Logg wrote:
>>> On Wed, Aug 06, 2008 at 06:10:33AM -0500, Matthew Knepley wrote:
>>>> On Wed, Aug 6, 2008 at 5:00 AM, Anders Logg <[EMAIL PROTECTED]> wrote:
>>>>> On Wed, Aug 06, 2008 at 04:24:36AM -0500, Matthew Knepley wrote:
>>>>>> ---------- Forwarded message ----------
>>>>>> From: Matthew Knepley <[EMAIL PROTECTED]>
>>>>>> Date: Wed, Aug 6, 2008 at 4:24 AM
>>>>>> Subject: Re: [DOLFIN-dev] Assembly benchmark
>>>>>> To: "Garth N. Wells" <[EMAIL PROTECTED]>
>>>>>>
>>>>>> On Wed, Aug 6, 2008 at 4:20 AM, Garth N. Wells <[EMAIL PROTECTED]> wrote:
>>>>>>>> ok, here's the page, let's see some numbers:
>>>>>>>>
>>>>>>>> http://www.fenics.org/wiki/Benchmark
>>>>>>>>
>>>>>>> I just added my results.
>>>>>>>
>>>>>>> The most obvious difference in our systems is 32/64 bit, which could
>>>>>>> likely account for the differences. MTL4 seems considerably faster on
>>>>>>> the 32-bit system.
>>>>>> I need to understand the categories into which the time is divided:
>>>>>>
>>>>>> 1) They do not add up to the total (or even close).
>>>>> There are 8 tables:
>>>>>
>>>>> 0 Assemble total
>>>>> 1 Init dof map
>>>>> 2 Build sparsity
>>>>> 3 Init tensor
>>>>> 4 Delete sparsity
>>>>> 5 Assemble cells
>>>>> 6 Overhead
>>>>>
>>>>> 7 Reassemble total
>>>>>
>>>>> The first is the total and includes 1-6, so tables 1-6 should add up
>>>>> to table 0. In fact, table 6 ("Overhead") is computed as the
>>>>> difference between table 0 and tables 1-5.
>>>>>
>>>>> Table 7 then reports the total for reassembling into a matrix which
>>>>> has already been initialized with the correct sparsity pattern (and
>>>>> used before).
>>>>>
>>>>> Maybe there's a better way to order/present the tables to make this
>>>>> clear?
>>>>>
>>>>>> 2) I am not sure what is going on within each unit.
>>>>> 1 Init dof map
>>>>>
>>>>> This one does some initialization for computing the dof map. The only
>>>>> thing that may happen here (for FFC forms) is that we may generate
>>>>> the edges and faces if those are needed. You can see the difference
>>>>> for P1, P2 and P3.
>>>> I don't understand why this is different for any of the backends.
>>> It's the same, or should be. The benchmark just runs each test case
>>> once, so there may be small "random" fluctuations in the numbers.
>>>
>>> The numbers in Table 1 are essentially the same for all backends.
>>>
>>>>> 2 Build sparsity
>>>>>
>>>>> This one computes the sparsity pattern by iterating over all cells,
>>>>> computing the local-to-global mapping on each cell and counting the
>>>>> number of nonzeros.
>>>> Same question.
>>> This should be the same for all backends except Epetra. The DOLFIN LA
>>> interface allows the handling of the sparsity pattern to be
>>> overloaded. For Epetra, we use an Epetra_FECrsGraph to hold the
>>> sparsity pattern. It seems to perform worse than the DOLFIN built-in
>>> sparsity pattern (used for all other backends), which is just a simple
>>>
>> MTL4 isn't using a sparsity pattern. A guess is just made as to the
>> number of non-zeros per row.
>>
>>> std::vector< std::set<uint> >
>>>
>> It's now a
>>
>> std::vector< std::vector<uint> >
>>
>> which is faster than using std::set. Only uBLAS needs the terms to be
>> ordered (std::set is ordered), so I added SparsityPattern::sort() to
>> do this.
>>
>> Garth
>
> I didn't know. I'm surprised that doing a linear search is faster than
> using std::set. I thought std::set was optimized for this.
>
std::set was dead slow, which I attributed to it being ordered and
therefore requiring shuffling after insertions. I tried
std::tr1::unordered_set, but it wasn't much better.

Garth
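
To make the trade-off concrete, below is a minimal, self-contained C++
sketch contrasting the two row representations discussed above. It is not
DOLFIN's actual SparsityPattern implementation: the cell_dofs data is
fabricated, and all variable names are illustrative. The second variant
appends column indices blindly and then sorts and deduplicates each row
once at the end, which is the role SparsityPattern::sort() plays in the
discussion.

  #include <algorithm>
  #include <set>
  #include <vector>

  int main()
  {
    const unsigned num_rows = 4;

    // Hypothetical per-cell local-to-global dof maps for two "cells".
    // A real assembler would get these from the mesh and the dof map.
    const std::vector<std::vector<unsigned> > cell_dofs
      = {{0, 1, 2}, {1, 2, 3}};

    // Variant 1: std::vector< std::set<uint> >. Rows are ordered and
    // duplicate-free by construction, but every insertion pays for a
    // tree lookup and possible rebalancing.
    std::vector<std::set<unsigned> > pattern_set(num_rows);
    for (const std::vector<unsigned>& dofs : cell_dofs)
      for (unsigned i : dofs)
        for (unsigned j : dofs)
          pattern_set[i].insert(j);

    // Variant 2: std::vector< std::vector<uint> >. Append blindly, then
    // sort and strip duplicates once per row at the end. Deduplication
    // is needed before counting the nonzeros per row.
    std::vector<std::vector<unsigned> > pattern_vec(num_rows);
    for (const std::vector<unsigned>& dofs : cell_dofs)
      for (unsigned i : dofs)
        for (unsigned j : dofs)
          pattern_vec[i].push_back(j);
    for (std::vector<unsigned>& row : pattern_vec)
    {
      std::sort(row.begin(), row.end());
      row.erase(std::unique(row.begin(), row.end()), row.end());
    }

    return 0;
  }

Both variants end up with the same sorted, duplicate-free rows. The
vector variant replaces a tree lookup and possible rebalancing on every
insertion with a single sort/unique pass per row, which is why it can win
despite the duplicates it stores temporarily.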
