Anders Logg wrote:
> On Wed, Aug 06, 2008 at 01:44:24PM +0100, Garth N. Wells wrote:
>>
>> Anders Logg wrote:
>>> On Wed, Aug 06, 2008 at 06:10:33AM -0500, Matthew Knepley wrote:
>>>> On Wed, Aug 6, 2008 at 5:00 AM, Anders Logg <[EMAIL PROTECTED]> wrote:
>>>>> On Wed, Aug 06, 2008 at 04:24:36AM -0500, Matthew Knepley wrote:
>>>>>> ---------- Forwarded message ----------
>>>>>> From: Matthew Knepley <[EMAIL PROTECTED]>
>>>>>> Date: Wed, Aug 6, 2008 at 4:24 AM
>>>>>> Subject: Re: [DOLFIN-dev] Assembly benchmark
>>>>>> To: "Garth N. Wells" <[EMAIL PROTECTED]>
>>>>>>
>>>>>> On Wed, Aug 6, 2008 at 4:20 AM, Garth N. Wells <[EMAIL PROTECTED]> wrote:
>>>>>>>> ok, here's the page, let's see some numbers:
>>>>>>>>
>>>>>>>> http://www.fenics.org/wiki/Benchmark
>>>>>>>>
>>>>>>> I just added my results.
>>>>>>>
>>>>>>> The most obvious difference in our systems is 32/64 bit, which could
>>>>>>> likely account for the differences. MTL4 seems considerably faster on
>>>>>>> the 32-bit system.
>>>>>> I need to understand the categories into which the time is divided:
>>>>>>
>>>>>> 1) They do not add up to the total (or even close).
>>>>> There are 8 tables:
>>>>>
>>>>> 0 Assemble total
>>>>> 1 Init dof map
>>>>> 2 Build sparsity
>>>>> 3 Init tensor
>>>>> 4 Delete sparsity
>>>>> 5 Assemble cells
>>>>> 6 Overhead
>>>>>
>>>>> 7 Reassemble total
>>>>>
>>>>> The first is the total and includes 1-6, so tables 1-6 should add up
>>>>> to table 0. In fact, table 6 ("Overhead") is computed as the
>>>>> difference between table 0 and tables 1-5.
>>>>>
>>>>> Table 7 then reports the total for reassembling into a matrix which
>>>>> has already been initialized with the correct sparsity pattern (and
>>>>> used before).
>>>>>
>>>>> Maybe there's a better way to order/present the tables to make this
>>>>> clear?
>>>>>
>>>>>> 2) I am not sure what is going on within each unit.
>>>>> 1 Init dof map
>>>>>
>>>>> This one does some initialization for computing the dof map. The only
>>>>> thing that may happen here (for FFC forms) is that we may generate
>>>>> the edges and faces if those are needed. You can see the difference
>>>>> for P1, P2 and P3.
>>>> I don't understand why this is different for any of the backends.
>>> It's the same, or should be. The benchmark just runs each test case
>>> once, so there may be small "random" fluctuations in the numbers.
>>>
>>> The numbers in Table 1 are essentially the same for all backends.
>>>
>>>>> 2 Build sparsity
>>>>>
>>>>> This one computes the sparsity pattern by iterating over all cells,
>>>>> computing the local-to-global mapping on each cell and counting the
>>>>> number of nonzeros.
>>>> Same question.
>>> This should be the same for all backends except Epetra. The DOLFIN LA
>>> interface allows the handling of the sparsity pattern to be
>>> overloaded. For Epetra, we use an Epetra_FECrsGraph to hold the
>>> sparsity pattern. It seems to perform worse than the DOLFIN built-in
>>> sparsity pattern (used for all other backends), which is just a simple
>>>
>> MTL4 isn't using a sparsity pattern. A guess is just made as to the
>> number of non-zeros per row.
>>
>>> std::vector< std::set<uint> >
>>>
>> It's now a
>>
>> std::vector< std::vector<uint> >
>>
>> which is faster than using std::set. Only uBLAS needs the terms to be
>> ordered (std::set is ordered), so I added SparsityPattern::sort() to
>> do this.
>>
>> Garth
>
> I didn't know. I'm surprised that doing a linear search is faster than
> using std::set. I thought std::set was optimized for this.
>
std::set was dead slow, which I attributed to it being ordered and
therefore requiring shuffling after insertions. I tried
std::tr1::unordered_set, but it wasn't much better.

Garth
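
To make the trade-off concrete, below is a minimal, self-contained C++
sketch contrasting the two row representations discussed above. It is not
DOLFIN's actual SparsityPattern implementation: the cell_dofs data is
fabricated, and all variable names are illustrative. The second variant
appends column indices blindly and then sorts and deduplicates each row
once at the end, which is the role SparsityPattern::sort() plays in the
discussion.

  #include <algorithm>
  #include <set>
  #include <vector>

  int main()
  {
    const unsigned num_rows = 4;

    // Hypothetical per-cell local-to-global dof maps for two "cells".
    // A real assembler would get these from the mesh and the dof map.
    const std::vector<std::vector<unsigned> > cell_dofs
      = {{0, 1, 2}, {1, 2, 3}};

    // Variant 1: std::vector< std::set<uint> >. Rows are ordered and
    // duplicate-free by construction, but every insertion pays for a
    // tree lookup and possible rebalancing.
    std::vector<std::set<unsigned> > pattern_set(num_rows);
    for (const std::vector<unsigned>& dofs : cell_dofs)
      for (unsigned i : dofs)
        for (unsigned j : dofs)
          pattern_set[i].insert(j);

    // Variant 2: std::vector< std::vector<uint> >. Append blindly, then
    // sort and strip duplicates once per row at the end. Deduplication
    // is needed before counting the nonzeros per row.
    std::vector<std::vector<unsigned> > pattern_vec(num_rows);
    for (const std::vector<unsigned>& dofs : cell_dofs)
      for (unsigned i : dofs)
        for (unsigned j : dofs)
          pattern_vec[i].push_back(j);
    for (std::vector<unsigned>& row : pattern_vec)
    {
      std::sort(row.begin(), row.end());
      row.erase(std::unique(row.begin(), row.end()), row.end());
    }

    return 0;
  }

Both variants end up with the same sorted, duplicate-free rows. The
vector variant replaces a tree lookup and possible rebalancing on every
insertion with a single sort/unique pass per row, which is why it can win
despite the duplicates it stores temporarily.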
