I actually had a set, and switched to a preallocated array of a known length, eliminating almost all overhead. Something is wrong in setland.
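For concreteness, a minimal sketch of the two row-storage strategies being compared (the names SetRows, PreallocatedRows, insert, etc. are made up for illustration; this is not the DOLFIN or MTL4 code):

#include <cstddef>
#include <set>
#include <vector>

typedef unsigned int uint;

// Strategy 1: one std::set per row. Every insert pays for a tree node
// allocation and an ordered lookup, but duplicates and ordering come free.
struct SetRows
{
  explicit SetRows(uint num_rows) : rows(num_rows) {}
  void insert(uint row, uint col) { rows[row].insert(col); }
  std::vector<std::set<uint> > rows;
};

// Strategy 2: rows preallocated to a known (or bounded) length. Each insert
// is a single array write; duplicates must be handled by the caller or in
// one clean-up pass afterwards.
struct PreallocatedRows
{
  PreallocatedRows(uint num_rows, uint row_size)
    : entries(num_rows * row_size), sizes(num_rows, 0), row_size(row_size) {}

  void insert(uint row, uint col)
  { entries[row * row_size + sizes[row]++] = col; }

  std::vector<uint> entries;  // flat num_rows x row_size storage
  std::vector<uint> sizes;    // current number of entries in each row
  uint row_size;
};

int main()
{
  SetRows a(10);
  PreallocatedRows b(10, 8);
  for (uint col = 0; col < 8; ++col)
  {
    a.insert(3, col);
    b.insert(3, col);
  }
  return 0;
}

With a known row length, each insert is a bounds computation plus a write, whereas each std::set insert allocates a node and walks/rebalances the tree, which is where the overhead goes.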
Matt

On Wed, Aug 6, 2008 at 10:08 AM, Garth N. Wells <[EMAIL PROTECTED]> wrote:
>
>
> Anders Logg wrote:
>> On Wed, Aug 06, 2008 at 01:44:24PM +0100, Garth N. Wells wrote:
>>>
>>> Anders Logg wrote:
>>>> On Wed, Aug 06, 2008 at 06:10:33AM -0500, Matthew Knepley wrote:
>>>>> On Wed, Aug 6, 2008 at 5:00 AM, Anders Logg <[EMAIL PROTECTED]> wrote:
>>>>>> On Wed, Aug 06, 2008 at 04:24:36AM -0500, Matthew Knepley wrote:
>>>>>>> ---------- Forwarded message ----------
>>>>>>> From: Matthew Knepley <[EMAIL PROTECTED]>
>>>>>>> Date: Wed, Aug 6, 2008 at 4:24 AM
>>>>>>> Subject: Re: [DOLFIN-dev] Assembly benchmark
>>>>>>> To: "Garth N. Wells" <[EMAIL PROTECTED]>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Aug 6, 2008 at 4:20 AM, Garth N. Wells <[EMAIL PROTECTED]> wrote:
>>>>>>>>> ok, here's the page, let's see some numbers:
>>>>>>>>>
>>>>>>>>> http://www.fenics.org/wiki/Benchmark
>>>>>>>>>
>>>>>>>> I just added my results.
>>>>>>>>
>>>>>>>> The most obvious difference in our systems is 32/64 bit which could
>>>>>>>> likely account for the differences. MTL4 seems considerably faster on
>>>>>>>> the 32 bit system.
>>>>>>> I need to understand the categories into which the time is divided:
>>>>>>>
>>>>>>> 1) They do not add to the total (or even close)
>>>>>> There are 8 tables:
>>>>>>
>>>>>> 0 Assemble total
>>>>>> 1 Init dof map
>>>>>> 2 Build sparsity
>>>>>> 3 Init tensor
>>>>>> 4 Delete sparsity
>>>>>> 5 Assemble cells
>>>>>> 6 Overhead
>>>>>>
>>>>>> 7 Reassemble total
>>>>>>
>>>>>> The first is the total and includes 1-6, so tables 1-6 should
>>>>>> add up to table 0. In fact, table 6 ("Overhead") is computed as the
>>>>>> difference of table 0 and tables 1-5.
>>>>>>
>>>>>> Then table 7 reports the total for reassembling into a matrix which
>>>>>> has already been initialized with the correct sparsity pattern (and
>>>>>> used before).
>>>>>>
>>>>>> Maybe there's a better way to order/present the tables to make this
>>>>>> clear?
>>>>>>
>>>>>>> 2) I am not sure what is going on within each unit
>>>>>> 1 Init dof map
>>>>>>
>>>>>> This one does some initialization for computing the dof map. The only
>>>>>> thing that may happen here (for FFC forms) is that we may generate
>>>>>> the edges and faces if those are needed. You can see the difference
>>>>>> for P1, P2 and P3.
>>>>> Don't understand why this is different for any of the backends.
>>>> It's the same, or should be. The benchmark just runs each test case
>>>> once so there may be small "random" fluctuations in the numbers.
>>>>
>>>> The numbers of Table 1 are essentially the same for all backends.
>>>>
>>>>>> 2 Build sparsity
>>>>>>
>>>>>> This one computes the sparsity pattern by iterating over all cells,
>>>>>> computing the local-to-global mapping on each cell and counting the
>>>>>> number of nonzeros.
>>>>> Same question.
>>>> This should be the same for all backends except for Epetra. The DOLFIN
>>>> LA interface allows for overloading the handling of the sparsity
>>>> pattern. For Epetra, we use a Epetra_FECrsGraph to hold the sparsity
>>>> pattern. It seems to perform worse than the DOLFIN built-in sparsity
>>>> pattern (used for all other backends) which is just a simple
>>>>
>>> MTL4 isn't using a sparsity pattern. A guess is just being made as to
>>> the number of non-zeroes per row.
>>>
>>>> std::vector< std::set<uint> >
>>>>
>>> It's now a
>>>
>>> std::vector< std::vector<uint> >
>>>
>>> which is faster than using std::set. Only uBLAS needs the terms to be
>>> ordered (std::set is ordered), so I added SparsityPattern::sort() to do
>>> this.
>>>
>>> Garth
>>
>> I didn't know. I'm surprised that doing a linear search is faster than
>> using std::set. I thought the std::set was optimized for this.
>>
>
> std::set was dead slow, which I attributed to it being ordered and
> therefore requiring shuffling after insertions. I tried
> std::tr1::unordered_set, but it wasn't much better.
>
> Garth
>

--
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments
lead. -- Norbert Wiener

_______________________________________________
DOLFIN-dev mailing list
[email protected]
http://www.fenics.org/mailman/listinfo/dolfin-dev
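For readers following the data-structure discussion above, a minimal sketch of a row-wise pattern stored as std::vector< std::vector<uint> > with an explicit sort/unique pass of the kind mentioned for uBLAS (class and member names are hypothetical; this is not the actual DOLFIN SparsityPattern):

#include <algorithm>
#include <cstddef>
#include <vector>

typedef unsigned int uint;

// Row-wise sparsity pattern: one vector of column indices per row,
// appended in whatever order the cells are visited.
class SimpleSparsityPattern
{
public:
  explicit SimpleSparsityPattern(uint num_rows) : pattern(num_rows) {}

  // Record a nonzero at (row, col). Duplicates are allowed here and
  // removed in sort(); a linear search per insert would be the alternative.
  void insert(uint row, uint col)
  { pattern[row].push_back(col); }

  // Sort each row and strip duplicates. Backends that need ordered column
  // indices (uBLAS in the thread above) would call this once after the
  // pattern has been built.
  void sort()
  {
    for (std::size_t i = 0; i < pattern.size(); ++i)
    {
      std::vector<uint>& row = pattern[i];
      std::sort(row.begin(), row.end());
      row.erase(std::unique(row.begin(), row.end()), row.end());
    }
  }

  // Number of nonzeros per row, e.g. for preallocating the matrix.
  std::vector<uint> num_nonzeros() const
  {
    std::vector<uint> nnz(pattern.size());
    for (std::size_t i = 0; i < pattern.size(); ++i)
      nnz[i] = static_cast<uint>(pattern[i].size());
    return nnz;
  }

private:
  std::vector<std::vector<uint> > pattern;
};

int main()
{
  SimpleSparsityPattern sp(4);
  sp.insert(0, 2); sp.insert(0, 0); sp.insert(0, 2);  // duplicate on purpose
  sp.sort();                                           // row 0 becomes {0, 2}
  return 0;
}

Deferring ordering to a single sort pass is what makes this cheaper than std::set: appends are amortized O(1) and cache-friendly, while std::set pays a node allocation and pointer chasing on every insert.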
