Anders Logg wrote:
> On Wed, Aug 06, 2008 at 06:10:33AM -0500, Matthew Knepley wrote:
>> On Wed, Aug 6, 2008 at 5:00 AM, Anders Logg <[EMAIL PROTECTED]> wrote:
>>> On Wed, Aug 06, 2008 at 04:24:36AM -0500, Matthew Knepley wrote:
>>>> ---------- Forwarded message ----------
>>>> From: Matthew Knepley <[EMAIL PROTECTED]>
>>>> Date: Wed, Aug 6, 2008 at 4:24 AM
>>>> Subject: Re: [DOLFIN-dev] Assembly benchmark
>>>> To: "Garth N. Wells" <[EMAIL PROTECTED]>
>>>>
>>>>
>>>> On Wed, Aug 6, 2008 at 4:20 AM, Garth N. Wells <[EMAIL PROTECTED]> wrote:
>>>>>> ok, here's the page, let's see some numbers:
>>>>>>
>>>>>> http://www.fenics.org/wiki/Benchmark
>>>>>>
>>>>> I just added my results.
>>>>>
>>>>> The most obvious difference between our systems is 32-bit vs. 64-bit,
>>>>> which could account for the differences. MTL4 seems considerably
>>>>> faster on the 32-bit system.
>>>> I need to understand the categories into which the time is divided:
>>>>
>>>> 1) They do not add up to the total (or even come close)
>>> There are 8 tables:
>>>
>>> 0 Assemble total
>>> 1 Init dof map
>>> 2 Build sparsity
>>> 3 Init tensor
>>> 4 Delete sparsity
>>> 5 Assemble cells
>>> 6 Overhead
>>>
>>> 7 Reassemble total
>>>
>>> The first is the total and includes 1-6, so tables 1-6 should
>>> add up to table 0. In fact, table 6 ("Overhead") is computed as the
>>> difference between table 0 and the sum of tables 1-5.
>>>
>>> Then table 7 reports the total for reassembling into a matrix which
>>> has already been initialized with the correct sparsity pattern (and
>>> used before).
>>>
>>> Maybe there's a better way to order/present the tables to make this
>>> clear?
>>>
>>>> 2) I am not sure what is going on within each unit
>>> 1 Init dof map
>>>
>>> This one does some initialization for computing the dof map. The only
>>> thing that may happen here (for FFC forms) is that we may generate
>>> the edges and faces if those are needed. You can see the difference
>>> for P1, P2 and P3.
>> I don't understand why this differs between the backends.
>
> It's the same, or should be. The benchmark just runs each test case
> once, so there may be small "random" fluctuations in the numbers.
>
> The numbers in Table 1 are essentially the same for all backends.
>
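For concreteness, what Anders describes boils down to something like
this (a simplified sketch against the Mesh interface, not the actual
DofMap code):

  #include <dolfin.h>
  using namespace dolfin;

  // Higher-order elements place dofs on edges (and faces in 3D), so
  // those mesh entities must be generated before the dof map can be
  // computed. Mesh::init() is a no-op if they already exist.
  void init_entities(Mesh& mesh)
  {
    mesh.init(1);                 // edges: needed from P2 up
    if (mesh.topology().dim() == 3)
      mesh.init(2);               // faces: needed from P3 in 3D
  }

P1 needs neither, which is why its numbers in this table are (almost)
zero.
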
>>> 2 Build sparsity
>>>
>>> This one computes the sparsity pattern by iterating over all cells,
>>> computing the local-to-global mapping on each cell and counting the
>>> number of nonzeros.
>> Same question.
>
> This should be the same for all backends except Epetra. The DOLFIN
> LA interface allows the handling of the sparsity pattern to be
> overloaded. For Epetra, we use an Epetra_FECrsGraph to hold the
> sparsity pattern. It seems to perform worse than the DOLFIN built-in
> sparsity pattern (used for all other backends), which is just a simple
>
MTL4 isn't using a sparsity pattern; it just guesses the number of
non-zeroes per row.
> std::vector< std::set<uint> >
>
It's now a
std::vector< std::vector<uint> >
which is faster than using std::set. Only uBLAS needs the terms to be
ordered (std::set is ordered), so I added SparsityPattern::sort() to do
this.
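Schematically (a simplified sketch, not the actual class):

  #include <algorithm>
  #include <vector>

  typedef unsigned int uint;

  // One row of column indices per matrix row; duplicates are allowed
  // during insertion and stripped when sorting.
  struct SparsityPattern
  {
    std::vector< std::vector<uint> > pattern;

    SparsityPattern(uint num_rows) : pattern(num_rows) {}

    // Record the couplings for one cell's local-to-global map
    void insert(const std::vector<uint>& dofs)
    {
      for (uint i = 0; i < dofs.size(); ++i)
        for (uint j = 0; j < dofs.size(); ++j)
          pattern[dofs[i]].push_back(dofs[j]);
    }

    // Sort each row (and remove duplicates); only uBLAS needs this
    void sort()
    {
      for (uint i = 0; i < pattern.size(); ++i)
      {
        std::sort(pattern[i].begin(), pattern[i].end());
        pattern[i].erase(std::unique(pattern[i].begin(), pattern[i].end()),
                         pattern[i].end());
      }
    }
  };

A vector row is one contiguous allocation, so it's also cheaper to
build and tear down than a set.
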
Garth
>>> 3 Init tensor
>>>
>>> This one initializes the matrix from the sparsity pattern by looking
>>> at the number of nonzeros per row (calling MatSeqAIJSetPreallocation
>>> in PETSc).
>> Okay.
>>
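For reference, on the PETSc backend this step essentially boils down
to the following (a minimal sketch, error checking omitted; nnz is
assumed to hold the per-row counts from the sparsity pattern):

  #include <petscmat.h>

  // Create a sequential AIJ matrix with exact per-row preallocation
  // so the later MatSetValues calls never trigger reallocation.
  Mat init_tensor(PetscInt num_rows, const PetscInt nnz[])
  {
    Mat A;
    MatCreate(PETSC_COMM_SELF, &A);
    MatSetSizes(A, num_rows, num_rows, num_rows, num_rows);
    MatSetType(A, MATSEQAIJ);
    MatSeqAIJSetPreallocation(A, 0, nnz); // nnz[i] = nonzeros in row i
    return A;
  }
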
>>> 4 Delete sparsity
>>>
>>> This one deletes the sparsity pattern. This shouldn't take any time,
>>> but we found in some tests that it actually does (due to some STL
>>> peculiarities).
>> This is nonzero for some PETSc runs, which makes no sense.
>
> The same data structure (the STL vector of sets) is used for all
> backends (including PETSc but not Epetra), so this will show up for
> PETSc.
>
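The effect is easy to reproduce in isolation: a std::set allocates one
tree node per entry, so tearing the pattern down touches every node
(a standalone toy timing, sized for P1 on the 256x256 mesh):

  #include <ctime>
  #include <iostream>
  #include <set>
  #include <vector>

  int main()
  {
    const unsigned int num_rows = 66049; // P1 dofs on 256x256 = 257^2
    std::vector< std::set<unsigned int> > pattern(num_rows);
    for (unsigned int i = 0; i < num_rows; ++i)
      for (unsigned int j = 0; j < 7; ++j) // ~7 nonzeros per row
        pattern[i].insert(i + j);

    const std::clock_t t0 = std::clock();
    pattern.clear(); // frees every tree node in every set
    std::cout << "Delete sparsity: "
              << double(std::clock() - t0) / CLOCKS_PER_SEC
              << " s" << std::endl;
    return 0;
  }

With the new vector-of-vectors pattern this should be much cheaper,
since each row is a single contiguous allocation.
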
>>> 5 Assemble cells
>>>
>>> This one does the actual assembly loop over cells and inserts the
>>> local element tensors (MatSetValues in PETSc).
>> It would be nice to time calculation and insertion separately.
>
> I'll see if I can add that. I'm a little worried it will hurt
> performance: all other timings are global, and this one would have to
> be done inside the loop.
>
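Something like this would do it (a sketch of the cell loop with two
accumulators; tabulate() is a stand-in for the generated element
tensor code, and the per-cell clock() calls are exactly the overhead
Anders is worried about):

  #include <ctime>
  #include <vector>
  #include <petscmat.h>

  // Assemble over cells, timing element tensor computation and
  // matrix insertion separately.
  void assemble_cells(Mat A, PetscInt num_cells, PetscInt n,
                      const PetscInt* cell_dofs, // num_cells x n
                      void (*tabulate)(PetscScalar*, PetscInt cell),
                      double& t_compute, double& t_insert)
  {
    std::vector<PetscScalar> Ae(n*n);
    t_compute = t_insert = 0.0;
    for (PetscInt c = 0; c < num_cells; ++c)
    {
      const std::clock_t t0 = std::clock();
      tabulate(&Ae[0], c);                    // compute element tensor
      const std::clock_t t1 = std::clock();
      const PetscInt* dofs = cell_dofs + c*n; // local-to-global map
      MatSetValues(A, n, dofs, n, dofs, &Ae[0], ADD_VALUES);
      t_insert  += double(std::clock() - t1) / CLOCKS_PER_SEC;
      t_compute += double(t1 - t0) / CLOCKS_PER_SEC;
    }
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
  }
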
>>> 6 Overhead
>>>
>>> Everything else not specifically accounted for.
>>>
>>>> 3) This is still much more expensive than my PETSc example (which can
>>>> easily be run; it's ex2 in KSP).
>>> Do we use the same mesh? In 2D it's a 256x256 unit square and in 3D
>>> it's a 32x32x32 unit cube.
>> Okay, I will switch to this.
>
> Nice.
>
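For anyone wanting to reproduce the DOLFIN side, the benchmark meshes
are created like this (a minimal sketch; the forms assembled on them
are omitted here):

  #include <dolfin.h>
  using namespace dolfin;

  int main()
  {
    UnitSquare square(256, 256); // 2D: 2*256^2 = 131072 triangles
    UnitCube cube(32, 32, 32);   // 3D: 6*32^3 = 196608 tetrahedra
    return 0;
  }
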
_______________________________________________
DOLFIN-dev mailing list
[email protected]
http://www.fenics.org/mailman/listinfo/dolfin-dev