Anders Logg wrote:
> On Wed, Aug 06, 2008 at 06:10:33AM -0500, Matthew Knepley wrote:
>> On Wed, Aug 6, 2008 at 5:00 AM, Anders Logg <[EMAIL PROTECTED]> wrote:
>>> On Wed, Aug 06, 2008 at 04:24:36AM -0500, Matthew Knepley wrote:
>>>> ---------- Forwarded message ----------
>>>> From: Matthew Knepley <[EMAIL PROTECTED]>
>>>> Date: Wed, Aug 6, 2008 at 4:24 AM
>>>> Subject: Re: [DOLFIN-dev] Assembly benchmark
>>>> To: "Garth N. Wells" <[EMAIL PROTECTED]>
>>>>
>>>>
>>>> On Wed, Aug 6, 2008 at 4:20 AM, Garth N. Wells <[EMAIL PROTECTED]> wrote:
>>>>>> ok, here's the page, let's see some numbers:
>>>>>>
>>>>>> http://www.fenics.org/wiki/Benchmark
>>>>>>
>>>>> I just added my results.
>>>>>
>>>>> The most obvious difference between our systems is 32-bit vs. 64-bit,
>>>>> which could account for the differences. MTL4 seems considerably
>>>>> faster on the 32-bit system.
>>>> I need to understand the categories into which the time is divided:
>>>>
>>>> 1) They do not add up to the total (or even come close)
>>> There are 8 tables:
>>>
>>> 0 Assemble total
>>> 1 Init dof map
>>> 2 Build sparsity
>>> 3 Init tensor
>>> 4 Delete sparsity
>>> 5 Assemble cells
>>> 6 Overhead
>>>
>>> 7 Reassemble total
>>>
>>> The first is the total and includes 1-6, so tables 1-6 should
>>> add up to table 0. In fact, table 6 ("Overhead") is computed as the
>>> difference between table 0 and the sum of tables 1-5.
>>>
>>> Then table 7 reports the total for reassembling into a matrix which
>>> has already been initialized with the correct sparsity pattern (and
>>> used before).
>>>
>>> Maybe there's a better way to order/present the tables to make this
>>> clear?
>>>
>>>> 2) I am not sure what is going on within each unit
>>> 1 Init dof map
>>>
>>> This one does some initialization for computing the dof map. The only
>>> thing that may happen here (for FFC forms) is that we may generate
>>> the edges and faces if those are needed. You can see the difference
>>> for P1, P2 and P3.
>> I don't understand why this differs between the backends.
>
> It's the same, or should be. The benchmark just runs each test case
> once, so there may be small "random" fluctuations in the numbers.
>
> The numbers in Table 1 are essentially the same for all backends.
>
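For concreteness, what Anders describes boils down to something like
this (a simplified sketch against the Mesh interface, not the actual
DofMap code):

  #include <dolfin.h>
  using namespace dolfin;

  // Higher-order elements place dofs on edges (and faces in 3D), so
  // those mesh entities must be generated before the dof map can be
  // computed. Mesh::init() is a no-op if they already exist.
  void init_entities(Mesh& mesh)
  {
    mesh.init(1);                 // edges: needed from P2 up
    if (mesh.topology().dim() == 3)
      mesh.init(2);               // faces: needed from P3 in 3D
  }

P1 needs neither, which is why its numbers in this table are (almost)
zero.
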
>>> 2 Build sparsity
>>>
>>> This one computes the sparsity pattern by iterating over all cells,
>>> computing the local-to-global mapping on each cell and counting the
>>> number of nonzeros.
>> Same question.
>
> This should be the same for all backends except Epetra. The DOLFIN
> LA interface allows the handling of the sparsity pattern to be
> overloaded. For Epetra, we use an Epetra_FECrsGraph to hold the
> sparsity pattern. It seems to perform worse than the DOLFIN built-in
> sparsity pattern (used for all other backends), which is just a simple
>
MTL4 isn't using a sparsity pattern; it just guesses the number of
non-zeroes per row.
> std::vector< std::set<uint> >
>
It's now a
std::vector< std::vector<uint> >
which is faster than using std::set. Only uBLAS needs the terms to be
ordered (std::set is ordered), so I added SparsityPattern::sort() to do
this.
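Schematically (a simplified sketch, not the actual class):

  #include <algorithm>
  #include <vector>

  typedef unsigned int uint;

  // One row of column indices per matrix row; duplicates are allowed
  // during insertion and stripped when sorting.
  struct SparsityPattern
  {
    std::vector< std::vector<uint> > pattern;

    SparsityPattern(uint num_rows) : pattern(num_rows) {}

    // Record the couplings for one cell's local-to-global map
    void insert(const std::vector<uint>& dofs)
    {
      for (uint i = 0; i < dofs.size(); ++i)
        for (uint j = 0; j < dofs.size(); ++j)
          pattern[dofs[i]].push_back(dofs[j]);
    }

    // Sort each row (and remove duplicates); only uBLAS needs this
    void sort()
    {
      for (uint i = 0; i < pattern.size(); ++i)
      {
        std::sort(pattern[i].begin(), pattern[i].end());
        pattern[i].erase(std::unique(pattern[i].begin(), pattern[i].end()),
                         pattern[i].end());
      }
    }
  };

A vector row is one contiguous allocation, so it's also cheaper to
build and tear down than a set.
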
Garth
>>> 3 Init tensor
>>>
>>> This one initializes the matrix from the sparsity pattern by looking
>>> at the number of nonzeros per row (calling MatSeqAIJSetPreallocation
>>> in PETSc).
>> Okay.
>>
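For reference, on the PETSc backend this step essentially boils down
to the following (a minimal sketch, error checking omitted; nnz is
assumed to hold the per-row counts from the sparsity pattern):

  #include <petscmat.h>

  // Create a sequential AIJ matrix with exact per-row preallocation
  // so the later MatSetValues calls never trigger reallocation.
  Mat init_tensor(PetscInt num_rows, const PetscInt nnz[])
  {
    Mat A;
    MatCreate(PETSC_COMM_SELF, &A);
    MatSetSizes(A, num_rows, num_rows, num_rows, num_rows);
    MatSetType(A, MATSEQAIJ);
    MatSeqAIJSetPreallocation(A, 0, nnz); // nnz[i] = nonzeros in row i
    return A;
  }
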
>>> 4 Delete sparsity
>>>
>>> This one deletes the sparsity pattern. This shouldn't take any time,
>>> but we found in some tests that it actually does (due to some STL
>>> peculiarities).
>> This is nonzero for some PETSc runs, which makes no sense.
>
> The same data structure (the STL vector of sets) is used for all
> backends (including PETSc but not Epetra), so this will show up for
> PETSc.
>
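The effect is easy to reproduce in isolation: a std::set allocates one
tree node per entry, so tearing the pattern down touches every node
(a standalone toy timing, sized for P1 on the 256x256 mesh):

  #include <ctime>
  #include <iostream>
  #include <set>
  #include <vector>

  int main()
  {
    const unsigned int num_rows = 66049; // P1 dofs on 256x256 = 257^2
    std::vector< std::set<unsigned int> > pattern(num_rows);
    for (unsigned int i = 0; i < num_rows; ++i)
      for (unsigned int j = 0; j < 7; ++j) // ~7 nonzeros per row
        pattern[i].insert(i + j);

    const std::clock_t t0 = std::clock();
    pattern.clear(); // frees every tree node in every set
    std::cout << "Delete sparsity: "
              << double(std::clock() - t0) / CLOCKS_PER_SEC
              << " s" << std::endl;
    return 0;
  }

With the new vector-of-vectors pattern this should be much cheaper,
since each row is a single contiguous allocation.
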
>>> 5 Assemble cells
>>>
>>> This one does the actual assembly loop over cells and inserts the
>>> local element tensors (MatSetValues in PETSc).
>> It would be nice to time calculation and insertion separately.
>
> I'll see if I can add that. I'm a little worried it will hurt
> performance: all other timings are global, and this one would have to
> be done inside the loop.
>
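Something like this would do it (a sketch of the cell loop with two
accumulators; tabulate() is a stand-in for the generated element
tensor code, and the per-cell clock() calls are exactly the overhead
Anders is worried about):

  #include <ctime>
  #include <vector>
  #include <petscmat.h>

  // Assemble over cells, timing element tensor computation and
  // matrix insertion separately.
  void assemble_cells(Mat A, PetscInt num_cells, PetscInt n,
                      const PetscInt* cell_dofs, // num_cells x n
                      void (*tabulate)(PetscScalar*, PetscInt cell),
                      double& t_compute, double& t_insert)
  {
    std::vector<PetscScalar> Ae(n*n);
    t_compute = t_insert = 0.0;
    for (PetscInt c = 0; c < num_cells; ++c)
    {
      const std::clock_t t0 = std::clock();
      tabulate(&Ae[0], c);                    // compute element tensor
      const std::clock_t t1 = std::clock();
      const PetscInt* dofs = cell_dofs + c*n; // local-to-global map
      MatSetValues(A, n, dofs, n, dofs, &Ae[0], ADD_VALUES);
      t_insert  += double(std::clock() - t1) / CLOCKS_PER_SEC;
      t_compute += double(t1 - t0) / CLOCKS_PER_SEC;
    }
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
  }
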
>>> 6 Overhead
>>>
>>> Everything else not specifically accounted for.
>>>
>>>> 3) This is still much more expensive than my PETSc example (which can
>>>> easily be run; it's ex2 in KSP).
>>> Do we use the same mesh? In 2D it's a 256x256 unit square and in 3D
>>> it's a 32x32x32 unit cube.
>> Okay, I will switch to this.
>
> Nice.
>
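For anyone wanting to reproduce the DOLFIN side, the benchmark meshes
are created like this (a minimal sketch; the forms assembled on them
are omitted here):

  #include <dolfin.h>
  using namespace dolfin;

  int main()
  {
    UnitSquare square(256, 256); // 2D: 2*256^2 = 131072 triangles
    UnitCube cube(32, 32, 32);   // 3D: 6*32^3 = 196608 tetrahedra
    return 0;
  }
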
_______________________________________________
DOLFIN-dev mailing list
[email protected]
http://www.fenics.org/mailman/listinfo/dolfin-dev