Anders Logg wrote:
> On Fri, May 16, 2008 at 12:17:19AM +0200, Murtazo Nazarov wrote:
>   
>>> Hello!
>>>
>>> I'm looking at a "suspiciously slow" assembly and would like to
>>> determine what is going on. In general, what should one expect the most
>>> time-consuming step to be?
>>>
>>> This is what my gprof looks like:
>>>
>>> Time:
>>> 61.97%  unsigned int const* std::lower_bound
>>> 25.84%  dolfin::uBlasMatrix<...>::add
>>> 8.27%   UFC_NSEMomentum3DBilinearForm_cell_integral_0::tabulate_tensor
>>> 1.1%    dolfin::uBlasMatrix<...>::init
>>>       
>
> Where is lower_bound used? From within uBlasMatrix::add or is it in
> building the sparsity pattern?
>
>   
>> I got these numbers also. I understand that it is very painful in large
>> computations.
>>
>> I see what is a problem with adding into the stiffness matrix A. Searching
>> the position of the element which needs to be added takes very long time,
>> especially if you are solving big problems with thousands unknowns and
>> repeating the assembling a lot of times!
>>     
>
> If you know a good way to avoid inserting entries into a sparse matrix
> during assembly, please tell me... :-)
>   

I describe one possible way below.

> If the assembly is costly, you might want to try assembling the action
> of it instead and send that to a Krylov solver. Inserting into a
> vector is much easier than into a sparse matrix.
>
>   
Actually, I don't believe that. I have checked: assembling a vector takes
almost the same time as assembling a sparse matrix, at least for
Navier-Stokes. At best it is only about twice as fast as assembling the
sparse matrix.
>> One way could be finding the global indices of the matrix A once, and use
>> it in the assembly process. By this way we avoid of searching the element
>> position and it makes the process significantly fast. But, there is a
>> problem: somehow I cannot get access to the global index of cell in the A
>> and change it instead of using MatSetValues (in PETSc).
>>     
>
> I don't understand what you suggest here. We do precompute the
> sparsity pattern of the matrix and use that to preallocate, but I
> don't know of any other way to insert entries than MatSetValues.
>
>   

I mean that instead of doing

    A(row, col) = A(row, col) + val_new,

we can do something like

    A(ind) = A(ind) + val_new,

where row and col are the row and column numbers, and ind is a
precomputed index into the array of nonzero values of A, ranging from 0
to the number of nonzero elements of A minus one (the size of the values
array). This avoids searching for the position (row, col) on every
addition. The drawback is that we need to compute and store the whole
vector of indices ind, which is not so good, because it has a large size.


>> I am pretty sure that we may speed up the A.set() and A.get() processes as
>> well by the above method.
>>     

> Please explain.
>
>   

If we use the same approach as above with ind, then we do not have to
search for an element's position in the matrix A; instead we can access
that element directly.
>> I am not sure how the dofmap to get rows and cols indices of the cells is
>> implemented. We could avoid repeating this operation as well.
>>     
>
> This is already implemented (but maybe not used). DofMap handles this.
> It wraps the generated ufc::dof_map code and may pretabulate (and
> possibly reorder) the dofs.
>
>   
Ok, good, but I don't think it is currently used.
>> We did some comparison with another free fem toolbox, FemLego, the
>> assembly process in Dolfin is 3 times slower than FemLego in 2D. I believe
>> this number will increase in 3D. FemLego uses quadrature rule for
>> computing integrals.
>>     
>
> Can you benchmark the various parts of the assembly to see what causes
> the slowdown:
>
>   1. Is it tabulate_tensor?
>   2. Is it tabulate_dofs?
>   3. Is it A.add()?
>   4. Something else?
>
>   
Sure, here we go:

dolfin::Assembler::assembleCells:

1. tabulate_tensor for the bilinear form of the momentum equation in NSE: 6.04%
   tabulate_tensor for the linear form of the momentum equation in NSE: 11.98%

2. dolfin::GenericMatrix::add: 68.98%

3. dolfin::Function::interpolate: 9.05%

As you can see, GenericMatrix::add takes most of the time.
>> I hope some PETSc guys will help us to do this improvements. Any other
>> ideas are welcome!
>>     
>
> We are currently experimenting with collecting and preprocessing
> batches of entries before inserting into the global sparse matrix in
> hope of speeding up the assembly but we don't have any results yet.
>
>   

_______________________________________________
DOLFIN-dev mailing list
[email protected]
http://www.fenics.org/mailman/listinfo/dolfin-dev