On 2013-10-07 14:25, Marco Morandini wrote:

'Real' spaces are a problem and shouldn't be used for large problems.
They lead to dense rows in the matrix. Sparsity pattern construction
and sparse matrix insertion are designed/optimised for sparse rows.

Garth

I've looked at the code (read: tried to do my homework).

The quadratic behaviour
(e.g. quadratic.py) comes, as pointed out by Garth,
from repeatedly inserting elements into a dense row.
The "row" is defined as

    typedef dolfin::Set<std::size_t> set_type;

in la/SparsityPattern.h; each insertion into a dolfin::Set
(common/Set.h) means a linear serch into a growing vector,
whence the quadratic complexity for a full row.

It is claimed that dolfin::Set is faster than
e.g. std::set or boost:unordered_set.
To my shame I was skeptical, but this is indeed the case even for a
not so small number of elements.

Consider The attached SetBench.py:
changing the definition of set_type I get these timings on my laptot:

---------------
dolfin::Set
---------------
DOFs: 17576 T1/D: 1.21562411474e-05 T2/D: 1.21447380207e-05 T3/D:
3.06724498204e-05 T4/D: 5.89880066456e-05
DOFs: 19683 T1/D: 1.23116917129e-05 T2/D: 1.2326421022e-05 T3/D:
2.98475278003e-05 T4/D: 5.81933080511e-05
DOFs: 21952 T1/D: 1.21632248771e-05 T2/D: 1.22790672862e-05 T3/D:
3.00371677292e-05 T4/D: 5.87944538705e-05
DOFs: 24389 T1/D: 1.23355147085e-05 T2/D: 1.23399821852e-05 T3/D:
3.04241619152e-05 T4/D: 5.85226551419e-05
DOFs: 27000 T1/D: 1.22901863522e-05 T2/D: 1.31540033552e-05 T3/D:
3.04095921693e-05 T4/D: 5.90437429923e-05
DOFs: 29791 T1/D: 1.24349962576e-05 T2/D: 1.24345640934e-05 T3/D:
3.0518178353e-05 T4/D: 5.93754216898e-05
DOFs: 32768 T1/D: 1.29479667521e-05 T2/D: 1.24920406961e-05 T3/D:
3.03102715407e-05 T4/D: 5.95057354076e-05
DOFs: 35937 T1/D: 1.23870052001e-05 T2/D: 1.24276339499e-05 T3/D:
3.05265013236e-05 T4/D: 5.90534830119e-05

---------------
std::set
---------------
DOFs: 17576 T1/D: 2.18468386312e-05 T2/D: 2.20011405962e-05 T3/D:
5.21341100518e-05 T4/D: 9.6728897225e-05
DOFs: 19683 T1/D: 2.19288162055e-05 T2/D: 2.17934180001e-05 T3/D:
5.07684764911e-05 T4/D: 9.65355894087e-05
DOFs: 21952 T1/D: 2.19700678792e-05 T2/D: 2.21555178263e-05 T3/D:
5.10117576699e-05 T4/D: 9.64271083865e-05
DOFs: 24389 T1/D: 2.23399059101e-05 T2/D: 2.21942681229e-05 T3/D:
5.13570472623e-05 T4/D: 9.66586204046e-05
DOFs: 27000 T1/D: 2.249984388e-05 T2/D: 2.24757017913e-05 T3/D:
5.11607770567e-05 T4/D: 9.64084024783e-05
DOFs: 29791 T1/D: 2.25772174526e-05 T2/D: 2.249585054e-05 T3/D:
5.15919846353e-05 T4/D: 9.80704585646e-05
DOFs: 32768 T1/D: 2.27100827033e-05 T2/D: 2.27337659453e-05 T3/D:
5.18985034432e-05 T4/D: 9.68929452938e-05
DOFs: 35937 T1/D: 2.28391623933e-05 T2/D: 2.26581773684e-05 T3/D:
5.18217712278e-05 T4/D: 9.7381613186e-05

---------------
boost::unordered_set
---------------
DOFs: 17576 T1/D: 2.60616765003e-05 T2/D: 2.37092056978e-05 T3/D:
5.59041920064e-05 T4/D: 9.76867359988e-05
DOFs: 19683 T1/D: 2.38605965954e-05 T2/D: 2.38233493704e-05 T3/D:
5.42603765859e-05 T4/D: 9.59506686768e-05
DOFs: 21952 T1/D: 2.40979471811e-05 T2/D: 2.39046556609e-05 T3/D:
5.50718787982e-05 T4/D: 9.76070044861e-05
DOFs: 24389 T1/D: 2.43433492836e-05 T2/D: 2.41920122865e-05 T3/D:
5.46513959057e-05 T4/D: 9.77637099234e-05
DOFs: 27000 T1/D: 2.44686338637e-05 T2/D: 2.4351367244e-05 T3/D:
5.50628856376e-05 T4/D: 9.78941829116e-05
DOFs: 29791 T1/D: 2.4397501008e-05 T2/D: 2.42834016598e-05 T3/D:
5.53219056931e-05 T4/D: 9.91133907915e-05
DOFs: 32768 T1/D: 2.46483177762e-05 T2/D: 2.46077906922e-05 T3/D:
5.56596714887e-05 T4/D: 9.68612075667e-05
DOFs: 35937 T1/D: 2.47220369148e-05 T2/D: 2.45810172048e-05 T3/D:
5.56450135342e-05 T4/D: 9.79038889965e-05


However, with a dense row the effect is disastrous.

Would something along the lines of the attached version of Set.h be
acceptable? Please, don't scream in horror immediately!

It achives a good flexibility at the expense of a small run-time
penality and of some memory.


A 'fix' for this needs to go deeper. With global dofs, building the sparsity pattern is only one of the difficulties. The other is insertion into a sparse matrix. PETSc, in particular, doesn't perform well for dense rows.

A fix that would work is to mark global dofs in the dofmap, and add the whole row to the sparsity pattern in one go. To assemble a dense row, the row entries could be cached in a vector and the row added in one go to the matrix.

Garth


The run-time penalty is not zero, but is small: for SetBench.py I get

---------------
modified dolfin::Set
---------------
DOFs: 17576 T1/D: 1.30159370889e-05 T2/D: 1.2764062022e-05 T3/D:
3.11780312303e-05 T4/D: 6.05253289925e-05
DOFs: 19683 T1/D: 1.28605401729e-05 T2/D: 1.29255138689e-05 T3/D:
3.11989540287e-05 T4/D: 6.04168523141e-05
DOFs: 21952 T1/D: 1.29104442569e-05 T2/D: 1.29497498708e-05 T3/D:
3.13480246171e-05 T4/D: 6.04451050216e-05
DOFs: 24389 T1/D: 1.29966817173e-05 T2/D: 1.30719543017e-05 T3/D:
3.14079745923e-05 T4/D: 6.07134392695e-05
DOFs: 27000 T1/D: 1.30851533678e-05 T2/D: 1.39047834608e-05 T3/D:
3.15687391493e-05 T4/D: 6.10168068497e-05
DOFs: 29791 T1/D: 1.31030820743e-05 T2/D: 1.31048587493e-05 T3/D:
3.15673849637e-05 T4/D: 6.10689851021e-05
DOFs: 32768 T1/D: 1.37035822263e-05 T2/D: 1.31555498228e-05 T3/D:
3.17273588735e-05 T4/D: 6.12919029663e-05
DOFs: 35937 T1/D: 1.31892770513e-05 T2/D: 1.30803609533e-05 T3/D:
3.1701530675e-05 T4/D: 6.15608155782e-05

that is not so bad (at least, in my opinion) compared to the current
code. And, of course, quadratic.py is no longer quadratic (even with
reorder_dofs_serial = True).


I'm willing to perform any benchmark that would be required.
And, of course, I'm willing to tune the value of the thresold _max_size.

Thanks

Marco
_______________________________________________
fenics mailing list
[email protected]
http://fenicsproject.org/mailman/listinfo/fenics

Reply via email to