'Real' spaces are a problem and shouldn't be used for large problems.
They lead to dense rows in the matrix. Sparsity pattern construction
and sparse matrix insertion are designed/optimised for sparse rows.
Garth
I've looked at the code (read: tried to do my homework).
The quadratic behaviour
(e.g. quadratic.py) comes, as pointed out by Garth,
from repeatedly inserting elements into a dense row.
The "row" is defined as
typedef dolfin::Set<std::size_t> set_type;
in la/SparsityPattern.h; each insertion into a dolfin::Set
(common/Set.h) means a linear serch into a growing vector,
whence the quadratic complexity for a full row.
It is claimed that dolfin::Set is faster than
e.g. std::set or boost:unordered_set.
To my shame I was skeptical, but this is indeed the case even for a not
so small number of elements.
Consider The attached SetBench.py:
changing the definition of set_type I get these timings on my laptot:
---------------
dolfin::Set
---------------
DOFs: 17576 T1/D: 1.21562411474e-05 T2/D: 1.21447380207e-05 T3/D:
3.06724498204e-05 T4/D: 5.89880066456e-05
DOFs: 19683 T1/D: 1.23116917129e-05 T2/D: 1.2326421022e-05 T3/D:
2.98475278003e-05 T4/D: 5.81933080511e-05
DOFs: 21952 T1/D: 1.21632248771e-05 T2/D: 1.22790672862e-05 T3/D:
3.00371677292e-05 T4/D: 5.87944538705e-05
DOFs: 24389 T1/D: 1.23355147085e-05 T2/D: 1.23399821852e-05 T3/D:
3.04241619152e-05 T4/D: 5.85226551419e-05
DOFs: 27000 T1/D: 1.22901863522e-05 T2/D: 1.31540033552e-05 T3/D:
3.04095921693e-05 T4/D: 5.90437429923e-05
DOFs: 29791 T1/D: 1.24349962576e-05 T2/D: 1.24345640934e-05 T3/D:
3.0518178353e-05 T4/D: 5.93754216898e-05
DOFs: 32768 T1/D: 1.29479667521e-05 T2/D: 1.24920406961e-05 T3/D:
3.03102715407e-05 T4/D: 5.95057354076e-05
DOFs: 35937 T1/D: 1.23870052001e-05 T2/D: 1.24276339499e-05 T3/D:
3.05265013236e-05 T4/D: 5.90534830119e-05
---------------
std::set
---------------
DOFs: 17576 T1/D: 2.18468386312e-05 T2/D: 2.20011405962e-05 T3/D:
5.21341100518e-05 T4/D: 9.6728897225e-05
DOFs: 19683 T1/D: 2.19288162055e-05 T2/D: 2.17934180001e-05 T3/D:
5.07684764911e-05 T4/D: 9.65355894087e-05
DOFs: 21952 T1/D: 2.19700678792e-05 T2/D: 2.21555178263e-05 T3/D:
5.10117576699e-05 T4/D: 9.64271083865e-05
DOFs: 24389 T1/D: 2.23399059101e-05 T2/D: 2.21942681229e-05 T3/D:
5.13570472623e-05 T4/D: 9.66586204046e-05
DOFs: 27000 T1/D: 2.249984388e-05 T2/D: 2.24757017913e-05 T3/D:
5.11607770567e-05 T4/D: 9.64084024783e-05
DOFs: 29791 T1/D: 2.25772174526e-05 T2/D: 2.249585054e-05 T3/D:
5.15919846353e-05 T4/D: 9.80704585646e-05
DOFs: 32768 T1/D: 2.27100827033e-05 T2/D: 2.27337659453e-05 T3/D:
5.18985034432e-05 T4/D: 9.68929452938e-05
DOFs: 35937 T1/D: 2.28391623933e-05 T2/D: 2.26581773684e-05 T3/D:
5.18217712278e-05 T4/D: 9.7381613186e-05
---------------
boost::unordered_set
---------------
DOFs: 17576 T1/D: 2.60616765003e-05 T2/D: 2.37092056978e-05 T3/D:
5.59041920064e-05 T4/D: 9.76867359988e-05
DOFs: 19683 T1/D: 2.38605965954e-05 T2/D: 2.38233493704e-05 T3/D:
5.42603765859e-05 T4/D: 9.59506686768e-05
DOFs: 21952 T1/D: 2.40979471811e-05 T2/D: 2.39046556609e-05 T3/D:
5.50718787982e-05 T4/D: 9.76070044861e-05
DOFs: 24389 T1/D: 2.43433492836e-05 T2/D: 2.41920122865e-05 T3/D:
5.46513959057e-05 T4/D: 9.77637099234e-05
DOFs: 27000 T1/D: 2.44686338637e-05 T2/D: 2.4351367244e-05 T3/D:
5.50628856376e-05 T4/D: 9.78941829116e-05
DOFs: 29791 T1/D: 2.4397501008e-05 T2/D: 2.42834016598e-05 T3/D:
5.53219056931e-05 T4/D: 9.91133907915e-05
DOFs: 32768 T1/D: 2.46483177762e-05 T2/D: 2.46077906922e-05 T3/D:
5.56596714887e-05 T4/D: 9.68612075667e-05
DOFs: 35937 T1/D: 2.47220369148e-05 T2/D: 2.45810172048e-05 T3/D:
5.56450135342e-05 T4/D: 9.79038889965e-05
However, with a dense row the effect is disastrous.
Would something along the lines of the attached version of Set.h be
acceptable? Please, don't scream in horror immediately!
It achives a good flexibility at the expense of a small run-time
penality and of some memory.
The run-time penalty is not zero, but is small: for SetBench.py I get
---------------
modified dolfin::Set
---------------
DOFs: 17576 T1/D: 1.30159370889e-05 T2/D: 1.2764062022e-05 T3/D:
3.11780312303e-05 T4/D: 6.05253289925e-05
DOFs: 19683 T1/D: 1.28605401729e-05 T2/D: 1.29255138689e-05 T3/D:
3.11989540287e-05 T4/D: 6.04168523141e-05
DOFs: 21952 T1/D: 1.29104442569e-05 T2/D: 1.29497498708e-05 T3/D:
3.13480246171e-05 T4/D: 6.04451050216e-05
DOFs: 24389 T1/D: 1.29966817173e-05 T2/D: 1.30719543017e-05 T3/D:
3.14079745923e-05 T4/D: 6.07134392695e-05
DOFs: 27000 T1/D: 1.30851533678e-05 T2/D: 1.39047834608e-05 T3/D:
3.15687391493e-05 T4/D: 6.10168068497e-05
DOFs: 29791 T1/D: 1.31030820743e-05 T2/D: 1.31048587493e-05 T3/D:
3.15673849637e-05 T4/D: 6.10689851021e-05
DOFs: 32768 T1/D: 1.37035822263e-05 T2/D: 1.31555498228e-05 T3/D:
3.17273588735e-05 T4/D: 6.12919029663e-05
DOFs: 35937 T1/D: 1.31892770513e-05 T2/D: 1.30803609533e-05 T3/D:
3.1701530675e-05 T4/D: 6.15608155782e-05
that is not so bad (at least, in my opinion) compared to the current
code. And, of course, quadratic.py is no longer quadratic (even with
reorder_dofs_serial = True).
I'm willing to perform any benchmark that would be required.
And, of course, I'm willing to tune the value of the thresold _max_size.
Thanks
Marco
#========================================
from dolfin import *
import time
import numpy as np
from petsc4py import PETSc
#import matplotlib.pyplot as plt
#set_log_level(DEBUG)
#parameters['reorder_dofs_serial'] = False
for n in range(25, 33):
mesh = UnitCubeMesh(n, n, n)
#mesh = UnitSquareMesh(10*n, 10*n)
P1 = FunctionSpace(mesh, 'CG', 1)
R = FunctionSpace(mesh, 'R', 0)
# Measure time for creation of P1*R space
t = time.time()
V = P1*R
t = time.time() - t
PP = P1*P1
(uu, ul) = TrialFunctions(V)
(vu, vl) = TestFunctions(V)
(pu, pp) = TrialFunctions(PP)
(pv, pq) = TestFunctions(PP)
u = TrialFunction(P1)
v = TestFunction(P1)
am = inner(grad(uu), grad(vu))*dx
a = inner(grad(pu), grad(pv))*dx
af = inner(grad(u), grad(v))*dx
#mixed cg-real
t1 = time.time()
AM = assemble(am)
t1 = time.time() - t1
# print AM
# plot(mesh)
# interactive()
# oo = PETSc.Viewer().createBinary('xxx',1)
# as_backend_type(AM).mat().view(oo)
# pippo
#mixed cg-cg
t2 = time.time()
A = assemble(a)
t2 = time.time() - t2
#cg
t3 = time.time()
AF = assemble(af)
t3 = time.time() - t3
# Report
print 'DOFs:', V.dim(),\
'T/D:', t/V.dim(),\
'T1/D:', t1/P1.dim(),\
'T2/D:', t2/P1.dim(),\
'T3/D:', t3/P1.dim()
# plt.loglog(V.dim(), t1, 'o')
# plt.loglog(V.dim(), t2, 'x')
# plt.loglog(V.dim(), t3, '*')
# #
# # Plot some quadratic function for comparison
# xl = plt.xlim()
# yl = plt.ylim()
# x = np.linspace(*xl)
# y = map(lambda x:yl[0]/(xl[0]*xl[0])*x*x, x)
# plt.loglog(x, y)
# #
# # # Show plot
# plt.show()
# #=============================================
#========================================
from dolfin import *
import time
import numpy as np
from petsc4py import PETSc
# import matplotlib.pyplot as plt
# set_log_level(DEBUG)
# parameters['reorder_dofs_serial'] = False
for n in range(25, 33):
mesh = UnitCubeMesh(n, n, n)
#mesh = UnitSquareMesh(10*n, 10*n)
P = FunctionSpace(mesh, 'CG', 1)
PP = P*P
PPP = P*P*P
PPPP = P*P*P*P
u = TrialFunction(P)
v = TestFunction(P)
(pu, pp) = TrialFunctions(PP)
(pv, pq) = TestFunctions(PP)
ppu = TrialFunctions(PPP)
ppv = TestFunctions(PPP)
pppu = TrialFunctions(PPPP)
pppv = TestFunctions(PPPP)
a = inner(grad(pu), grad(pv))*dx
pa = inner(grad(pu), grad(pv))*dx
ppa = inner(grad(ppu[0]), grad(ppv[0]))*dx
pppa = inner(grad(pppu[0]), grad(pppv[0]))*dx
# cg
t1 = time.time()
A = assemble(a)
t1 = time.time() - t1
#mixed cg-cg
t2 = time.time()
pA = assemble(pa)
t2 = time.time() - t2
# mixed cg-cg-cg
t3 = time.time()
ppA = assemble(ppa)
t3 = time.time() - t3
# mixed cg-cg-cg-cg
t4 = time.time()
ppA = assemble(pppa)
t4 = time.time() - t4
# Report
print 'DOFs:', P.dim(),\
'T1/D:', t1/P.dim(),\
'T2/D:', t2/P.dim(),\
'T3/D:', t3/P.dim(),\
'T4/D:', t4/P.dim()
# plt.loglog(V.dim(), t1, 'o')
# plt.loglog(V.dim(), t2, 'x')
# plt.loglog(V.dim(), t3, '*')
#
# # # Show plot
# plt.show()
# #=============================================
// Copyright (C) 2009-2011 Garth N. Wells
//
// This file is part of DOLFIN.
//
// DOLFIN is free software: you can redistribute it and/or modify
// it under the terms of the GNU Lesser General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
//
// DOLFIN is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU Lesser General Public License for more details.
//
// You should have received a copy of the GNU Lesser General Public License
// along with DOLFIN. If not, see <http://www.gnu.org/licenses/>.
//
// First added: 2009-08-09
// Last changed: 2013-10-06
#ifndef __DOLFIN_SET_H
#define __DOLFIN_SET_H
#include <algorithm>
#include <cstddef>
#include <vector>
#include <set>
namespace dolfin
{
/// This is a set-like data structure. It is not ordered and it is based
/// a std::vector. It uses linear search, and can be faster than std::set
// and boost::unordered_set in some cases.
template<typename T>
class Set
{
public:
typedef typename std::vector<T>::iterator iterator;
typedef typename std::vector<T>::const_iterator const_iterator;
/// Create empty set
Set() : _vec(true) {}
/// Wrap std::vectpr as a set. Contents will be erased.
Set(std::vector<T>& x) : _x(x), _vec(true)
{ _x.clear(); _xx.clear();}
/// Copy constructor
Set(const dolfin::Set<T>& x) : _x(x._x), _xx(x._xx), _vec(x._vec) {}
/// Destructor
~Set() {}
/// operator =
dolfin::Set<T>& operator = (const dolfin::Set<T>& x) {
this->_x = x._x;
this->_xx = x._xx;
this->_vec = x._vec;
return *this;
}
/// Insert entry
bool insert(const T& x)
{
if (_vec)
{
if( find(x) == this->end() )
{
_x.push_back(x);
if (_x.size() == _max_size) {
_vec = false;
std::insert_iterator<std::set<T> > i(_xx, _xx.begin());
std::copy(_x.begin(), _x.end(), i);
}
return true;
}
else
return false;
}
else
{
if( _xx.find(x) == _xx.end() )
{
_xx.insert(x);
_x.push_back(x);
return true;
}
else
return false;
}
}
/// Insert entries
template <typename InputIt>
void insert(const InputIt first, const InputIt last)
{
for (InputIt position = first; position != last; ++position)
{
insert(*position);
}
}
const_iterator begin() const
{ return _x.begin(); }
const_iterator end() const
{ return _x.end(); }
/// Set size
std::size_t size() const
{ return _x.size(); }
/// Erase an entry
void erase(const T& x)
{
iterator p = find(x);
if (p != _x.end()) {
_x.erase(p);
if (!_vec)
_xx.erase(x);
}
}
/// Sort set
void sort()
{ if (_vec)
std::sort(_x.begin(), _x.end());
else
std::copy(_xx.begin(), _xx.end(), _x.begin());
}
/// Clear set
void clear()
{ _x.clear();
if (!_vec)
{
_xx.clear();
_vec = true;
}
}
/// Index the nth entry in the set
T operator[](std::size_t n) const
{ return _x[n]; }
/// Return the vector that stores the data in the Set
const std::vector<T>& set() const
{ return _x; }
/// Return the vector that stores the data in the Set
std::vector<T>& set()
{ return _x; }
private:
/// Find entry in set and return an iterator to the entry
iterator find(const T& x)
{ return std::find(_x.begin(), _x.end(), x); }
/// Find entry in set and return an iterator to the entry (const)
const_iterator find(const T& x) const
{ return std::find(_x.begin(), _x.end(), x); }
std::vector<T> _x;
std::set<T> _xx;
bool _vec;
static const int _max_size = 100;
};
}
#endif
_______________________________________________
fenics mailing list
[email protected]
http://fenicsproject.org/mailman/listinfo/fenics