'Real' spaces are a problem and shouldn't be used for large problems.
They lead to dense rows in the matrix. Sparsity pattern construction
and sparse matrix insertion are designed/optimised for sparse rows.

Garth

I've looked at the code (read: tried to do my homework).

The quadratic behaviour
(e.g. quadratic.py) comes, as pointed out by Garth,
from repeatedly inserting elements into a dense row.
The "row" is defined as

    typedef dolfin::Set<std::size_t> set_type;

in la/SparsityPattern.h; each insertion into a dolfin::Set
(common/Set.h) means a linear serch into a growing vector,
whence the quadratic complexity for a full row.

It is claimed that dolfin::Set is faster than
e.g. std::set or boost:unordered_set.
To my shame I was skeptical, but this is indeed the case even for a not so small number of elements.

Consider The attached SetBench.py:
changing the definition of set_type I get these timings on my laptot:

---------------
dolfin::Set
---------------
DOFs: 17576 T1/D: 1.21562411474e-05 T2/D: 1.21447380207e-05 T3/D: 3.06724498204e-05 T4/D: 5.89880066456e-05 DOFs: 19683 T1/D: 1.23116917129e-05 T2/D: 1.2326421022e-05 T3/D: 2.98475278003e-05 T4/D: 5.81933080511e-05 DOFs: 21952 T1/D: 1.21632248771e-05 T2/D: 1.22790672862e-05 T3/D: 3.00371677292e-05 T4/D: 5.87944538705e-05 DOFs: 24389 T1/D: 1.23355147085e-05 T2/D: 1.23399821852e-05 T3/D: 3.04241619152e-05 T4/D: 5.85226551419e-05 DOFs: 27000 T1/D: 1.22901863522e-05 T2/D: 1.31540033552e-05 T3/D: 3.04095921693e-05 T4/D: 5.90437429923e-05 DOFs: 29791 T1/D: 1.24349962576e-05 T2/D: 1.24345640934e-05 T3/D: 3.0518178353e-05 T4/D: 5.93754216898e-05 DOFs: 32768 T1/D: 1.29479667521e-05 T2/D: 1.24920406961e-05 T3/D: 3.03102715407e-05 T4/D: 5.95057354076e-05 DOFs: 35937 T1/D: 1.23870052001e-05 T2/D: 1.24276339499e-05 T3/D: 3.05265013236e-05 T4/D: 5.90534830119e-05

---------------
std::set
---------------
DOFs: 17576 T1/D: 2.18468386312e-05 T2/D: 2.20011405962e-05 T3/D: 5.21341100518e-05 T4/D: 9.6728897225e-05 DOFs: 19683 T1/D: 2.19288162055e-05 T2/D: 2.17934180001e-05 T3/D: 5.07684764911e-05 T4/D: 9.65355894087e-05 DOFs: 21952 T1/D: 2.19700678792e-05 T2/D: 2.21555178263e-05 T3/D: 5.10117576699e-05 T4/D: 9.64271083865e-05 DOFs: 24389 T1/D: 2.23399059101e-05 T2/D: 2.21942681229e-05 T3/D: 5.13570472623e-05 T4/D: 9.66586204046e-05 DOFs: 27000 T1/D: 2.249984388e-05 T2/D: 2.24757017913e-05 T3/D: 5.11607770567e-05 T4/D: 9.64084024783e-05 DOFs: 29791 T1/D: 2.25772174526e-05 T2/D: 2.249585054e-05 T3/D: 5.15919846353e-05 T4/D: 9.80704585646e-05 DOFs: 32768 T1/D: 2.27100827033e-05 T2/D: 2.27337659453e-05 T3/D: 5.18985034432e-05 T4/D: 9.68929452938e-05 DOFs: 35937 T1/D: 2.28391623933e-05 T2/D: 2.26581773684e-05 T3/D: 5.18217712278e-05 T4/D: 9.7381613186e-05

---------------
boost::unordered_set
---------------
DOFs: 17576 T1/D: 2.60616765003e-05 T2/D: 2.37092056978e-05 T3/D: 5.59041920064e-05 T4/D: 9.76867359988e-05 DOFs: 19683 T1/D: 2.38605965954e-05 T2/D: 2.38233493704e-05 T3/D: 5.42603765859e-05 T4/D: 9.59506686768e-05 DOFs: 21952 T1/D: 2.40979471811e-05 T2/D: 2.39046556609e-05 T3/D: 5.50718787982e-05 T4/D: 9.76070044861e-05 DOFs: 24389 T1/D: 2.43433492836e-05 T2/D: 2.41920122865e-05 T3/D: 5.46513959057e-05 T4/D: 9.77637099234e-05 DOFs: 27000 T1/D: 2.44686338637e-05 T2/D: 2.4351367244e-05 T3/D: 5.50628856376e-05 T4/D: 9.78941829116e-05 DOFs: 29791 T1/D: 2.4397501008e-05 T2/D: 2.42834016598e-05 T3/D: 5.53219056931e-05 T4/D: 9.91133907915e-05 DOFs: 32768 T1/D: 2.46483177762e-05 T2/D: 2.46077906922e-05 T3/D: 5.56596714887e-05 T4/D: 9.68612075667e-05 DOFs: 35937 T1/D: 2.47220369148e-05 T2/D: 2.45810172048e-05 T3/D: 5.56450135342e-05 T4/D: 9.79038889965e-05


However, with a dense row the effect is disastrous.

Would something along the lines of the attached version of Set.h be acceptable? Please, don't scream in horror immediately!

It achives a good flexibility at the expense of a small run-time penality and of some memory.

The run-time penalty is not zero, but is small: for SetBench.py I get

---------------
modified dolfin::Set
---------------
DOFs: 17576 T1/D: 1.30159370889e-05 T2/D: 1.2764062022e-05 T3/D: 3.11780312303e-05 T4/D: 6.05253289925e-05 DOFs: 19683 T1/D: 1.28605401729e-05 T2/D: 1.29255138689e-05 T3/D: 3.11989540287e-05 T4/D: 6.04168523141e-05 DOFs: 21952 T1/D: 1.29104442569e-05 T2/D: 1.29497498708e-05 T3/D: 3.13480246171e-05 T4/D: 6.04451050216e-05 DOFs: 24389 T1/D: 1.29966817173e-05 T2/D: 1.30719543017e-05 T3/D: 3.14079745923e-05 T4/D: 6.07134392695e-05 DOFs: 27000 T1/D: 1.30851533678e-05 T2/D: 1.39047834608e-05 T3/D: 3.15687391493e-05 T4/D: 6.10168068497e-05 DOFs: 29791 T1/D: 1.31030820743e-05 T2/D: 1.31048587493e-05 T3/D: 3.15673849637e-05 T4/D: 6.10689851021e-05 DOFs: 32768 T1/D: 1.37035822263e-05 T2/D: 1.31555498228e-05 T3/D: 3.17273588735e-05 T4/D: 6.12919029663e-05 DOFs: 35937 T1/D: 1.31892770513e-05 T2/D: 1.30803609533e-05 T3/D: 3.1701530675e-05 T4/D: 6.15608155782e-05

that is not so bad (at least, in my opinion) compared to the current code. And, of course, quadratic.py is no longer quadratic (even with reorder_dofs_serial = True).


I'm willing to perform any benchmark that would be required.
And, of course, I'm willing to tune the value of the thresold _max_size.

Thanks

Marco

#========================================
from dolfin import *
import time
import numpy as np
from petsc4py import PETSc
#import matplotlib.pyplot as plt

#set_log_level(DEBUG)
#parameters['reorder_dofs_serial'] = False

for n in range(25, 33):

    mesh = UnitCubeMesh(n, n, n)
    #mesh = UnitSquareMesh(10*n, 10*n)
    P1 = FunctionSpace(mesh, 'CG', 1)
    R  = FunctionSpace(mesh, 'R', 0)

    # Measure time for creation of P1*R space
    t = time.time()
    V = P1*R
    t = time.time() - t
    PP = P1*P1

    (uu, ul) = TrialFunctions(V)
    (vu, vl) = TestFunctions(V)
    (pu, pp) = TrialFunctions(PP)
    (pv, pq) = TestFunctions(PP)
    u = TrialFunction(P1)
    v = TestFunction(P1)
    
    am = inner(grad(uu), grad(vu))*dx
    a = inner(grad(pu), grad(pv))*dx
    af = inner(grad(u), grad(v))*dx

    #mixed cg-real
    t1 = time.time()
    AM = assemble(am)
    t1 = time.time() - t1
#     print AM
#     plot(mesh)
#     interactive()
#     oo = PETSc.Viewer().createBinary('xxx',1)
#     as_backend_type(AM).mat().view(oo)
#     pippo

    #mixed cg-cg
    t2 = time.time()
    A = assemble(a)
    t2 = time.time() - t2
    
    #cg
    t3 = time.time()
    AF = assemble(af)
    t3 = time.time() - t3
   
    # Report
    print 'DOFs:', V.dim(),\
         'T/D:', t/V.dim(),\
         'T1/D:', t1/P1.dim(),\
	 'T2/D:', t2/P1.dim(),\
	 'T3/D:', t3/P1.dim()
#     plt.loglog(V.dim(), t1, 'o')
#     plt.loglog(V.dim(), t2, 'x')
#     plt.loglog(V.dim(), t3, '*')
# # 
# # Plot some quadratic function for comparison
# xl = plt.xlim()
# yl = plt.ylim()
# x  = np.linspace(*xl)
# y  = map(lambda x:yl[0]/(xl[0]*xl[0])*x*x, x)
# plt.loglog(x, y)
# # 
# # # Show plot
# plt.show()
# #=============================================
#========================================
from dolfin import *
import time
import numpy as np
from petsc4py import PETSc
# import matplotlib.pyplot as plt

# set_log_level(DEBUG)
# parameters['reorder_dofs_serial'] = False

for n in range(25, 33):

    mesh = UnitCubeMesh(n, n, n)
    #mesh = UnitSquareMesh(10*n, 10*n)
    P = FunctionSpace(mesh, 'CG', 1)
    PP = P*P
    PPP = P*P*P
    PPPP = P*P*P*P

    u = TrialFunction(P)
    v = TestFunction(P)
    (pu, pp) = TrialFunctions(PP)
    (pv, pq) = TestFunctions(PP)
    ppu = TrialFunctions(PPP)
    ppv = TestFunctions(PPP)
    pppu = TrialFunctions(PPPP)
    pppv = TestFunctions(PPPP)
    
    a = inner(grad(pu), grad(pv))*dx
    pa = inner(grad(pu), grad(pv))*dx
    ppa = inner(grad(ppu[0]), grad(ppv[0]))*dx
    pppa = inner(grad(pppu[0]), grad(pppv[0]))*dx

    # cg
    t1 = time.time()
    A = assemble(a)
    t1 = time.time() - t1

    #mixed cg-cg
    t2 = time.time()
    pA = assemble(pa)
    t2 = time.time() - t2
    
    # mixed cg-cg-cg
    t3 = time.time()
    ppA = assemble(ppa)
    t3 = time.time() - t3
    
    # mixed cg-cg-cg-cg
    t4 = time.time()
    ppA = assemble(pppa)
    t4 = time.time() - t4
   
    # Report
    print 'DOFs:', P.dim(),\
         'T1/D:', t1/P.dim(),\
	 'T2/D:', t2/P.dim(),\
	 'T3/D:', t3/P.dim(),\
         'T4/D:', t4/P.dim()
#     plt.loglog(V.dim(), t1, 'o')
#     plt.loglog(V.dim(), t2, 'x')
#     plt.loglog(V.dim(), t3, '*')
# 
# # # Show plot
# plt.show()
# #=============================================
// Copyright (C) 2009-2011 Garth N. Wells
//
// This file is part of DOLFIN.
//
// DOLFIN is free software: you can redistribute it and/or modify
// it under the terms of the GNU Lesser General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
//
// DOLFIN is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU Lesser General Public License for more details.
//
// You should have received a copy of the GNU Lesser General Public License
// along with DOLFIN. If not, see <http://www.gnu.org/licenses/>.
//
// First added:  2009-08-09
// Last changed: 2013-10-06

#ifndef __DOLFIN_SET_H
#define __DOLFIN_SET_H

#include <algorithm>
#include <cstddef>
#include <vector>
#include <set>

namespace dolfin
{

  /// This is a set-like data structure. It is not ordered and it is based
  /// a std::vector. It uses linear search, and can be faster than std::set
  // and boost::unordered_set in some cases.

  template<typename T>
  class Set
  {
  public:

    typedef typename std::vector<T>::iterator iterator;
    typedef typename std::vector<T>::const_iterator const_iterator;

    /// Create empty set
    Set() : _vec(true) {}

    /// Wrap std::vectpr as a set. Contents will be erased.
    Set(std::vector<T>& x) : _x(x), _vec(true)
    { _x.clear(); _xx.clear();}

    /// Copy constructor
    Set(const dolfin::Set<T>& x) : _x(x._x), _xx(x._xx), _vec(x._vec) {}

    /// Destructor
    ~Set() {}

    /// operator =
    dolfin::Set<T>& operator = (const dolfin::Set<T>& x) {
      this->_x = x._x;
      this->_xx = x._xx;
      this->_vec = x._vec;
      return *this;
    }

    /// Insert entry
    bool insert(const T& x)
    {
      if (_vec)
      {
        if( find(x) == this->end() )
        {
          _x.push_back(x);
          if (_x.size() == _max_size) {
            _vec = false;
            std::insert_iterator<std::set<T> > i(_xx, _xx.begin());
            std::copy(_x.begin(), _x.end(), i);
          }
          return true;
        }
        else
          return false;
      }
      else
      {
        if( _xx.find(x) == _xx.end() )
	{
          _xx.insert(x);
          _x.push_back(x);
          return true;
	}
        else
          return false;
      }
    }

    /// Insert entries
    template <typename InputIt>
    void insert(const InputIt first, const InputIt last)
    {
      for (InputIt position = first; position != last; ++position)
      {
	insert(*position);
      }
    }

    const_iterator begin() const
    { return _x.begin(); }

    const_iterator end() const
    { return _x.end(); }

    /// Set size
    std::size_t size() const
    { return _x.size(); }

    /// Erase an entry
    void erase(const T& x)
    {
      iterator p = find(x);
      if (p != _x.end()) {
        _x.erase(p);
	if (!_vec)
	  _xx.erase(x);
      }
    }

    /// Sort set
    void sort()
    { if (_vec)
        std::sort(_x.begin(), _x.end()); 
      else
        std::copy(_xx.begin(), _xx.end(), _x.begin());
    }

    /// Clear set
    void clear()
    { _x.clear();
      if (!_vec) 
      {
        _xx.clear();
        _vec = true;
      } 
    }

    /// Index the nth entry in the set
    T operator[](std::size_t n) const
    { return _x[n]; }

    /// Return the vector that stores the data in the Set
    const std::vector<T>& set() const
    { return _x; }

    /// Return the vector that stores the data in the Set
    std::vector<T>& set()
    { return _x; }

  private:
    
    /// Find entry in set and return an iterator to the entry
    iterator find(const T& x)
    { return std::find(_x.begin(), _x.end(), x); }

    /// Find entry in set and return an iterator to the entry (const)
    const_iterator find(const T& x) const
    { return std::find(_x.begin(), _x.end(), x); }
    
    std::vector<T> _x;
    std::set<T> _xx;
    bool _vec;
    static const int _max_size = 100;

  };

}

#endif
_______________________________________________
fenics mailing list
[email protected]
http://fenicsproject.org/mailman/listinfo/fenics

Reply via email to