Dear David,

Your example of accessing every element of a vector individually through operator[] (or operator()) is not representative of the typical situation in most codes. All vector interfaces implement vector addition, scaling, and inner products, so you do not have to write the for loop yourself. Even though PETSc appears slow in your test, it will perform adequately in realistic examples. For large vector sizes that do not fit into the processor caches, I expect all vector interfaces to be relatively close, because these operations should be memory-bandwidth limited. All iterative solvers use the fast access path as well, so they are not adequately benchmarked by your example.

If you are interested in some performance numbers for the deal.II vectors, please go here:
https://github.com/dealii/dealii/issues/2496

The only place where operator() is used extensively in a finite element context is in the assembly of (right-hand side) vectors. But even there we have a few optimizations going on behind the scenes to make it less prominent. Furthermore, you will usually also assemble matrices, which are much more expensive, so the difference in vector assembly will not show up. The exception is when you work with matrix-free algorithms, where assembly-like loops represent your matrix-vector product and are thus performance-critical. In that case, I would definitely recommend using the deal.II vectors.

Even though I mostly use the parallel deal.II vectors for my purposes, the advice given by Bruno is mostly correct - you typically choose the vector type that matches the matrix type. For example, parallel::distributed::Vector does not work with PETSc matrices at all.

Finally, let me explain the reasons for the vast difference in run times:
- The dealii::Vector<Number> class is fastest because it performs direct array access.
- dealii::parallel::distributed::Vector<Number> is slower because it has to translate the global index passed to operator() into an MPI-local index. On one core this translation is a no-op, but the implementation cannot exploit that: each vector access still performs two conditional branches (checking whether the index is in the locally owned range) and one subtraction of the lower bound, which happens to be 0 here. Most CPUs predict the branches really well, but the subtraction remains and prevents the compiler from beneficial unrolling etc.
- The PETSc implementation needs to do further work because of the wrapper layer between the deal.II C++ interface and the underlying PETSc data structures. I do not know the details of the implementation - I believe each access changes the state of the PETSc object, which involves some comparatively heavy machinery - but it does not surprise me that this is expensive. As I said, we have not seen a reason to optimize this case.

Best,
Martin


On 28.08.2016 19:15, David F wrote:
Hi, thanks for your answer. I have measured the time it takes for PETScWrappers::MPI::Vector, parallel::distributed::Vector< Number > and Vector< Number > to complete a very simple task: accessing the elements of these vectors. Something like this (repeating the whole process 15 times to average the results, and using very large vector sizes):

    double a;
    for (unsigned int i = 0; i < v.size(); ++i)
      a = v[i];



I'm running it with a single process, and the results are:

+-------------------------------------------+-----------+-----------+------------+
| Total wallclock time elapsed since start  | 34.4s     |           |            |
|                                           |           |           |            |
| Section                                   | no. calls | wall time | % of total |
+-------------------------------------------+-----------+-----------+------------+
| Dealii parallel                           |        15 |   0.0421s |      0.12% |
| Dealii serial                             |        15 |    0.018s |     0.052% |
| PETSc wrapper                             |        15 |     34.3s |     1e+02% |
+-------------------------------------------+-----------+-----------+------------+

This shows that the PETSc wrapper is ~1000 times slower at accessing its elements than the others (even local elements, as I'm running a single process, so it's not a communication issue). If, for example, I run it in parallel with 2 processes, the parallel vectors do their job in about half the time, but the factor of 1000 is simply too big to overcome. The problem I find is that the use of the PETSc wrappers is mandatory for using parallel solvers. Is this huge difference in performance normal? Is there any work-around in the use of the PETSc wrappers when dealing with solvers and other parallel classes?

David.


On Friday, 26 August 2016 14:05:55 UTC+2, Bruno Turcksin wrote:

    Hi,

    I guess it's more a question of preference. What I do is use the
    same vector type as the matrix type: PETSc matrix -> PETSc vector,
    Trilinos matrix -> Trilinos vector, matrix-free -> deal.II vector.
    The deal.II vectors can use multithreading, unlike the PETSc
    vectors, but if you are using MPI, I don't think you will see a
    big difference.

    Best,

    Bruno

    On Thursday, August 25, 2016 at 5:30:31 PM UTC-4, David F wrote:

        Hello, I would like to know if among
        PETScWrappers::MPI::Vector and parallel::distributed::Vector<
        Number >, one of them is preferred over the other. They both
        seem to have a similar functionality and a similar interface.
        Although parallel::distributed::Vector< Number > has a bigger
        interface, PETScWrappers::MPI::Vector is extensively used in
        the examples. In which situations should we use each of them?
        Is there any known difference in performance? Thanks.

--
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see https://groups.google.com/d/forum/dealii?hl=en
---
You received this message because you are subscribed to the Google Groups "deal.II User Group" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.
