Dear David,
Your example of accessing all elements in a vector individually by
operator [] (or operator ()) is not very representative of the typical
situation in most codes. All vector interfaces implement vector
addition, scaling, and inner products such that you will not have to
write the for loop on your own. Even though PETSc appears slow in your
test, it will perform adequately in real applications. For large vector
sizes that do not fit into processor caches, I expect all vector
interfaces to be relatively close because the operations should be
memory bandwidth limited. All iterative solvers use the fast access path
as well and are not adequately benchmarked by your example.
If you are interested in some performance numbers for the deal.II
vectors, please go here:
https://github.com/dealii/dealii/issues/2496
The only place where operator() is extensively used in a finite element
context is in the assembly of (right-hand side) vectors. But even there
we have a few optimizations going on behind the scenes to make it less
prominent. Furthermore, you will usually also assemble matrices, which
are much more expensive, so the difference in the vector assembly will
not show up. The exception is matrix-free algorithms, where the assembly
loops constitute your matrix-vector product and are thus
performance-critical. In that case, I would definitely recommend using
the deal.II vectors.
Even though I mostly use the parallel deal.II vectors for my purposes,
the advice given by Bruno is mostly correct - you typically choose the
vector type that matches the matrix type. For example,
parallel::distributed::Vector does not work with PETSc matrices at all.
Finally, let me explain the reasons for the vast difference in run times:
- The dealii::Vector<Number> class is fastest because it performs direct
array access.
- dealii::parallel::distributed::Vector<Number> is slower because it has
to transform the global index that goes into operator() into an
MPI-local index. On one core this transformation is the identity, but
the implementation cannot exploit that. Thus, each vector access still
performs two conditional branches (checking whether the index is in the
locally owned range) and one subtraction of the lower bound, which is 0
here. Most CPUs handle the branches really well, but the subtraction is
still there and prevents the compiler from beneficial unrolling etc.
- The PETSc implementation needs to do further operations due to the
wrapper between the deal.II C++ interface and the underlying PETSc data
structures. I do not know the details of the implementation - I believe
it changes the state of the PETSc objects, which involves some
additional bookkeeping - but it doesn't surprise me that this is
expensive. As I said, we have not seen a reason to optimize for this case.
Best,
Martin
On 28.08.2016 19:15, David F wrote:
Hi, thanks for your answer. I have measured the time it takes for
PETScWrappers::MPI::Vector, parallel::distributed::Vector< Number >
and Vector< Number > to complete a very simple task consisting of
accessing the elements of these vectors. Something like this
(repeating this whole process 15 times for averaging results, and
using very big vector sizes):
    double a;
    for (unsigned int i = 0; i < v.size(); ++i)
      a = v[i];
I'm running it with a single process, and the results are:
+---------------------------------------------+------------+------------+
| Total wallclock time elapsed since start    |      34.4s |            |
|                                             |            |            |
| Section                         | no. calls |  wall time | % of total |
+---------------------------------+-----------+------------+------------+
| Dealii parallel                 |        15 |    0.0421s |      0.12% |
| Dealii serial                   |        15 |     0.018s |     0.052% |
| PETSc wrapper                   |        15 |      34.3s |     1e+02% |
+---------------------------------+-----------+------------+------------+
This shows that the PETSc wrapper is ~1000 times slower at accessing its
elements than the others (even local elements, as I'm running a single
process, so it's not a communication issue). If, for example, I run it
in parallel using 2 processes, the parallel vectors do their job in
about half the time, but the factor of 1000 is simply too big to
overcome. The problem I find is that the use of the PETSc wrappers is
mandatory for using the parallel solvers. Is this huge difference in
performance normal? Is there any workaround in the use of the PETSc
wrappers when dealing with solvers and other parallel classes?
David.
On Friday, 26 August 2016 14:05:55 UTC+2, Bruno Turcksin wrote:
Hi,
I guess it's more a question of preference. What I do is use the
same vector type as the matrix type: PETSc matrix -> PETSc vector,
Trilinos matrix -> Trilinos vector, matrix-free -> deal.II vector.
The deal.II vector can use multithreading, unlike the PETSc vector,
but if you are using MPI, I don't think you will see a big difference.
Best,
Bruno
On Thursday, August 25, 2016 at 5:30:31 PM UTC-4, David F wrote:
Hello, I would like to know if among
PETScWrappers::MPI::Vector and parallel::distributed::Vector<
Number >, one of them is preferred over the other. They both
seem to have a similar functionality and a similar interface.
Although parallel::distributed::Vector< Number > has a bigger
interface, PETScWrappers::MPI::Vector is extensively used in
the examples. In which situations should we use each of them?
Is there any known difference in performance? Thanks.
--
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see
https://groups.google.com/d/forum/dealii?hl=en
---
You received this message because you are subscribed to the Google
Groups "deal.II User Group" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to [email protected]
<mailto:[email protected]>.
For more options, visit https://groups.google.com/d/optout.