Dear David,
Your example of accessing all elements in a vector individually by
operator [] (or operator ()) is not very representative of the typical
situation in most codes. All vector interfaces implement vector
addition, scaling, and inner products such that you will not have to
write the for loop on your own. Even though PETSc appears slow in your
test, it will perform adequately in real applications. For large vector
sizes that do not fit into processor caches, I expect all vector
interfaces to be relatively close because the operations should be
memory bandwidth limited. All iterative solvers use the fast access path
as well and are not adequately benchmarked by your example.
If you are interested in some performance numbers for the deal.II
vectors, please go here:
https://github.com/dealii/dealii/issues/2496
The only place where operator() is extensively used in a finite element
context is in the assembly of (right-hand side) vectors. But even there
we have a few optimizations going on behind the scenes to make it less
prominent. Furthermore, you will usually also assemble matrices, which
are much more expensive, so the difference in the vector assembly will
not show up. The exception is matrix-free algorithms, where the assembly
loops constitute your matrix-vector product and are thus
performance-critical. In that case, I would definitely recommend using
the deal.II vectors.
Even though I mostly use the parallel deal.II vectors for my purposes,
the advice given by Bruno is mostly correct - you typically choose the
vector type that matches the matrix type. For example,
parallel::distributed::Vector does not work with PETSc matrices at all.
Finally, let me explain the reasons for the vast difference in run times:
- The dealii::Vector<Number> class is fastest because it performs direct
array access.
- dealii::parallel::distributed::Vector<Number> is slower because it has
to transform the global index that goes into operator() into an
MPI-local index. On one core this transformation is the identity, but
the implementation cannot exploit that. Thus, each vector access still
performs two conditional branches (checking whether the index is in the
locally owned range) and one subtraction of the lower bound, which is 0
here. Most CPUs handle the branches really well, but the subtraction is
still there and prevents the compiler from beneficial unrolling etc.
- The PETSc implementation needs to do further operations due to the
wrapper between the deal.II C++ interface and the underlying PETSc data
structures. I do not know the details of the implementation - I believe
it changes the state of the PETSc objects, which involves some
additional bookkeeping - but it doesn't surprise me that this is
expensive. As I said, we have not seen a reason to optimize for this case.
Best,
Martin
On 28.08.2016 19:15, David F wrote:
Hi, thanks for your answer. I have measured the time it takes for
PETScWrappers::MPI::Vector, parallel::distributed::Vector< Number >
and Vector< Number > to complete a very simple task consisting of
accessing the elements of these vectors. Something like this
(repeating this whole process 15 times for averaging results, and
using very big vector sizes):
    double a;
    for (unsigned int i = 0; i < v.size(); ++i)
      a = v[i];
I'm running it with a single process, and the results are:
+---------------------------------------------+------------+------------+
| Total wallclock time elapsed since start    |      34.4s |            |
|                                             |            |            |
| Section                         | no. calls |  wall time | % of total |
+---------------------------------+-----------+------------+------------+
| Dealii parallel                 |        15 |    0.0421s |      0.12% |
| Dealii serial                   |        15 |     0.018s |     0.052% |
| PETSc wrapper                   |        15 |      34.3s |     1e+02% |
+---------------------------------+-----------+------------+------------+
This shows that the PETSc wrapper is ~1000 times slower at accessing its
elements than the others (even local elements, as I'm running a single
process, so it's not a communication issue). If, for example, I run it
in parallel using 2 processes, the parallel vectors do their job in
about half the time, but the factor of 1000 is simply too big to
overcome. The problem I find is that the use of the PETSc wrappers is
mandatory for using the parallel solvers. Is this huge difference in
performance normal? Is there any workaround in the use of the PETSc
wrappers when dealing with solvers and other parallel classes?
David.
On Friday, 26 August 2016 14:05:55 UTC+2, Bruno Turcksin wrote:
Hi,
I guess it's more a question of preference. What I do is use the
same vector type as the matrix type: PETSc matrix -> PETSc vector,
Trilinos matrix -> Trilinos vector, matrix-free -> deal.II vector.
The deal.II vector can use multithreading, unlike the PETSc vector,
but if you are using MPI, I don't think you will see a big difference.
Best,
Bruno
On Thursday, August 25, 2016 at 5:30:31 PM UTC-4, David F wrote:
Hello, I would like to know if among
PETScWrappers::MPI::Vector and parallel::distributed::Vector<
Number >, one of them is preferred over the other. They both
seem to have a similar functionality and a similar interface.
Although parallel::distributed::Vector< Number > has a bigger
interface, PETScWrappers::MPI::Vector is extensively used in
the examples. In which situations should we use each of them?
Is there any known difference in performance? Thanks.
--
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see
https://groups.google.com/d/forum/dealii?hl=en
---
You received this message because you are subscribed to the Google
Groups "deal.II User Group" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to [email protected]
<mailto:[email protected]>.
For more options, visit https://groups.google.com/d/optout.