Hello,

This is more of a question. There is the function
PETScWrappers::VectorBase::extract_subvector_to
(http://www.dealii.org/developer/doxygen/deal.II/classPETScWrappers_1_1VectorBase.html#ae0f3a2b5fc8e2201237fad86ae69142e),
which should give faster access to the vector elements. From what I
remember, locally owned elements in PETSc are numbered contiguously, so
this method should work for those elements.

Best,
praveen

On Sun, Aug 28, 2016 at 11:29 PM, Martin Kronbichler
<[email protected]> wrote:

> Dear David,
>
> Your example of accessing all elements in a vector individually through
> operator[] (or operator()) is not very representative of the typical
> situation in most codes. All vector interfaces implement vector
> addition, scaling, and inner products, so you will not have to write
> the for loop on your own. Even though PETSc appears slow in your test,
> it will perform adequately in real examples. For large vector sizes
> that do not fit into the processor caches, I expect all vector
> interfaces to be relatively close, because the operations should be
> limited by memory bandwidth. All iterative solvers use the fast access
> path as well and are not adequately benchmarked by your example.
>
> If you are interested in some performance numbers for the deal.II
> vectors, please see https://github.com/dealii/dealii/issues/2496
>
> The only place where operator() is used extensively in a finite element
> context is in the assembly of (right-hand side) vectors. But even there
> we have a few optimizations going on behind the scenes to make it less
> prominent. Furthermore, you will mostly also assemble matrices, which
> are much more expensive, so the difference in the vector assembly will
> not show up. The exception is when you work with matrix-free
> algorithms, where assembly-like loops represent your matrix-vector
> product and are thus performance-critical. In that case, I would
> definitely recommend using the deal.II vectors.
> Even though I mostly use the parallel deal.II vectors for my purposes,
> the advice given by Bruno is mostly correct: you typically choose the
> vector type that matches the matrix type. For example,
> parallel::distributed::Vector does not work with PETSc matrices at all.
>
> Finally, let me explain the reasons for the vast difference in run
> times:
>
> - The dealii::Vector<Number> class is fastest because it performs
>   direct array access.
>
> - dealii::parallel::distributed::Vector<Number> is slower because it
>   has to translate the global index passed to operator() into an
>   MPI-local index. On one core this translation should be a no-op, but
>   the implementation cannot exploit that. Thus, each vector access
>   still performs two conditional branches (checking whether the index
>   is in the locally owned range) and one index subtraction with the
>   lower bound 0. Most CPUs handle the branches really well, but the
>   subtraction is still there and prevents the compiler from beneficial
>   unrolling etc.
>
> - The PETSc implementation needs to do further work because of the
>   wrapper between the deal.II C++ interface and the underlying PETSc
>   data structures. I do not know the details of the implementation - I
>   think it changes the state of the PETSc objects, which does some
>   advanced things - but it does not surprise me that this is expensive.
>   As I said, we have not seen a reason to optimize this case.
>
> Best,
> Martin
>
> On 28.08.2016 19:15, David F wrote:
>
>> Hi, thanks for your answer. I have measured the time it takes for
>> PETScWrappers::MPI::Vector, parallel::distributed::Vector<Number>, and
>> Vector<Number> to complete a very simple task consisting of accessing
>> the elements of these vectors.
>> Something like this (repeating the whole process 15 times to average
>> the results, and using very big vector sizes):
>>
>>   double a;
>>   for (unsigned int i = 0; i < v.size(); ++i)
>>     a = v[i];
>>
>> I'm running it with a single process, and the results are:
>>
>>   Total wallclock time elapsed since start: 34.4s
>>
>>   +-----------------+-----------+-----------+------------+
>>   | Section         | no. calls | wall time | % of total |
>>   +-----------------+-----------+-----------+------------+
>>   | Dealii parallel |        15 |   0.0421s |      0.12% |
>>   | Dealii serial   |        15 |    0.018s |     0.052% |
>>   | PETSc wrapper   |        15 |     34.3s |     1e+02% |
>>   +-----------------+-----------+-----------+------------+
>>
>> This shows that the PETSc wrapper is roughly 1000 times slower at
>> accessing its elements than the others (even local elements, since I'm
>> running a single process, so it is not a communication issue). If, for
>> example, I run it in parallel with 2 processes, the parallel vectors
>> do their job in about half the time, but the factor of 1000 is simply
>> too big to overcome. The problem I find is that using the PETSc
>> wrappers is mandatory for the parallel solvers. Is this huge
>> difference in performance normal? Is there any workaround for the
>> PETSc wrappers when dealing with solvers and other parallel classes?
>>
>> David.
>>
>> On Friday, 26 August 2016 14:05:55 UTC+2, Bruno Turcksin wrote:
>>
>>> Hi,
>>>
>>> I guess it's more a question of preference. What I do is use the same
>>> vector type as the matrix type: PETSc matrix -> PETSc vector,
>>> Trilinos matrix -> Trilinos vector, matrix-free -> deal.II vector.
>>> The deal.II vector can use multithreading, unlike the PETSc vector,
>>> but if you are using MPI, I don't think you will see a big
>>> difference.
>>>
>>> Best,
>>>
>>> Bruno
>>>
>>> On Thursday, August 25, 2016 at 5:30:31 PM UTC-4, David F wrote:
>>>
>>>> Hello, I would like to know whether one of PETScWrappers::MPI::Vector
>>>> and parallel::distributed::Vector<Number> is preferred over the
>>>> other. They both seem to have similar functionality and a similar
>>>> interface. Although parallel::distributed::Vector<Number> has a
>>>> bigger interface, PETScWrappers::MPI::Vector is used extensively in
>>>> the examples. In which situations should we use each of them? Is
>>>> there any known difference in performance? Thanks.

--
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see https://groups.google.com/d/forum/dealii?hl=en
---
You received this message because you are subscribed to the Google Groups "deal.II User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.
