Ok,
I think that with your proposal it should work (thanks).

Regarding allocation, the thing is that having the allocator know about
the run policy (or the policy know about the allocator) could allow you
to do smart things concerning what gets allocated and where.

The thing is that when you read data in a finite element program, even if
you allocate first-touch, you have no way to ensure that at the moment
of using the data it will be accessed in the same order.

Having the allocator and the policy persist through the whole analysis
gives a good way to solve this problem.
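As a minimal illustration of this point, here is a sketch using plain
std::thread rather than any HPX allocator API (all names are made up):
a fixed chunk-to-worker mapping that persists across passes, so the
thread that first touches a chunk is the one that keeps using it.

```cpp
#include <cstddef>
#include <functional>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical sketch, not an HPX API: keep one fixed chunk-to-worker
// mapping and reuse it for every pass over the data. Pass 1 "first-touches"
// each chunk from its owning thread; pass 2 reuses the same mapping, so each
// thread keeps working on the memory it touched first. If the mapping
// changed between passes, the first-touch placement would be defeated.
inline void run_with_fixed_mapping(std::size_t nthreads, std::size_t n,
    std::function<void(std::size_t, std::size_t)> const& work)
{
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t != nthreads; ++t)
    {
        std::size_t const begin = t * n / nthreads;
        std::size_t const end = (t + 1) * n / nthreads;
        pool.emplace_back([&work, begin, end] { work(begin, end); });
    }
    for (auto& th : pool)
        th.join();
}

inline double first_touch_demo(std::size_t nthreads, std::size_t n)
{
    std::vector<double> data(n);
    // pass 1: first touch - the owning thread writes its own chunk
    run_with_fixed_mapping(nthreads, n, [&](std::size_t b, std::size_t e) {
        for (std::size_t i = b; i != e; ++i)
            data[i] = 1.0;
    });
    // pass 2: identical mapping, so every access stays on the thread
    // that first touched the memory
    run_with_fixed_mapping(nthreads, n, [&](std::size_t b, std::size_t e) {
        for (std::size_t i = b; i != e; ++i)
            data[i] += 1.0;
    });
    return std::accumulate(data.begin(), data.end(), 0.0);
}
```

A persistent allocator/policy pair would amount to fixing this mapping
once and reusing it for every loop of the analysis.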

Anyway... thank you very much for your time!

Regards,
Riccardo


On Mon, Sep 12, 2016 at 2:17 PM, Hartmut Kaiser <[email protected]>
wrote:

>
> > To my understanding an OpenMP for loop is expanded to something like:
> >
> >     vector<data_type> private_data(nthreads);
> >     // copy data into the private-data array, once per thread
> >     for (int block = 0; block < nblocks; ++block)
> >     {
> >         for (...; begin != end; ...)
> >         {
> >             // capture private_data[my_thread_id]
> >             // ... do work using the captured data
> >         }
> >     }
> >
> > This way the copying is done once per thread, not once per call nor
> > once per block.
> > Of course I could emulate this, if I had access to a function like
> > omp_get_thread_num() giving me the id of the current worker (I should
> > also know the total number of workers to size the private_data array).
> > Is this data available?
> > Please do note that I am just a user, so my understanding of the specs
> > may be faulty. My apologies if that's the case.
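> > A concrete version of that expansion could look like the following
> > sketch (plain std::thread; 'Scratch' and all other names are made up,
> > not from any library): each thread copies the shared value exactly
> > once, then runs many iterations against its private copy.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Illustrative stand-in for the OpenMP expansion sketched above (plain
// std::thread; 'Scratch' and all names are made up): each thread copies the
// shared value exactly once, then runs many iterations against that copy.
struct Scratch
{
    std::vector<double> buf;
};

inline std::size_t firstprivate_blocked(std::size_t nthreads, std::size_t n,
    Scratch const& shared)
{
    std::vector<std::size_t> work_done(nthreads, 0);
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t != nthreads; ++t)
    {
        pool.emplace_back([&, t] {
            Scratch private_copy = shared;   // one copy per thread, not per i
            for (std::size_t i = t; i < n; i += nthreads)
            {
                private_copy.buf[0] += 1.0;  // work on the private copy
                ++work_done[t];              // each thread writes its own slot
            }
        });
    }
    for (auto& th : pool)
        th.join();

    std::size_t total = 0;
    for (std::size_t c : work_done)
        total += c;
    return total;                            // == n iterations overall
}
```

> > The copy happens nthreads times in total, independent of n.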
>
> I think your understanding of OpenMP firstprivate is correct. Also you're
> right, the solution I gave will create one copy of the lambda per
> iteration-partition.
>
> In order to have exactly one copy per kernel thread, you'd need to
> create a helper class that allocates the per-thread data, e.g. something
> like:
>
>     #include <hpx/hpx.hpp>
>     #include <vector>
>
>     template <typename T>
>     struct firstprivate_emulation
>     {
>         explicit firstprivate_emulation(T const& init)
>           : data_(hpx::get_os_thread_count(), init)
>         {
>         }
>
>         T& access()
>         {
>             std::size_t idx = hpx::get_worker_thread_num();
>             HPX_ASSERT(idx < hpx::get_os_thread_count());
>             return data_[idx];
>         }
>
>         T const& access() const
>         {
>             std::size_t idx = hpx::get_worker_thread_num();
>             HPX_ASSERT(idx < hpx::get_os_thread_count());
>             return data_[idx];
>         }
>
>     private:
>         std::vector<T> data_;
>     };
>
>     Matrix expensive_to_construct_scratchspace;
>     firstprivate_emulation<Matrix> data(expensive_to_construct_scratchspace);
>
>     for_each(par, 0, N,
>         [&](int i)
>         {
>             // access the thread-local copy of the outer Matrix
>             // through 'data'
>             Matrix& m = data.access();
>             m[i][j] = ...;
>         });
>
> > Btw, I really like your idea of allocators for NUMA locality. That's
> > a vast improvement over first touch, where you never really know who
> > the owner is!!
>
> Heh, even if it uses first touch internally itself? :-)
>
> HTH
> Regards Hartmut
> ---------------
> http://boost-spirit.com
> http://stellar.cct.lsu.edu
>
>
> > Regards
> > Riccardo
> >
> > On 11 Sep 2016 7:23 p.m., "Hartmut Kaiser" <[email protected]>
> > wrote:
> >
> > > First of all, thank you very much for your quick and detailed
> > > answer. Nevertheless, I think I did not explain my concern.
> > > Using your code snippet, imagine I have
> > >
> > >
> > >     int nelements = 42;
> > >     Matrix expensive_to_construct_scratchspace;
> > >
> > >     for_each(par, 0, N,
> > >         [nelements, expensive_to_construct_scratchspace](int i)
> > >         {
> > >             // the captured 'nelements' is initialized from the outer
> > >             // variable and each copy of the lambda has its own private
> > >             // copy
> > > HERE, as I understand it, the lambda would capture my
> > > "expensive_to_construct_scratchspace" by value, which as I understand
> > > implies that I would have one allocation for every "i". --> Are you
> > > saying that this is not the case? If it is, that would be a problem,
> > > since constructing it is very expensive.
> >
> > No, that would be the case, your analysis is correct.
> >
> > > On the contrary, if the lambda does not capture by value... what if
> > > I do need that behaviour?
> > >
> > > Note that I could definitely construct a blocked range of iterators
> > > and define a lambda acting on a given range of iterators; however,
> > > that would be very, very verbose...
> >
> > Looks like I misunderstood what firstprivate actually does...
> >
> > OTOH, in the OpenMP spec I read:
> >
> >     firstprivate: Specifies that each thread should have its own
> >     instance of a variable, and that the variable should be
> >     initialized with the value of the variable as it exists before
> >     the parallel construct.
> >
> > So each thread gets its own copy, which implies copying/allocation.
> > What am I missing?
> >
> > If, however, you want to share the variable between threads, just
> > capture it by reference:
> >
> >     Matrix expensive_to_construct_scratchspace;
> >     for_each(par, 0, N,
> >         [&expensive_to_construct_scratchspace](int i)
> >         {
> >         });
> >
> > In this case you'd be responsible for making any operations on the shared
> > variable thread safe, however.
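> > For instance, making the shared update thread safe could look like
> > this sketch (plain std::thread and std::mutex, with made-up names;
> > not an HPX API): each thread accumulates privately, then takes a lock
> > only for the shared write.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Sketch of one way to make updates to a capture-by-reference shared
// variable thread safe (plain std::thread/std::mutex; names are made up):
// each thread accumulates privately, then takes a lock for the shared write.
inline double shared_capture_sum(std::size_t nthreads, std::size_t n)
{
    double shared = 0.0;             // captured by reference below
    std::mutex guard;
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t != nthreads; ++t)
    {
        pool.emplace_back([&, t] {
            double local = 0.0;
            for (std::size_t i = t; i < n; i += nthreads)
                local += 1.0;        // race-free: 'local' is per thread
            std::lock_guard<std::mutex> lock(guard);
            shared += local;         // serialized update of shared state
        });
    }
    for (auto& th : pool)
        th.join();
    return shared;
}
```

> > Locking once per thread, rather than once per iteration, keeps the
> > contention negligible.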
> >
> > Is that what you need?
> >
> > Regards Hartmut
> > ---------------
> > http://boost-spirit.com
> > http://stellar.cct.lsu.edu
> >
> >
> > >
> > >
> > > Anyway,
> > > thanks again for your attention.
> > > Riccardo
> > >
> > >
> > > On Sun, Sep 11, 2016 at 4:48 PM, Hartmut Kaiser
> > <[email protected]>
> > > wrote:
> > > Riccardo,
> > >
> > > >         I am writing since I am an OpenMP user, but I am actually
> > > > quite curious to understand the future directions of C++.
> > > >
> > > > My parallel usage is actually relatively trivial, and is covered
> > > > by OpenMP 2.5 (OpenMP 3.1, with its support for iterators, would
> > > > be better, but it is not available in MSVC).
> > > > 99% of my user needs are about parallel loops, and with C++11
> > > > lambdas I can do a lot.
> > >
> > > Right. It is a fairly simple transformation to turn an OpenMP
> > > parallel loop into the equivalent parallel algorithm. We
> > > specifically added parallel::for_loop() (not in the Parallelism
> > > TS/C++17) to support that migration:
> > >
> > >     #pragma omp parallel for
> > >     for(int i = 0; i != N; ++i)
> > >     {
> > >         // some iteration
> > >     }
> > >
> > > Would be equivalent to
> > >
> > >     hpx::parallel::for_loop(
> > >         hpx::parallel::par,
> > >         0, N, [](int i)
> > >         {
> > >             // some iteration
> > >         });
> > >
> > > (for more information about for_loop() see here:
> > > http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0075r0.pdf)
> > >
> > > > However I am really not clear on how I should equivalently handle
> > > > "private" and "firstprivate" of OpenMP, which allow one to create
> > > > objects that persist in thread-private memory for the whole length
> > > > of a for loop.
> > > > I now use OpenMP 2.5 and I have code that looks like the
> > > > following:
> > > >
> > > > https://kratos.cimne.upc.es/projects/kratos/repository/entry/kratos/kratos/solving_strategies/builder_and_solvers/residualbased_block_builder_and_solver.h
> > > >
> > > > which does an OpenMP parallel finite element assembly.
> > > > The code I am thinking of is something like:
> > >
> > > [snipped code]
> > >
> > > > The big question is... how shall I handle the thread-private
> > > > scratchspace in HPX? Lambdas do not allow this...
> > > > That is, what is the equivalent of private and of firstprivate?
> > > > Thank you in advance for any clarification or pointers to
> > > > examples.
> > >
> > > For 'firstprivate' you can simply use lambda captures:
> > >
> > >     int nelements = 42;
> > >
> > >     for_each(par, 0, N,
> > >         [nelements](int i)
> > >         {
> > >             // the captured 'nelements' is initialized from the outer
> > >             // variable and each copy of the lambda has its own private
> > >             // copy
> > >             //
> > >             // use private 'nelements' here:
> > >             cout << nelements << endl;
> > >         });
> > >
> > > Note that 'nelements' will be const by default. If you want to
> > > modify its value, the lambda has to be made mutable:
> > >
> > >     int nelements = 42;
> > >
> > >     for_each(par, 0, N,
> > >         [nelements](int i) mutable // makes captures non-const
> > >         {
> > >             ++nelements;
> > >         });
> > >
> > > Please don't be fooled, however, into thinking that this gives you
> > > one variable instance per iteration. HPX runs several iterations 'in
> > > one go' (depending on the partitioning, very much like OpenMP), so
> > > you will create one variable instance per created partition. As long
> > > as you don't modify the variable this shouldn't make a difference,
> > > however.
> > >
> > > Emulating 'private' is even simpler. All you need is a local
> > > variable for each iteration, after all. Thus simply creating it on
> > > the stack inside the lambda is the solution:
> > >
> > >     for_loop(par, 0, N, [](int i)
> > >     {
> > >         // create 'private' variable
> > >         int my_private = 0;
> > >         // ...
> > >     });
> > >
> > > This also gives you a hint at how you can have one instance of your
> > > variable per iteration and still initialize it as if it were
> > > firstprivate:
> > >
> > >     int nelements = 42;
> > >     for_loop(par, 0, N, [nelements](int i)
> > >     {
> > >         // create 'private' variable
> > >         int my_private = nelements;
> > >         // ...
> > >         ++my_private;   // modifies instance for this iteration only.
> > >     });
> > >
> > > Things become a bit more interesting if you need reductions. Please
> > > see the linked document above for more details, but here is a simple
> > > example (taken from that paper):
> > >
> > >     float dot_saxpy(int n, float a, float x[], float y[])
> > >     {
> > >         float s = 0;
> > >         for_loop(par, 0, n,
> > >             reduction(s, 0.0f, std::plus<float>()),
> > >             [&](int i, float& s_)
> > >             {
> > >                 y[i] += a*x[i];
> > >                 s_ += y[i]*y[i];
> > >             });
> > >         return s;
> > >     }
> > >
> > > Here 's' is the reduction variable, and 's_' is the thread-local
> > > reference to it.
> > >
> > > HTH
> > > Regards Hartmut
> > > ---------------
> > > http://boost-spirit.com
> > > http://stellar.cct.lsu.edu
> > >
> > >


-- 


*Riccardo Rossi*

PhD, Civil Engineer


member of the Kratos Team: www.cimne.com/kratos

Tenure Track Lecturer at Universitat Politècnica de Catalunya,
BarcelonaTech (UPC)

Full Research Professor at International Center for Numerical Methods in
Engineering (CIMNE)


C/ Gran Capità, s/n, Campus Nord UPC, Ed. C1, Despatx C9

08034 – Barcelona – Spain – www.cimne.com  -

T.(+34) 93 401 56 96 skype: *rougered4*




The personal data contained in this message is processed for the purpose
of maintaining professional contact between CIMNE and you. You may
exercise your rights of access, rectification, cancellation, and
objection by writing to [email protected]. CIMNE's use of your e-mail
address is subject to the provisions of Law 34/2002 on Information
Society Services and Electronic Commerce.

Print this message only if strictly necessary.
_______________________________________________
hpx-users mailing list
[email protected]
https://mail.cct.lsu.edu/mailman/listinfo/hpx-users
