Dear Hartmut,

I have been thinking about your proposal, and I believe it would work for
me.

I have a few comments:
1 - I would also keep the default constructor for the value (that would
allow emulating "private" and not just "firstprivate").
2 - shouldn't the execution policy
(sequential_execution_policy/parallel_execution_policy/...)
be passed to this threadprivate emulator? I guess in some cases it may be
convenient, for example because one could specialize the "sequential" case
to have no overhead (see the sketch below).
3 - a question more than a comment: isn't this doing the work of the
"thread_local" keyword? I guess current compiler limitations do not
allow that ... so take this as a forward-looking question.
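
To make points 1 and 2 concrete, here is a minimal sketch of what I mean
(not part of your proposal; the name 'threadprivate' and the
specialization are just illustrative), building on the
firstprivate_emulation helper from your message quoted below:

    #include <hpx/hpx.hpp>
    #include <vector>

    // primary template: one slot per worker thread, as in
    // firstprivate_emulation
    template <typename T, typename ExPolicy>
    struct threadprivate
    {
        // default-constructed slots emulate "private" (point 1)
        threadprivate()
          : data_(hpx::get_os_thread_count())
        {
        }

        // slots copied from an initial value emulate "firstprivate"
        explicit threadprivate(T const& init)
          : data_(hpx::get_os_thread_count(), init)
        {
        }

        T& access()
        {
            return data_[hpx::get_worker_thread_num()];
        }

    private:
        std::vector<T> data_;
    };

    // specialization for the sequential policy: a single instance and
    // no per-thread indexing overhead (point 2)
    template <typename T>
    struct threadprivate<T, hpx::parallel::sequential_execution_policy>
    {
        threadprivate() = default;
        explicit threadprivate(T const& init) : data_(init) {}

        T& access() { return data_; }

    private:
        T data_;
    };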

Thanks again for your time and patience.
Riccardo




On Mon, Sep 12, 2016 at 5:18 PM, Riccardo Rossi <rro...@cimne.upc.edu>
wrote:

> Ok,
>        I think that with your proposal it will work (thanks).
>
> Regarding allocation, the point is that having the allocator know about
> the execution policy (or the policy know about the allocator) could allow
> you to do smart things concerning where data gets allocated.
>
> The problem is that when you read data in a finite element program, even
> if you allocate first-touch, you have no way to ensure that the data will
> later be used in the same order.
>
> Having the allocator and the policy persist through the whole analysis
> would give a good way to solve this problem.
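>
> As a hedged illustration of what I mean (my own sketch, not an existing
> HPX facility): if the same policy first-touches the data and later
> traverses it, and assuming the partitioning is deterministic across the
> two loops, the access order would match the placement:
>
>     #include <hpx/hpx.hpp>
>     #include <memory>
>
>     void first_touch_then_use(std::size_t n)
>     {
>         // uninitialized storage: no page is touched yet
>         std::unique_ptr<double[]> data(new double[n]);
>
>         // placement pass: each page is faulted in by the worker
>         // that writes it first
>         hpx::parallel::for_loop(hpx::parallel::par, std::size_t(0), n,
>             [&](std::size_t i) { data[i] = 0.0; });
>
>         // compute pass: same policy, ideally the same distribution,
>         // so accesses stay local to the touching NUMA domain
>         hpx::parallel::for_loop(hpx::parallel::par, std::size_t(0), n,
>             [&](std::size_t i) { data[i] += 1.0; });
>     }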
>
> Anyway... thank you very much for your time!
>
> regards
> Riccardo
>
>
> On Mon, Sep 12, 2016 at 2:17 PM, Hartmut Kaiser <hartmut.kai...@gmail.com>
> wrote:
>
>>
>> > To my understanding an OpenMP for loop is expanded to something like:
>> >
>> >     std::vector<data_type> private_data(nthreads);
>> >     // copy data into the private data array, once per thread
>> >     for (int block_counter = 0; block_counter < blocks; ++block_counter)
>> >     {
>> >         for (begin, end, ...)   // iterate over the block
>> >         {
>> >             // capture private_data[my_thread_id] here
>> >             // ... do work using the captured data
>> >         }
>> >     }
>> >
>> > This way the copying is done once per thread, not once per call nor once
>> > per block.
>> > Of course I could emulate this if I had access to a function like
>> > omp_get_thread_num() giving me the id of the current worker (I should
>> > also know the total number of workers to define the private_data array).
>> > Is this data available?
>> > Please do note that I am just a user, so my understanding of the specs
>> > may be faulty. My apologies if that's the case.
>>
>> I think your understanding of OpenMP firstprivate is correct. Also you're
>> right, the solution I gave will create one copy of the lambda per
>> iteration-partition.
>>
>> In order to have exactly one copy per kernel thread you'd need to
>> create a helper class which allocates the per-thread data. E.g. something
>> like:
>>
>>     #include <hpx/hpx.hpp>
>>     #include <vector>
>>
>>     template <typename T>
>>     struct firstprivate_emulation
>>     {
>>         explicit firstprivate_emulation(T const& init)
>>           : data_(hpx::get_os_thread_count(), init)
>>         {
>>         }
>>
>>         T& access()
>>         {
>>             std::size_t idx = hpx::get_worker_thread_num();
>>             HPX_ASSERT(idx < hpx::get_os_thread_count());
>>             return data_[idx];
>>         }
>>
>>         T const& access() const
>>         {
>>             std::size_t idx = hpx::get_worker_thread_num();
>>             HPX_ASSERT(idx < hpx::get_os_thread_count());
>>             return data_[idx];
>>         }
>>
>>     private:
>>         std::vector<T> data_;
>>     };
>>
>>     Matrix expensive_to_construct_scratchspace;
>>     firstprivate_emulation<Matrix> data(expensive_to_construct_scratchspace);
>>     for_loop(par, 0, N,
>>         [&](int i)
>>         {
>>             // access 'data' to get the thread-local copy of the outer Matrix
>>             Matrix& m = data.access();
>>             m[i][j] = ...
>>         });
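>>
>> One refinement worth considering (my addition, not part of the original
>> message): if T is small and written frequently, adjacent per-thread slots
>> in 'data_' may false-share a cache line. A padded slot type avoids that;
>> the 64-byte line size is an assumption, and over-aligned allocation is
>> only guaranteed by std::allocator since C++17:
>>
>>     template <typename T>
>>     struct alignas(64) padded
>>     {
>>         T value;
>>     };
>>
>>     // e.g.: std::vector<padded<Matrix>> data_(hpx::get_os_thread_count());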
>>
>> > Btw, I much like your idea of allocators for NUMA locality. That's a
>> > vast improvement over first touch, where you never really know who the
>> > owner is!!
>>
>> Heh, even if it uses first touch internally itself? :-)
>>
>> HTH
>> Regards Hartmut
>> ---------------
>> http://boost-spirit.com
>> http://stellar.cct.lsu.edu
>>
>>
>> > Regards
>> > Riccardo
>> >
>> > On 11 Sep 2016 7:23 p.m., "Hartmut Kaiser" <hartmut.kai...@gmail.com>
>> > wrote:
>> >
>> > > First of all thank you very much for your quick and detailed answer.
>> > > Nevertheless I think I did not explain my concern.
>> > > Using your code snippet, imagine I have
>> > >
>> > >     int nelements = 42;
>> > >     Matrix expensive_to_construct_scratchspace;
>> > >
>> > >     for_loop(par, 0, N,
>> > >         [nelements, expensive_to_construct_scratchspace](int i)
>> > >         {
>> > >             // the captured 'nelements' is initialized from the outer
>> > >             // variable and each copy of the lambda has its own private
>> > >             // copy
>> > >
>> > > HERE, as I understand it, the lambda would capture my
>> > > "expensive_to_construct_scratchspace" by value, which as I understand
>> > > implies that I would have one allocation for every "i". --> Are you
>> > > telling me that this is not the case? If so that would be a problem,
>> > > since constructing it would be very expensive.
>> >
>> > No, that would be the case, your analysis is correct.
>> >
>> > > On the contrary, if the lambda does not copy by value ... what if I
>> > > do need that behaviour?
>> > >
>> > > Note that I could definitely construct a blocked range of iterators
>> > > and define a lambda acting on a given range of iterators; however,
>> > > that would be very, very verbose...
>> >
>> > Looks like I misunderstood what firstprivate actually does...
>> >
>> > OTOH, in the OpenMP spec I read:
>> >
>> >     firstprivate Specifies that each thread should have its own
>> >     instance of a variable, and that the variable should be initialized
>> >     with the value of the variable, because it exists before the
>> >     parallel construct.
>> >
>> > So each thread gets its own copy, which implies copying/allocation.
>> > What am I missing?
>> >
>> > If however you want to share the variable between threads, just
>> > capture it by reference:
>> >
>> >     Matrix expensive_to_construct_scratchspace;
>> >     for_loop(par, 0, N,
>> >         [&expensive_to_construct_scratchspace](int i)
>> >         {
>> >         });
>> >
>> > In this case you'd be responsible for making any operations on the
>> > shared variable thread safe, however.
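>> >
>> > A minimal sketch of what "thread safe" could mean here (my addition;
>> > the accumulate() member is hypothetical): serialize writes to the
>> > shared object with a mutex captured alongside it:
>> >
>> >     #include <mutex>
>> >
>> >     Matrix shared_matrix;
>> >     std::mutex mtx;
>> >     for_loop(par, 0, N,
>> >         [&shared_matrix, &mtx](int i)
>> >         {
>> >             // ... compute the local contribution without the lock ...
>> >             std::lock_guard<std::mutex> lock(mtx);
>> >             shared_matrix.accumulate(i);   // hypothetical member
>> >         });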
>> >
>> > Is that what you need?
>> >
>> > Regards Hartmut
>> > ---------------
>> > http://boost-spirit.com
>> > http://stellar.cct.lsu.edu
>> >
>> >
>> > >
>> > >
>> > > Anyway,
>> > > thanks again for your attention.
>> > > Riccardo
>> > >
>> > >
>> > > On Sun, Sep 11, 2016 at 4:48 PM, Hartmut Kaiser
>> > <hartmut.kai...@gmail.com>
>> > > wrote:
>> > > Riccardo,
>> > >
>> > > >         I am writing since I am an OpenMP user, but I am actually
>> > > > quite curious to understand the future directions of C++.
>> > > >
>> > > > My parallel usage is actually relatively trivial, and is covered by
>> > > > OpenMP 2.5 (OpenMP 3.1 with support for iterators would be better,
>> > > > but it is not available in MSVC).
>> > > > 99% of my user needs are about parallel loops, and with C++11
>> > > > lambdas I could do a lot.
>> > >
>> > > Right. It is a fairly simple transformation to turn an OpenMP
>> > > parallel loop into the equivalent parallel algorithm. We specifically
>> > > added parallel::for_loop() (not in the Parallelism TS/C++17) to
>> > > support that migration:
>> > >
>> > >     #pragma omp parallel for
>> > >     for(int i = 0; i != N; ++i)
>> > >     {
>> > >         // some iteration
>> > >     }
>> > >
>> > > Would be equivalent to
>> > >
>> > >     hpx::parallel::for_loop(
>> > >         hpx::parallel::par,
>> > >         0, N, [](int i)
>> > >         {
>> > >             // some iteration
>> > >         });
>> > >
>> > > (for more information about for_loop() see here:
>> > > http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0075r0.pdf)
>> > >
>> > > > However I am really not clear on how I should equivalently handle
>> > > > "private" and "firstprivate" of OpenMP, which allow creating
>> > > > objects that persist in threadprivate memory during the whole
>> > > > length of a for loop.
>> > > > I now use OpenMP 2.5 and I have code that looks like the following:
>> > > >
>> > > > https://kratos.cimne.upc.es/projects/kratos/repository/entry/kratos/kratos/solving_strategies/builder_and_solvers/residualbased_block_builder_and_solver.h
>> > > >
>> > > > which does an OpenMP-parallel finite element assembly.
>> > > > The code I am thinking of is something like:
>> > >
>> > > [snipped code]
>> > >
>> > > > The big question is ... how shall I handle the threadprivate
>> > > > scratchspace in HPX?? Lambdas do not allow this ...
>> > > > That is, what is the equivalent of private & firstprivate??
>> > > > Thank you in advance for any clarification or pointer to examples.
>> > >
>> > > For 'firstprivate' you can simply use lambda captures:
>> > >
>> > >     int nelements = 42;
>> > >
>> > >     for_loop(par, 0, N,
>> > >         [nelements](int i)
>> > >         {
>> > >             // the captured 'nelements' is initialized from the outer
>> > >             // variable and each copy of the lambda has its own
>> > >             // private copy
>> > >             //
>> > >             // use private 'nelements' here:
>> > >             cout << nelements << endl;
>> > >         });
>> > >
>> > > Note that 'nelements' will be const by default. If you want to
>> > > modify its value, the lambda has to be made mutable:
>> > >
>> > >     int nelements = 42;
>> > >
>> > >     for_loop(par, 0, N,
>> > >         [nelements](int i) mutable // makes captures non-const
>> > >         {
>> > >             ++nelements;
>> > >         });
>> > >
>> > > Please don't be fooled, however: this will not give you one variable
>> > > instance per iteration. HPX runs several iterations 'in one go'
>> > > (depending on the partitioning, very much like OpenMP), so you will
>> > > create one variable instance per created partition. As long as you
>> > > don't modify the variable this shouldn't make a difference, however.
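>> > >
>> > > A quick way to see this (my addition, not from the original message):
>> > > count the distinct lambda copies with an atomic. Each copy has its
>> > > own captured 'first' flag, so the counter ends up at the number of
>> > > partitions, not N:
>> > >
>> > >     #include <atomic>
>> > >
>> > >     std::atomic<int> partitions(0);
>> > >     bool first = true;
>> > >     for_loop(par, 0, N,
>> > >         [first, &partitions](int i) mutable
>> > >         {
>> > >             if (first) { ++partitions; first = false; }
>> > >         });
>> > >     // 'partitions' now holds the number of lambda copies created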
>> > >
>> > > Emulating 'private' is even simpler. All you need is a local
>> > > variable for each iteration, after all. Thus simply creating it on
>> > > the stack inside the lambda is the solution:
>> > >
>> > >     for_loop(par, 0, N, [](int i)
>> > >     {
>> > >         // create 'private' variable
>> > >         int my_private = 0;
>> > >         // ...
>> > >     });
>> > >
>> > > This also gives you a hint on how you can have one instance of your
>> > > variable per iteration and still initialize it as if it were
>> > > firstprivate:
>> > >
>> > >     int nelements = 42;
>> > >     for_loop(par, 0, N, [nelements](int i)
>> > >     {
>> > >         // create 'private' variable
>> > >         int my_private = nelements;
>> > >         // ...
>> > >         ++my_private;   // modifies instance for this iteration only.
>> > >     });
>> > >
>> > > Things become a bit more interesting if you need reductions. Please
>> > > see the linked document above for more details, but here is a simple
>> > > example (taken from that paper):
>> > >
>> > >     float dot_saxpy(int n, float a, float x[], float y[])
>> > >     {
>> > >         float s = 0;
>> > >         for_loop(par, 0, n,
>> > >             reduction(s, 0.0f, std::plus<float>()),
>> > >             [&](int i, float& s_)
>> > >             {
>> > >                 y[i] += a*x[i];
>> > >                 s_ += y[i]*y[i];
>> > >             });
>> > >         return s;
>> > >     }
>> > >
>> > > Here 's' is the reduction variable, and 's_' is the thread-local
>> > > reference to it.
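>> > >
>> > > A hedged usage sketch (my addition, not from the paper), just to
>> > > show the call site; the expected result is easy to check by hand:
>> > >
>> > >     #include <vector>
>> > >
>> > >     std::vector<float> x(1000, 1.0f);
>> > >     std::vector<float> y(1000, 2.0f);
>> > >
>> > >     // each y[i] becomes 2.0f + 0.5f * 1.0f = 2.5f, so the result
>> > >     // is 1000 * 2.5f * 2.5f == 6250.0f
>> > >     float result = dot_saxpy(1000, 0.5f, x.data(), y.data());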
>> > >
>> > > HTH
>> > > Regards Hartmut
>> > > ---------------
>> > > http://boost-spirit.com
>> > > http://stellar.cct.lsu.edu
>> > >
>> > >
>> > >
>> > >
>>
>>
>
>
>



-- 


*Riccardo Rossi*

PhD, Civil Engineer


member of the Kratos Team: www.cimne.com/kratos

Tenure Track Lecturer at Universitat Politècnica de Catalunya,
BarcelonaTech (UPC)

Full Research Professor at International Center for Numerical Methods in
Engineering (CIMNE)


C/ Gran Capità, s/n, Campus Nord UPC, Ed. C1, Despatx C9

08034 – Barcelona – Spain – www.cimne.com

T. (+34) 93 401 56 96  skype: *rougered4*




_______________________________________________
hpx-users mailing list
hpx-users@stellar.cct.lsu.edu
https://mail.cct.lsu.edu/mailman/listinfo/hpx-users
