Dear Hartmut,

I have been thinking about your proposal, and I believe it would work for me.
I have a few comments:

1 - I would also keep the default constructor for the value (that would allow emulating "private" and not just "firstprivate").
2 - Shouldn't the execution policy (sequential_execution_policy/parallel_execution_policy/...) be passed to this threadprivate emulator? I guess in some cases it may be convenient, for example because one could specialize the "sequential" case to have no overhead.
3 - A question more than a comment: isn't this doing the work of the "thread_local" keyword? I guess current compiler limitations do not allow that, so take this as a forward-looking question.

Thanks again for your time and patience,
Riccardo

On Mon, Sep 12, 2016 at 5:18 PM, Riccardo Rossi <rro...@cimne.upc.edu> wrote:
> Ok,
> I think that with your proposal it should work (thanks).
>
> Regarding allocation, the thing is that having the allocator know about
> the run policy (or the policy know about the allocator) could allow you to
> do smart things concerning what is to be allocated.
>
> The thing is that when you read data in a finite element program, even if
> you allocate first-touch, you have no way to ensure that at the moment
> of using the data it will be used in the same order.
>
> Having the allocator and the policy persist through the whole analysis
> gives a good way to solve this problem.
>
> Anyway... thank you very much for your time!
>
> Regards,
> Riccardo
>
>
> On Mon, Sep 12, 2016 at 2:17 PM, Hartmut Kaiser <hartmut.kai...@gmail.com>
> wrote:
>
>> > To my understanding an OpenMP for loop is expanded to something like:
>> >
>> >     vector<data_type> private_data(nthreads);
>> >     // copy data into the private data array, once per thread
>> >     for (int block_counter = 0; block_counter < blocks; ++block_counter)
>> >     {
>> >         for (begin, end, ...)
>> >         {
>> >             // capture private_data[my_thread_id] here
>> >             // ... do work using the captured data
>> >         }
>> >     }
>> >
>> > This way the copying is done once per thread, not once per call nor once
>> > per block.
>> > Of course I could emulate this if I had access to a function like
>> > omp_get_thread_num() giving me the id of the current worker (I should
>> > also know the total number of workers to define the private_data array).
>> > Is this data available?
>> > Please do note that I am just a user, so my understanding of the specs
>> > may be faulty. My apologies if that is the case.
>>
>> I think your understanding of OpenMP firstprivate is correct. Also you're
>> right, the solution I gave will create one copy of the lambda per
>> iteration-partition.
>>
>> In order to have exactly one copy per kernel-thread you'd need to
>> create a helper class which allocates the per-thread data, e.g. something
>> like:
>>
>>     #include <hpx/hpx.hpp>
>>     #include <vector>
>>
>>     template <typename T>
>>     struct firstprivate_emulation
>>     {
>>         explicit firstprivate_emulation(T const& init)
>>           : data_(hpx::get_os_thread_count(), init)
>>         {
>>         }
>>
>>         T& access()
>>         {
>>             std::size_t idx = hpx::get_worker_thread_num();
>>             HPX_ASSERT(idx < hpx::get_os_thread_count());
>>             return data_[idx];
>>         }
>>
>>         T const& access() const
>>         {
>>             std::size_t idx = hpx::get_worker_thread_num();
>>             HPX_ASSERT(idx < hpx::get_os_thread_count());
>>             return data_[idx];
>>         }
>>
>>     private:
>>         std::vector<T> data_;
>>     };
>>
>>     Matrix expensive_to_construct_scratchspace;
>>     firstprivate_emulation<Matrix> data(expensive_to_construct_scratchspace);
>>     for_each(par, 0, N,
>>         [&](int i)
>>         {
>>             // access 'data' to get the thread-local copy of the outer Matrix
>>             Matrix& m = data.access();
>>             m[i][j] = ...
>>         });
>>
>> > Btw, I much like your idea of allocators for NUMA locality. That's a
>> > vast improvement over first touch, where you never really know who's
>> > the owner!!
>>
>> Heh, even if it uses first touch internally itself?
>> :-)
>>
>> HTH
>> Regards Hartmut
>> ---------------
>> http://boost-spirit.com
>> http://stellar.cct.lsu.edu
>>
>> > Regards
>> > Riccardo
>> >
>> > On 11 Sep 2016 7:23 p.m., "Hartmut Kaiser" <hartmut.kai...@gmail.com>
>> > wrote:
>> >
>> > > First of all, thank you very much for your quick and detailed answer.
>> > > Nevertheless I think I did not explain my concern.
>> > > Using your code snippet, imagine I have:
>> > >
>> > >     int nelements = 42;
>> > >     Matrix expensive_to_construct_scratchspace;
>> > >
>> > >     for_each(par, 0, N,
>> > >         [nelements, expensive_to_construct_scratchspace](int i)
>> > >         {
>> > >             // the captured 'nelements' is initialized from the outer
>> > >             // variable and each copy of the lambda has its own private
>> > >             // copy
>> > >
>> > > HERE, as I understand it, the lambda would capture my
>> > > "expensive_to_construct_scratchspace" by value, which as I understand
>> > > implies that I would have one allocation for every "i". --> Are you
>> > > telling me that this is not the case? If so, that would be a problem,
>> > > since constructing it would be very expensive.
>> >
>> > No, that would be the case, your analysis is correct.
>> >
>> > > On the contrary, if the lambda does not capture by value... what if I
>> > > do need that behaviour?
>> > >
>> > > Note that I could definitely construct a blocked range of iterators
>> > > and define a lambda acting on a given range of iterators, however that
>> > > would be very, very verbose...
>> >
>> > Looks like I misunderstood what firstprivate actually does...
>> >
>> > OTOH, in the OpenMP spec I read:
>> >
>> >     firstprivate: Specifies that each thread should have its own
>> >     instance of a variable, and that the variable should be initialized
>> >     with the value of the variable as it exists before the parallel
>> >     construct.
>> >
>> > So each thread gets its own copy, which implies copying/allocation. What
>> > do I miss?
>> >
>> > If however you want to share the variable between threads, just capture
>> > it by reference:
>> >
>> >     Matrix expensive_to_construct_scratchspace;
>> >     for_each(par, 0, N,
>> >         [&expensive_to_construct_scratchspace](int i)
>> >         {
>> >         });
>> >
>> > In this case you'd be responsible for making any operations on the
>> > shared variable thread safe, however.
>> >
>> > Is that what you need?
>> >
>> > Regards Hartmut
>> > ---------------
>> > http://boost-spirit.com
>> > http://stellar.cct.lsu.edu
>> >
>> > > Anyway,
>> > > thanks again for your attention.
>> > > Riccardo
>> > >
>> > > On Sun, Sep 11, 2016 at 4:48 PM, Hartmut Kaiser
>> > > <hartmut.kai...@gmail.com> wrote:
>> > > Riccardo,
>> > >
>> > > > I am writing since I am an OpenMP user, but I am actually quite
>> > > > curious to understand the future directions of C++.
>> > > >
>> > > > My parallel usage is actually relatively trivial, and is covered by
>> > > > OpenMP 2.5 (OpenMP 3.1 with support for iterators would be better,
>> > > > but it is not available in MSVC).
>> > > > 99% of my user needs are about parallel loops, and with C++11
>> > > > lambdas I can do a lot.
>> > >
>> > > Right. It is a fairly simple transformation to turn an OpenMP
>> > > parallel loop into the equivalent parallel algorithm.
>> > > We specifically added the parallel::for_loop() (not in the
>> > > Parallelism TS/C++17) to support that migration:
>> > >
>> > >     #pragma omp parallel for
>> > >     for (int i = 0; i != N; ++i)
>> > >     {
>> > >         // some iteration
>> > >     }
>> > >
>> > > would be equivalent to
>> > >
>> > >     hpx::parallel::for_loop(
>> > >         hpx::parallel::par,
>> > >         0, N, [](int i)
>> > >         {
>> > >             // some iteration
>> > >         });
>> > >
>> > > (for more information about for_loop() see here:
>> > > http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0075r0.pdf)
>> > >
>> > > > However I am really not clear on how I should equivalently handle
>> > > > "private" and "firstprivate" of OpenMP, which allow creating objects
>> > > > that persist in threadprivate memory during the whole length of a
>> > > > for loop.
>> > > > I now use OpenMP 2.5 and I have code that looks like the following:
>> > > >
>> > > > https://kratos.cimne.upc.es/projects/kratos/repository/entry/kratos/kratos/solving_strategies/builder_and_solvers/residualbased_block_builder_and_solver.h
>> > > >
>> > > > which does an OpenMP parallel finite element assembly.
>> > > > The code I am thinking of is something like:
>> > >
>> > > [snipped code]
>> > >
>> > > > The big question is... how shall I handle the threadprivate
>> > > > scratchspace in HPX? Lambdas do not allow doing this...
>> > > > That is, what is the equivalent of private and of firstprivate?
>> > > > Thank you in advance for any clarification or pointer to examples.
>> > >
>> > > For 'firstprivate' you can simply use lambda captures:
>> > >
>> > >     int nelements = 42;
>> > >
>> > >     for_each(par, 0, N,
>> > >         [nelements](int i)
>> > >         {
>> > >             // the captured 'nelements' is initialized from the outer
>> > >             // variable and each copy of the lambda has its own private
>> > >             // copy
>> > >             //
>> > >             // use private 'nelements' here:
>> > >             cout << nelements << endl;
>> > >         });
>> > >
>> > > Note that 'nelements' will be const by default. If you want to modify
>> > > its value, the lambda has to be made mutable:
>> > >
>> > >     int nelements = 42;
>> > >
>> > >     for_each(par, 0, N,
>> > >         [nelements](int i) mutable // makes captures non-const
>> > >         {
>> > >             ++nelements;
>> > >         });
>> > >
>> > > Please don't be fooled, however, into thinking that this might give
>> > > you one variable instance per iteration. HPX runs several iterations
>> > > 'in one go' (depending on the partitioning, very much like OpenMP), so
>> > > you will create one variable instance per created partition. As long
>> > > as you don't modify the variable this shouldn't make a difference,
>> > > however.
>> > >
>> > > Emulating 'private' is even simpler. All you need is a local variable
>> > > for each iteration, after all. Thus simply creating it on the stack
>> > > inside the lambda is the solution:
>> > >
>> > >     for_loop(par, 0, N, [](int i)
>> > >     {
>> > >         // create 'private' variable
>> > >         int my_private = 0;
>> > >         // ...
>> > >     });
>> > >
>> > > This also gives you a hint on how you can have one instance of your
>> > > variable per iteration and still initialize it as if it were
>> > > firstprivate:
>> > >
>> > >     int nelements = 42;
>> > >     for_loop(par, 0, N, [nelements](int i)
>> > >     {
>> > >         // create 'private' variable
>> > >         int my_private = nelements;
>> > >         // ...
>> > >         ++my_private; // modifies the instance for this iteration only
>> > >     });
>> > >
>> > > Things become a bit more interesting if you need reductions. Please
>> > > see the linked document above for more details, but here is a simple
>> > > example (taken from that paper):
>> > >
>> > >     float dot_saxpy(int n, float a, float x[], float y[])
>> > >     {
>> > >         float s = 0;
>> > >         for_loop(par, 0, n,
>> > >             reduction(s, 0.0f, std::plus<float>()),
>> > >             [&](int i, float& s_)
>> > >             {
>> > >                 y[i] += a*x[i];
>> > >                 s_ += y[i]*y[i];
>> > >             });
>> > >         return s;
>> > >     }
>> > >
>> > > Here 's' is the reduction variable, and 's_' is the thread-local
>> > > reference to it.
>> > >
>> > > HTH
>> > > Regards Hartmut
>> > > ---------------
>> > > http://boost-spirit.com
>> > > http://stellar.cct.lsu.edu
>> > >
>> > > --
>> > > Riccardo Rossi
>> > > PhD, Civil Engineer
>> > >
>> > > member of the Kratos Team: www.cimne.com/kratos
>> > > Tenure Track Lecturer at Universitat Politècnica de Catalunya,
>> > > BarcelonaTech (UPC)
>> > > Full Research Professor at International Center for Numerical Methods
>> > > in Engineering (CIMNE)
>> > >
>> > > C/ Gran Capità, s/n, Campus Nord UPC, Ed. C1, Despatx C9
>> > > 08034 – Barcelona – Spain – www.cimne.com
>> > > T. (+34) 93 401 56 96  skype: rougered4
>> > >
>> > > [Legal notice, translated from Catalan:] The personal data contained
>> > > in this message are processed in order to maintain professional
>> > > contact between CIMNE and you. You may exercise your rights of access,
>> > > rectification, cancellation and opposition by writing to
>> > > ci...@cimne.upc.edu. The use of your e-mail address by CIMNE is
>> > > subject to the provisions of Law 34/2002 on Information Society
>> > > Services and Electronic Commerce.
>> > > Print this message only if strictly necessary.
--
Riccardo Rossi
PhD, Civil Engineer

member of the Kratos Team: www.cimne.com/kratos
Tenure Track Lecturer at Universitat Politècnica de Catalunya, BarcelonaTech (UPC)
Full Research Professor at International Center for Numerical Methods in Engineering (CIMNE)

C/ Gran Capità, s/n, Campus Nord UPC, Ed. C1, Despatx C9
08034 – Barcelona – Spain – www.cimne.com
T. (+34) 93 401 56 96  skype: rougered4
_______________________________________________
hpx-users mailing list
hpx-users@stellar.cct.lsu.edu
https://mail.cct.lsu.edu/mailman/listinfo/hpx-users