Riccardo,

> I have been thinking about your proposal, and I believe it would work
> for me. I have a few comments:
>
> 1 - I would also leave the default constructor for the value (that
> would allow emulating "private" and not just "firstprivate")

Makes sense. However, plain 'private' can be done more easily by using a
local variable inside the lambda. The exception is if you want to do some
reduction afterwards, in which case the scheme I proposed is the way to
go. See here for a possible implementation allowing for reductions:
https://github.com/STEllAR-GROUP/hpx/blob/master/examples/quickstart/safe_object.cpp
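Something like this minimal sketch could work. Note that the class name
threadprivate_emulation and its exact interface are made up here just to
illustrate adding the default constructor; it mirrors the
firstprivate_emulation helper quoted further down:

    #include <hpx/hpx.hpp>
    #include <cstddef>
    #include <vector>

    template <typename T>
    struct threadprivate_emulation
    {
        // 'private' emulation: one default-constructed T per OS thread
        threadprivate_emulation()
          : data_(hpx::get_os_thread_count())
        {
        }

        // 'firstprivate' emulation: one copy of 'init' per OS thread
        explicit threadprivate_emulation(T const& init)
          : data_(hpx::get_os_thread_count(), init)
        {
        }

        T& access()
        {
            std::size_t idx = hpx::get_worker_thread_num();
            HPX_ASSERT(idx < data_.size());
            return data_[idx];
        }

    private:
        std::vector<T> data_;
    };

Since the per-thread instances in data_ outlive the loop, a reduction
pass over them afterwards remains possible, similar to what safe_object
does.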
> 2 - shouldn't the execution policy
> (sequential_execution_policy/parallel_execution_policy/...) be passed
> to this threadprivate emulator? I guess in some cases it may be
> convenient, for example because one could specialize the case of
> "sequential" to have no overhead.

Yes, sure. I made the code up just to answer your question.
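As a purely hypothetical sketch of that specialization idea (the
emulation class itself is made up; only the policy types are the ones
you mention), a policy parameter could remove all overhead for
sequential execution:

    #include <hpx/hpx.hpp>
    #include <cstddef>
    #include <vector>

    // primary template: one slot per OS thread, as before
    template <typename T, typename ExPolicy>
    struct threadprivate_emulation
    {
        explicit threadprivate_emulation(T const& init)
          : data_(hpx::get_os_thread_count(), init)
        {
        }

        T& access()
        {
            return data_[hpx::get_worker_thread_num()];
        }

    private:
        std::vector<T> data_;
    };

    // specialization for sequential execution: a single instance,
    // no thread-id lookup and no extra allocation
    template <typename T>
    struct threadprivate_emulation<T,
        hpx::parallel::sequential_execution_policy>
    {
        explicit threadprivate_emulation(T const& init)
          : data_(init)
        {
        }

        T& access()
        {
            return data_;
        }

    private:
        T data_;
    };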
> 3 - a question more than a comment: isn't this doing the work of the
> "thread_local" keyword? I guess current limitations of the compilers
> do not allow that... so take this as a forward-looking question.
>
> thanks again for your time and patience
> Riccardo

Using thread_local implies creating a copy of the variable for _every_
(kernel-)thread. The scheme I proposed will do so for the relevant
threads only. Also, I'm simply not sure how the initialization of the
thread_local variables would work in your case. If you gain any insights
I'd love to hear about it, though.
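For comparison, this is roughly what a plain thread_local version could
look like (a sketch only; it assumes compiler support for dynamically
initialized thread_local, and 'Matrix' and 'assemble' stand in for your
actual types and driver code):

    #include <hpx/hpx.hpp>

    struct Matrix { /* expensive to construct */ };

    void assemble(int N)   // hypothetical driver function
    {
        hpx::parallel::for_loop(hpx::parallel::par, 0, N,
            [](int i)
            {
                // one instance per kernel thread executing the lambda,
                // constructed on that thread's first use and destroyed
                // only at thread exit; there is no per-loop control
                // over initialization or cleanup
                thread_local Matrix scratch;
                // ... use 'scratch' for iteration i
            });
    }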
Thanks!

Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu


> On Mon, Sep 12, 2016 at 5:18 PM, Riccardo Rossi <[email protected]> wrote:
>
> Ok, I think that with your proposal it will work (thanks).
>
> Regarding allocation, the thing is that having the allocator know about
> the run policy (or the policy know about the allocator) could allow you
> to do smart things concerning what gets allocated. When you read data
> in a finite element program, even if you allocate first touch, you have
> no way to ensure that at the moment of using the data it will be used
> in the same order. Having the allocator and the policy persist through
> the whole analysis gives a good way to solve this problem.
>
> Anyway... thank you very much for your time!
> Regards
> Riccardo
>
> On Mon, Sep 12, 2016 at 2:17 PM, Hartmut Kaiser <[email protected]> wrote:
>
> > To my understanding an OpenMP for loop is expanded to something like:
> >
> >     vector<data_type> private_data(nthreads);
> >     // copy data into the private data array, once per thread
> >     for (int block_counter = 0; block_counter < blocks; ++block_counter)
> >     {
> >         for (/* begin, end, ... */)
> >         {
> >             // capture private_data[my_thread_id] here
> >             // ... do work using the captured data
> >         }
> >     }
> >
> > This way the copying is done once per thread, not once per call nor
> > once per block. Of course I could emulate this, if I had access to a
> > function like omp_get_thread_num() giving me the id of the current
> > worker (I should also know the total number of workers to define the
> > private_data array). Is this data available?
> >
> > Please do note that I am just a user, so my understanding of the
> > specs may be faulty. My apologies if that's the case.
>
> I think your understanding of OpenMP firstprivate is correct. Also
> you're right, the solution I gave will create one copy of the lambda
> per iteration-partition.
>
> In order to have exactly one copy per kernel-thread you'd need to
> create a helper class which allocates the per-thread data, e.g.
> something like:
>
>     #include <hpx/hpx.hpp>
>     #include <vector>
>
>     template <typename T>
>     struct firstprivate_emulation
>     {
>         explicit firstprivate_emulation(T const& init)
>           : data_(hpx::get_os_thread_count(), init)
>         {
>         }
>
>         T& access()
>         {
>             std::size_t idx = hpx::get_worker_thread_num();
>             HPX_ASSERT(idx < hpx::get_os_thread_count());
>             return data_[idx];
>         }
>
>         T const& access() const
>         {
>             std::size_t idx = hpx::get_worker_thread_num();
>             HPX_ASSERT(idx < hpx::get_os_thread_count());
>             return data_[idx];
>         }
>
>     private:
>         std::vector<T> data_;
>     };
>
>     Matrix expensive_to_construct_scratchspace;
>     firstprivate_emulation<Matrix> data(expensive_to_construct_scratchspace);
>
>     for_loop(par, 0, N,
>         [&](int i)
>         {
>             // 'access()' returns this thread's local copy of the
>             // outer matrix
>             Matrix& m = data.access();
>             // ... use 'm' as scratch space for iteration i
>         });
>
> > Btw, I much like your idea of allocators for NUMA locality. That's a
> > vast improvement over first touch, where you never really know who's
> > the owner!!
>
> Heh, even if it uses first touch internally itself? :-)
>
> HTH
> Regards Hartmut
> ---------------
> http://boost-spirit.com
> http://stellar.cct.lsu.edu
>
> > On 11 Sep 2016 7:23 p.m., "Hartmut Kaiser" <[email protected]> wrote:
> >
> > > First of all thank you very much for your quick and detailed
> > > answer. Nevertheless I think I did not explain my concern. Using
> > > your code snippet, imagine I have:
> > >
> > >     int nelements = 42;
> > >     Matrix expensive_to_construct_scratchspace;
> > >
> > >     for_loop(par, 0, N,
> > >         [nelements, expensive_to_construct_scratchspace](int i)
> > >         {
> > >             // the captured 'nelements' is initialized from the
> > >             // outer variable and each copy of the lambda has its
> > >             // own private copy
> > >         });
> > >
> > > HERE, as I understand it, the lambda would capture my
> > > "expensive_to_construct_scratchspace" by value, which as I
> > > understand implies that I would have one allocation for every "i".
> > > --> Are you telling me that this is not the case? If so, that
> > > would be a problem, since constructing it is very expensive.
> >
> > No, that would be the case, your analysis is correct.
> >
> > > On the contrary, if the lambda does not copy by value... what if I
> > > do need that behaviour? Note that I could definitely construct a
> > > blocked range of iterators and define a lambda acting on a given
> > > range of iterators, however that would be very, very verbose...
> >
> > Looks like I misunderstood what firstprivate actually does...
> >
> > OTOH, in the OpenMP spec I read:
> >
> >     firstprivate: Specifies that each thread should have its own
> >     instance of a variable, and that the variable should be
> >     initialized with the value of the variable as it exists before
> >     the parallel construct.
> >
> > So each thread gets its own copy, which implies copying/allocation.
> > What do I miss?
> >
> > If however you want to share the variable between threads, just
> > capture it by reference:
> >
> >     Matrix expensive_to_construct_scratchspace;
> >     for_loop(par, 0, N,
> >         [&expensive_to_construct_scratchspace](int i)
> >         {
> >             // ...
> >         });
> >
> > In this case you'd be responsible for making any operations on the
> > shared variable thread safe, however.
> >
> > Is that what you need?
> >
> > Regards Hartmut
> > ---------------
> > http://boost-spirit.com
> > http://stellar.cct.lsu.edu
> >
> > > Anyway, thanks again for your attention.
> > > Riccardo
> > >
> > > On Sun, Sep 11, 2016 at 4:48 PM, Hartmut Kaiser <[email protected]> wrote:
> > >
> > > Riccardo,
> > >
> > > > I am writing since I am an OpenMP user, but I am actually quite
> > > > curious to understand the future directions of C++.
> > > >
> > > > My parallel usage is actually relatively trivial, and is covered
> > > > by OpenMP 2.5 (OpenMP 3.1 with support for iterators would be
> > > > better, but it is not available in MSVC). 99% of my user needs
> > > > are about parallel loops, and with C++11 lambdas I could do a
> > > > lot.
> > >
> > > Right. It is a fairly simple transformation to turn an OpenMP
> > > parallel loop into the equivalent parallel algorithm. We
> > > specifically added parallel::for_loop() (not in the Parallelism
> > > TS/C++17) to support that migration:
> > >
> > >     #pragma omp parallel for
> > >     for (int i = 0; i != N; ++i)
> > >     {
> > >         // some iteration
> > >     }
> > >
> > > would be equivalent to
> > >
> > >     hpx::parallel::for_loop(
> > >         hpx::parallel::par,
> > >         0, N, [](int i)
> > >         {
> > >             // some iteration
> > >         });
> > >
> > > (For more information about for_loop() see
> > > http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0075r0.pdf)
> > >
> > > > However, I am really not clear on how I should equivalently
> > > > handle "private" and "firstprivate" of OpenMP, which allow
> > > > creating objects that persist in threadprivate memory for the
> > > > whole length of a for loop. I currently use OpenMP 2.5 and I
> > > > have code that looks like the following:
> > > >
> > > > https://kratos.cimne.upc.es/projects/kratos/repository/entry/kratos/kratos/solving_strategies/builder_and_solvers/residualbased_block_builder_and_solver.h
> > > >
> > > > which does an OpenMP-parallel finite element assembly. The code
> > > > I am thinking of is something like:
> > >
> > > [snipped code]
> > >
> > > > The big question is... how shall I handle the threadprivate
> > > > scratchspace in HPX? Lambdas do not allow doing this... that is,
> > > > what is the equivalent of private and firstprivate? Thank you in
> > > > advance for any clarification or pointer to examples.
> > >
> > > For 'firstprivate' you can simply use lambda captures:
> > >
> > >     int nelements = 42;
> > >
> > >     for_loop(par, 0, N,
> > >         [nelements](int i)
> > >         {
> > >             // the captured 'nelements' is initialized from the
> > >             // outer variable and each copy of the lambda has its
> > >             // own private copy
> > >             //
> > >             // use the private 'nelements' here:
> > >             std::cout << nelements << std::endl;
> > >         });
> > >
> > > Note that 'nelements' will be const by default. If you want to
> > > modify its value, the lambda has to be made mutable:
> > >
> > >     int nelements = 42;
> > >
> > >     for_loop(par, 0, N,
> > >         [nelements](int i) mutable   // makes captures non-const
> > >         {
> > >             ++nelements;
> > >         });
> > >
> > > Please don't be fooled, however: this does not give you one
> > > variable instance per iteration. HPX runs several iterations 'in
> > > one go' (depending on the partitioning, very much like OpenMP), so
> > > you will create one variable instance per created partition. As
> > > long as you don't modify the variable this shouldn't make a
> > > difference, however.
> > > Emulating 'private' is even simpler. All you need is a local
> > > variable for each iteration, after all. Thus simply creating it on
> > > the stack inside the lambda is the solution:
> > >
> > >     for_loop(par, 0, N, [](int i)
> > >     {
> > >         // create the 'private' variable
> > >         int my_private = 0;
> > >         // ...
> > >     });
> > >
> > > This also gives you a hint on how you can have one instance of
> > > your variable per iteration and still initialize it like it was
> > > firstprivate:
> > >
> > >     int nelements = 42;
> > >     for_loop(par, 0, N, [nelements](int i)
> > >     {
> > >         // create the 'private' variable
> > >         int my_private = nelements;
> > >         // ...
> > >         ++my_private;  // modifies the instance for this iteration only
> > >     });
> > >
> > > Things become a bit more interesting if you need reductions.
> > > Please see the linked document above for more details, but here is
> > > a simple example (taken from that paper):
> > >
> > >     float dot_saxpy(int n, float a, float x[], float y[])
> > >     {
> > >         float s = 0;
> > >         for_loop(par, 0, n,
> > >             reduction(s, 0.0f, std::plus<float>()),
> > >             [&](int i, float& s_)
> > >             {
> > >                 y[i] += a * x[i];
> > >                 s_ += y[i] * y[i];
> > >             });
> > >         return s;
> > >     }
> > >
> > > Here 's' is the reduction variable, and 's_' is the thread-local
> > > reference to it.
> > >
> > > HTH
> > > Regards Hartmut
> > > ---------------
> > > http://boost-spirit.com
> > > http://stellar.cct.lsu.edu
