Re: [hpx-users] equivalent of firstprivate

Hartmut Kaiser Mon, 12 Sep 2016 05:18:07 -0700

> To my understanding, an OpenMP for loop is expanded to something like:
>
>     std::vector<data_type> private_data(nthreads);
>     // copy the data into private_data here, once per thread
>     for (int block_counter = 0; block_counter < blocks; ++block_counter)
>     {
>         for (begin, end, ...)
>         {
>             // capture private_data[my_thread_id] here
>             // ... do work using the captured data
>         }
>     }
>
> This way the copying is done once per thread, not once per call nor once
> per block.
> Of course I could emulate this if I had access to a function like
> omp_get_thread_num() giving me the id of the current worker (I would
> also need the total number of workers to size the private_data array).
> Is this data available?
> Please note that I am just a user, so my understanding of the specs may
> be faulty. My apologies if that's the case.


I think your understanding of OpenMP firstprivate is correct. Also you're 
right, the solution I gave will create one copy of the lambda per 
iteration-partition.
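
To make the copy behaviour concrete, here is a minimal sketch in plain 
standard C++ (no HPX involved). The names Scratch, run_partitioned and 
copies_for are hypothetical and made up for this illustration, and the 
partition loop only imitates what a parallel algorithm might do internally; 
it is not HPX's actual implementation:

```cpp
#include <cassert>

// Hypothetical stand-in for an expensive object; it counts copy constructions.
struct Scratch
{
    static int copies;
    Scratch() = default;
    Scratch(Scratch const&) { ++copies; }
};
int Scratch::copies = 0;

// Imitates a partitioner (an assumption, not HPX code): the callable is
// copied once per partition and then invoked for each index in it.
template <typename F>
void run_partitioned(F const& f, int partitions, int per_partition)
{
    for (int p = 0; p != partitions; ++p)
    {
        F local(f);    // copies the lambda together with its by-value captures
        for (int i = 0; i != per_partition; ++i)
            local(p * per_partition + i);
    }
}

// Returns how many Scratch copies the partitioned run itself caused.
int copies_for(int partitions, int per_partition)
{
    Scratch s;
    auto body = [s](int) { (void) s; };    // by-value capture copies 's' once
    Scratch::copies = 0;                   // do not count the capture itself
    run_partitioned(body, partitions, per_partition);
    return Scratch::copies;
}
```

Under these assumptions the number of copies follows the number of 
partitions, not the number of iterations.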

To have exactly one copy per kernel-thread you'd need to create a helper 
class which allocates the per-thread data, e.g. something like:

    #include <hpx/hpx.hpp>
    #include <cstddef>
    #include <vector>

    template <typename T>
    struct firstprivate_emulation
    {
        explicit firstprivate_emulation(T const& init)
          : data_(hpx::get_os_thread_count(), init)
        {
        }

        T& access()
        {
            std::size_t idx = hpx::get_worker_thread_num();
            HPX_ASSERT(idx < hpx::get_os_thread_count());
            return data_[idx];
        }

        T const& access() const
        {
            std::size_t idx = hpx::get_worker_thread_num();
            HPX_ASSERT(idx < hpx::get_os_thread_count());
            return data_[idx];
        }

    private:
        std::vector<T> data_;
    };

    Matrix expensive_to_construct_scratchspace;
    firstprivate_emulation<Matrix> data(expensive_to_construct_scratchspace);
    // for_loop (unlike for_each) accepts integral bounds
    for_loop(par, 0, N,
        [&](int i)
        {
            // 'data.access()' returns this worker thread's copy of the
            // outer Matrix
            Matrix& m = data.access();
            // ... use 'm' as scratch space for iteration i
        });
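
The same idea can be tried out without HPX. The following is only a sketch 
under stated assumptions: it uses std::thread and passes the worker index 
explicitly instead of calling hpx::get_worker_thread_num(), and the names 
per_worker and sum_with_private_slots are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: holds one copy of 'init' per worker, indexed by id.
template <typename T>
class per_worker
{
public:
    per_worker(std::size_t workers, T const& init) : data_(workers, init) {}

    // Bounds-checked access to the slot owned by the given worker.
    T& access(std::size_t worker) { return data_.at(worker); }

private:
    std::vector<T> data_;
};

// Sums 0..n-1 with 'workers' threads; each thread touches only its own slot.
long sum_with_private_slots(std::size_t workers, long n)
{
    per_worker<long> slots(workers, 0L);
    std::vector<std::thread> pool;
    for (std::size_t w = 0; w != workers; ++w)
    {
        pool.emplace_back([&slots, w, workers, n] {
            long& mine = slots.access(w);    // this worker's private copy
            for (long i = static_cast<long>(w); i < n;
                 i += static_cast<long>(workers))
            {
                mine += i;
            }
        });
    }
    for (auto& t : pool)
        t.join();

    // Combine the per-worker results once all workers are done.
    long total = 0;
    for (std::size_t w = 0; w != workers; ++w)
        total += slots.access(w);
    return total;
}
```

The initial value is copied once per worker when the vector is constructed, 
and no locking is needed because no two workers share a slot. Small 
neighbouring slots can still share a cache line, so padding may be worth 
considering for frequently updated data.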

> Btw, I much like your idea of allocators for Numa locality. That's a vast
> improvement over first touch, where you never really know who's the
> owner!!

Heh, even if it uses first touch internally itself? :-)

HTH
Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu


> Regards
> Riccardo
> 
> On 11 Sep 2016 7:23 p.m., "Hartmut Kaiser" wrote:
> 
> > First of all, thank you very much for your quick and detailed answer.
> > Nevertheless, I think I did not explain my concern clearly.
> > Using your code snippet, imagine I have
> >
> >     int nelements = 42;
> >     Matrix expensive_to_construct_scratchspace;
> >
> >     for_each(par, 0, N,
> >         [nelements, expensive_to_construct_scratchspace](int i)
> >         {
> >             // the captured 'nelements' is initialized from the outer
> >             // variable and each copy of the lambda has its own private
> >             // copy
> >
> > Here, as I understand it, the lambda would capture my
> > "expensive_to_construct_scratchspace" by value, which implies that I
> > would have one allocation for every "i". Are you telling me that this is
> > not the case? If it is the case, that would be a problem, since
> > constructing it would be very expensive.
> 
> No, that would be the case, your analysis is correct.
> 
> > On the contrary, if the lambda does not copy by value... what if I do
> > need that behaviour?
> >
> > Note that I could definitely construct a blocked range of iterators and
> > define a lambda acting on a given range of iterators; however, that would
> > be very verbose.
> 
> Looks like I misunderstood what firstprivate actually does...
> 
> OTOH, in the openmp spec I read:
> 
>     firstprivate Specifies that each thread should have its own instance
>     of a variable, and that the variable should be initialized with the
>     value of the variable, because it exists before the parallel construct.
> 
> So each thread gets its own copy, which implies copying/allocation. What
> am I missing?
> 
> If however you want to share the variable in between threads, just capture
> it by reference:
> 
>     Matrix expensive_to_construct_scratchspace;
>     for_each(par, 0, N,
>         [&expensive_to_construct_scratchspace](int i)
>         {
>         });
> 
> In this case you'd be responsible for making any operations on the shared
> variable thread safe, however.
> 
> Is that what you need?
> 
> Regards Hartmut
> ---------------
> http://boost-spirit.com
> http://stellar.cct.lsu.edu
> 
> 
> >
> >
> > anyway,
> > thanks again for your attention
> > Riccardo
> >
> >
> > On Sun, Sep 11, 2016 at 4:48 PM, Hartmut Kaiser wrote:
> > Riccardo,
> >
> > > I am writing since I am an OpenMP user, but I am quite curious to
> > > understand the future directions of C++.
> > >
> > > My parallel usage is relatively trivial and is covered by OpenMP 2.5
> > > (OpenMP 3.1, with support for iterators, would be better but is not
> > > available in MSVC).
> > > 99% of my needs are about parallel loops, and with C++11 lambdas I
> > > could do a lot.
> >
> > Right. Turning an OpenMP parallel loop into the equivalent parallel
> > algorithm is a fairly simple transformation. We specifically added
> > parallel::for_loop() (not in the Parallelism TS/C++17) to support that
> > migration:
> >
> >     #pragma omp parallel for
> >     for(int i = 0; i != N; ++i)
> >     {
> >         // some iteration
> >     }
> >
> > Would be equivalent to
> >
> >     hpx::parallel::for_loop(
> >         hpx::parallel::par,
> >         0, N, [](int i)
> >         {
> >             // some iteration
> >         });
> >
> > (for more information about for_loop() see here:
> > http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0075r0.pdf)
> >
> > > However, I am really not clear on how I should equivalently handle
> > > OpenMP's "private" and "firstprivate", which allow creating objects
> > > that persist in thread-private memory during the whole length of a for
> > > loop.
> > > I now use OpenMP 2.5 and I have code that looks like the following:
> > >
> > > https://kratos.cimne.upc.es/projects/kratos/repository/entry/kratos/kratos/solving_strategies/builder_and_solvers/residualbased_block_builder_and_solver.h
> > >
> > > which does an OpenMP parallel Finite Element assembly.
> > > The code I am thinking of is something like:
> >
> > [snipped code]
> >
> > > The big question is: how shall I handle the thread-private
> > > scratch space in HPX? Lambdas do not allow this.
> > > That is, what is the equivalent of private and of firstprivate?
> > > Thank you in advance for any clarification or pointer to examples.
> >
> > For 'firstprivate' you can simply use lambda captures:
> >
> >     int nelements = 42;
> >
> >     for_each(par, 0, N,
> >         [nelements](int i)
> >         {
> >             // the captured 'nelements' is initialized from the outer
> >             // variable and each copy of the lambda has its own private
> >             // copy
> >             //
> >             // use private 'nelements' here:
> >             cout << nelements << endl;
> >         });
> >
> > Note that 'nelements' will be const by default. If you want to modify
> > its value, the lambda has to be made mutable:
> >
> >     int nelements = 42;
> >
> >     for_each(par, 0, N,
> >         [nelements](int i) mutable // makes captures non-const
> >         {
> >             ++nelements;
> >         });
> >
> > Please don't be fooled, however, into thinking that this gives you one
> > variable instance per iteration. HPX runs several iterations 'in one go'
> > (depending on the partitioning, very much like OpenMP), so you will
> > create one variable instance per created partition. As long as you don't
> > modify the variable this shouldn't make a difference, however.
> >
> > Emulating 'private' is even simpler. All you need is a local variable
> > for each iteration, after all. Thus simply creating it on the stack
> > inside the lambda is the solution:
> >
> >     for_loop(par, 0, N, [](int i)
> >     {
> >         // create 'private' variable
> >         int my_private = 0;
> >         // ...
> >     });
> >
> > This also gives you a hint on how you can have one instance of your
> > variable per iteration and still initialize it like it was firstprivate:
> >
> >     int nelements = 42;
> >     for_loop(par, 0, N, [nelements](int i)
> >     {
> >         // create 'private' variable
> >         int my_private = nelements;
> >         // ...
> >         ++my_private;   // modifies instance for this iteration only.
> >     });
> >
> > Things become a bit more interesting if you need reductions. Please see
> > the linked document above for more details, but here is a simple example
> > (taken from that paper):
> >
> >     float dot_saxpy(int n, float a, float x[], float y[])
> >     {
> >         float s = 0;
> >         for_loop(par, 0, n,
> >             reduction(s, 0.0f, std::plus<float>()),
> >             [&](int i, float& s_)
> >             {
> >                 y[i] += a*x[i];
> >                 s_ += y[i]*y[i];
> >             });
> >         return s;
> >     }
> >
> > Here 's' is the reduction variable, and s_ is the thread-local reference
> > to it.
> >
> > HTH
> > Regards Hartmut
> > ---------------
> > http://boost-spirit.com
> > http://stellar.cct.lsu.edu
> >
> >
> >
> >
> > --
> > Riccardo Rossi
> > PhD, Civil Engineer
> >
> > member of the Kratos Team: www.cimne.com/kratos
> > Tenure Track Lecturer at Universitat Politècnica de Catalunya,
> > BarcelonaTech (UPC)
> > Full Research Professor at International Center for Numerical Methods in
> > Engineering (CIMNE)
> >
> > C/ Gran Capità, s/n, Campus Nord UPC, Ed. C1, Despatx C9
> > 08034 – Barcelona – Spain – www.cimne.com  -
> > T.(+34) 93 401 56 96 skype: rougered4

_______________________________________________
hpx-users mailing list
[email protected]
https://mail.cct.lsu.edu/mailman/listinfo/hpx-users
