On Thu, Oct 22, 2015 at 07:16:49PM +0200, Bernd Schmidt wrote:
> I'm not really familiar with OpenMP and what it allows, so take all my
> comments with a grain of salt.
> 
> On 10/22/2015 06:41 PM, Alexander Monakov wrote:
> >The second approach is to run all threads in the warp all the time, making
> >sure they execute the same code with the same data, and thus build up the
> >same local state.
> 
> But is that equivalent? If each thread takes the address of a variable on
> its own stack, that's not the same as taking an address once and
> broadcasting it.

BTW, does it consume more energy if all threads in the warp do the same
thing in lock step, vs. just the first one doing something and all the
others being neutered?  What about stores to global or shared memory if
they are done in lock step by multiple threads in the warp?
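
To make the store question concrete, here is a minimal CUDA sketch of the
two patterns.  It is only an illustration: the kernel names, the helper
run_demo and the value 42 are made up for this example and do not come from
any patch in this thread.  As far as I understand the CUDA programming
model, both variants leave the same value in memory, since every lane
writes an identical value; the sketch says nothing about how many physical
writes the hardware issues or about energy use.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernels for illustration only.

// Variant 1: every lane of the warp stores the same value to the same address.
__global__ void store_all_lanes(int *out, int value)
{
    *out = value;
}

// Variant 2: only lane 0 stores; the other lanes are predicated off.
__global__ void store_lane0_only(int *out, int value)
{
    if (threadIdx.x % warpSize == 0)
        *out = value;
}

// Launch a single warp (32 threads) and return the value found in global
// memory afterwards, or -1 if a CUDA call fails.
int run_demo(bool all_lanes)
{
    int *dev = nullptr;
    int host = 0;

    if (cudaMalloc(&dev, sizeof(int)) != cudaSuccess)
        return -1;
    cudaMemset(dev, 0, sizeof(int));

    if (all_lanes)
        store_all_lanes<<<1, 32>>>(dev, 42);
    else
        store_lane0_only<<<1, 32>>>(dev, 42);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(&host, dev, sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return err == cudaSuccess ? host : -1;
}

int main()
{
    printf("all lanes:   %d\n", run_demo(true));
    printf("lane 0 only: %d\n", run_demo(false));
    return 0;
}
```

Comparing the two kernels under a profiler or a power meter would be one
way to approach the energy question; the code itself only shows that the
visible result is the same.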

        Jakub
