Thanks guys, thats globally good news. :) For now will just add needed bit to our atomic_ops.h, but yes, think we should definitively think about updating it...
Le 27/01/2016 10:03, Sergey Sharybin a écrit : > Sure answer is: we have to be thread-safe, otherwise it's just waiting for > nasty heisenbug to happen. > > Now, our atomic_ops.h is based on Jemalloc from 3 years ago. There are some > interesting twists happened in the upstream's atomic.h supporting C11 > atomics and such. We can totally: > > 1. Optionally update our atomic ops > 2. Switch away from OSAtomic on OSX and use different implementation > > On Tue, Jan 26, 2016 at 11:49 PM, Brecht Van Lommel < > [email protected]> wrote: > >> It's fine to use __sync_fetch_and_and on OS X, no need to use >> OSAtomic* functions. Using 8 bit __sync_fetch_and_and on OS X compiles >> fine for me. >> >> OpenImageIO removed OSAtomic* almost 3 years ago, and we've had no >> problem with that: >> >> https://github.com/OpenImageIO/oiio/commit/14d5d8e572322e5e0decdd1b7d3f4d0484819473 >> >> On Tue, Jan 26, 2016 at 9:57 PM, Bastien Montagne <[email protected]> >> wrote: >>> Hi devs, >>> >>> So, while working on switching from OMP to BLI_task, I've hit today an >>> rather hairy issue in pbvh_update_normals() (in BKE's pbvh.c). >>> >>> Current OMP-based code of that function executes in (roughly) 28ms. >>> However, reading it, it's obvious it's lacking a `#pragma omp critical` >>> section >>> around main part of the second OMP loop, since affected MVert may be >>> used by several nodes, and hence suffer concurrency here. >>> >>> Now, if I add that critical section, code now takes over 100ms to run! >>> >>> Changing to BLI_task parrallelized code, with same correct protection >>> (using a spinlock) I can get about 70ms. Using some gcc atomic op >>> (__sync_fetch_and_and), since the atomic ME_VERT_PBVH_UPDATE flag >>> checking and clearing is enough to prevent concurrency here, >>> it can go down to 20ms. >>> >>> Sad part of the story: because MVert->flag is and 8bit var, there is no >>> clean way to do the same under OSX, we can only hack around >>> OSAtomicTestAndClear but it's far from nice and clean (need to retrieve >>> bit number from bitmask eg.). >>> >>> Now comes the funny question: since the probability of thread >>> concurrency here is very low (most vertices are used by only one bvhnode, >>> and the 'fragile' part of the code is *very* short, two bitflag >>> operations), do we want to care about absolute safety of computed >>> normals here? >>> Missing 'critical section' in second loop never hurted us so far it >> seems... >>> Need your thoughts here, before I loose more time on this hell. ;) >>> _______________________________________________ >>> Bf-committers mailing list >>> [email protected] >>> http://lists.blender.org/mailman/listinfo/bf-committers >> _______________________________________________ >> Bf-committers mailing list >> [email protected] >> http://lists.blender.org/mailman/listinfo/bf-committers >> > > _______________________________________________ Bf-committers mailing list [email protected] http://lists.blender.org/mailman/listinfo/bf-committers
