Hi devs, So, while working on switching from OMP to BLI_task, I've hit today an rather hairy issue in pbvh_update_normals() (in BKE's pbvh.c).
Current OMP-based code of that function executes in (roughly) 28ms. However, reading it, it's obvious it's lacking a `#pragma omp critical` section around main part of the second OMP loop, since affected MVert may be used by several nodes, and hence suffer concurrency here. Now, if I add that critical section, code now takes over 100ms to run! Changing to BLI_task parrallelized code, with same correct protection (using a spinlock) I can get about 70ms. Using some gcc atomic op (__sync_fetch_and_and), since the atomic ME_VERT_PBVH_UPDATE flag checking and clearing is enough to prevent concurrency here, it can go down to 20ms. Sad part of the story: because MVert->flag is and 8bit var, there is no clean way to do the same under OSX, we can only hack around OSAtomicTestAndClear but it's far from nice and clean (need to retrieve bit number from bitmask eg.). Now comes the funny question: since the probability of thread concurrency here is very low (most vertices are used by only one bvhnode, and the 'fragile' part of the code is *very* short, two bitflag operations), do we want to care about absolute safety of computed normals here? Missing 'critical section' in second loop never hurted us so far it seems... Need your thoughts here, before I loose more time on this hell. ;) _______________________________________________ Bf-committers mailing list [email protected] http://lists.blender.org/mailman/listinfo/bf-committers
