>> Ha - I'm not the only one doing libMesh stuff on a Friday night, huh?
>>
>> Any thoughts on our default and overridden threading grainsizes? I'm
>> playing with them on MIC, and even for a 2D problem with Q2/Q1
>> Navier-Stokes, I can go from 1000 all the way down to 100 with no
>> performance penalty. Even going down to 10 elements in the smallest
>> range only adds about 10% overhead worst case.
>>
>> I'm wondering if I ought to just check n_local_elem() each time we
>> build a range and pick a grain size of n_local_elem()/n_threads()/N
>> for some small integer N.
>
> And therein lies the rub - what to pick??
>
> We had some issues with the original implementation because our predicated
> iterators are not random access, so splitting a range was nontrivial. I
> can't remember if anything in the underlying storage would change to make
> this better, but 10 is surprising and certainly would be no good for linear
> triangles on a scalar problem.
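
For concreteness, the n_local_elem()/n_threads()/N heuristic quoted above could be sketched roughly as below. This is only an illustration under stated assumptions: pick_grainsize and divisor_from_env are hypothetical helpers, not libMesh functions, the element and thread counts are passed in as plain integers rather than queried from a mesh, and the environment variable name in the usage note is made up.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper (not libMesh API): grain size as
// n_local_elem / n_threads / divisor, clamped below by min_grain
// so that tiny local meshes never yield a grain size of zero.
std::size_t pick_grainsize(std::size_t n_local_elem,
                           std::size_t n_threads,
                           std::size_t divisor,
                           std::size_t min_grain = 1)
{
  // Treat a zero thread count or divisor as 1 to avoid dividing by zero.
  if (n_threads == 0) n_threads = 1;
  if (divisor == 0) divisor = 1;
  return std::max(min_grain, n_local_elem / n_threads / divisor);
}

// Hypothetical helper: read the divisor N from an environment variable,
// falling back to a default when the variable is unset, negative,
// zero, or not a plain base-10 integer.
std::size_t divisor_from_env(const char * name, std::size_t fallback)
{
  const char * raw = std::getenv(name);
  if (raw == nullptr || raw[0] == '-')
    return fallback;

  char * end = nullptr;
  const unsigned long value = std::strtoul(raw, &end, 10);
  if (end == raw || *end != '\0' || value == 0)
    return fallback;

  return static_cast<std::size_t>(value);
}
```

With a made-up variable name, a call might look like pick_grainsize(n_elem, n_thr, divisor_from_env("EXAMPLE_GRAIN_DIVISOR", 4)). The lower clamp is there because integer division truncates to zero once the local element count drops below n_threads*N.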

And I think it also depends on the total number of elements relative to how
small the chunks are. As I recall, splitting 10,000 elements into chunks of
10 is not as bad as doing the same with 1,000,000. But anyway, I think the
environment variable approach would be pretty cool.

-Ben

_______________________________________________
Libmesh-devel mailing list
Libmesh-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libmesh-devel