On Apr 10, 2013, at 1:18 PM, "Kirk, Benjamin (JSC-EG311)" <benjamin.kir...@nasa.gov> wrote:
>> Anyone sleep on this and come up with any ideas to try?
>
> I'm reviewing the code now… Is there a restart file with this case, or is it
> a fresh start?

I'm curious if we have a good old-fashioned race condition here. We are locking up at the same section of code, called from two places, which suggests a synchronization problem. Now, there is some allgather() in there, which I would expect to force synchronization, so this is indeed curious…

One idea: do we need a simple barrier() at the end of that function to avoid synchronization issues? Perhaps one set of processors is racing ahead and accidentally participating in the next allgather(), which is not actually intended for it?

Barring that, my only other idea is that the sorting is somehow breaking down when a processor has no objects, but I'm pretty sure we've tested the heck out of that.

So I'd say first off, try a simple barrier() before that function returns and report back…

-Ben

_______________________________________________
Libmesh-devel mailing list
Libmesh-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libmesh-devel
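
P.S. On the second idea (a processor with no objects), one cheap way to sanity-check the bookkeeping without MPI is a serial stand-in like the one below. This is only a sketch under stated assumptions: `gather_offsets` and `simulate_allgather` are hypothetical names, not libMesh functions, and the concatenation merely mimics the rank-ordered buffer that a gather of variable-length data would produce. It says nothing about the actual hang; it only shows that zero-length contributions should yield repeated offsets and leave the gathered data untouched.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not libMesh code): given how many objects each rank
// contributes, return the starting offset of each rank's slice in the
// gathered buffer. A rank with zero objects gets the same offset as the
// next rank, which is the case we want to exercise.
std::vector<std::size_t> gather_offsets(const std::vector<std::size_t>& counts)
{
    std::vector<std::size_t> offsets(counts.size(), 0);
    for (std::size_t r = 1; r < counts.size(); ++r)
        offsets[r] = offsets[r - 1] + counts[r - 1];
    return offsets;
}

// Serial stand-in for a variable-length gather: concatenate each rank's
// data in rank order, using the offsets computed above.
std::vector<int> simulate_allgather(const std::vector<std::vector<int> >& per_rank)
{
    std::vector<std::size_t> counts;
    for (std::size_t r = 0; r < per_rank.size(); ++r)
        counts.push_back(per_rank[r].size());

    const std::vector<std::size_t> offsets = gather_offsets(counts);
    const std::size_t total =
        std::accumulate(counts.begin(), counts.end(), std::size_t(0));

    std::vector<int> gathered(total);
    for (std::size_t r = 0; r < per_rank.size(); ++r)
    {
        // Every slice, including an empty one, must fit inside the buffer.
        assert(offsets[r] + counts[r] <= total);
        std::copy(per_rank[r].begin(), per_rank[r].end(),
                  gathered.begin() + static_cast<std::ptrdiff_t>(offsets[r]));
    }
    return gathered;
}
```

Feeding it a mix of empty and non-empty per-rank vectors, including the case where every rank is empty, is enough to confirm the offsets never run past the buffer. If the real code passes the equivalent check, that points back toward the barrier() experiment as the first thing to try.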