On Apr 10, 2013, at 1:18 PM, "Kirk, Benjamin (JSC-EG311)" 
<benjamin.kir...@nasa.gov> wrote:

>> 
>> Anyone sleep on this and come up with any ideas to try?
> 
> I'm reviewing the code now…  Is there a restart file with this case, or is it 
> a fresh start?

I'm curious if we have a good-old-fashioned race condition here.  We are
locking up at the same section of code called from two places, suggesting a 
synchronization problem.

Now, there are some allgather() calls in there, which I would expect to force 
synchronization, so this is indeed curious…

One idea:  do we need a simple barrier() at the end of that function to avoid 
synchronization issues?  Perhaps one set of processors is racing ahead and 
accidentally participating in the next allgather(), which is not actually 
intended for them?
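
To make the barrier idea concrete, here is a minimal sketch of what a barrier
at the end of a function guarantees.  It is not the libMesh code: threads
stand in for MPI ranks, and SimpleBarrier, suspect_function, and
all_ranks_synchronized are hypothetical names made up for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Reusable barrier for a fixed number of participants (plain C++11).
// Assumption: this only imitates the semantics of a parallel barrier.
class SimpleBarrier {
public:
  explicit SimpleBarrier(int n) : n_(n), waiting_(0), generation_(0) {
    assert(n > 0);
  }

  void wait() {
    std::unique_lock<std::mutex> lock(m_);
    const int gen = generation_;
    if (++waiting_ == n_) {
      // Last arrival releases everyone and resets for the next use.
      waiting_ = 0;
      ++generation_;
      cv_.notify_all();
    } else {
      // Earlier arrivals block until the generation counter advances.
      cv_.wait(lock, [&] { return gen != generation_; });
    }
  }

private:
  std::mutex m_;
  std::condition_variable cv_;
  const int n_;
  int waiting_;
  int generation_;
};

// Hypothetical stand-in for the function under suspicion: each "rank"
// finishes its share of the work, then waits at a barrier before returning.
void suspect_function(std::atomic<int>& finished, SimpleBarrier& barrier) {
  ++finished;      // this rank's part of the work is done
  barrier.wait();  // proposed fix: nobody returns until all have arrived
}

// Runs n_ranks threads.  Returns true if every rank observed that all
// n_ranks had finished by the time suspect_function returned to it.
bool all_ranks_synchronized(int n_ranks) {
  std::atomic<int> finished(0);
  std::atomic<bool> ok(true);
  SimpleBarrier barrier(n_ranks);
  std::vector<std::thread> ranks;
  for (int r = 0; r < n_ranks; ++r) {
    ranks.emplace_back([&] {
      suspect_function(finished, barrier);
      if (finished.load() != n_ranks) ok = false;
    });
  }
  for (auto& t : ranks) t.join();
  return ok.load();
}
```

In an MPI code the analogous change would be a single MPI_Barrier(comm) call
just before the function returns, so that no rank can start on the next
collective until every rank has left this one.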

Barring that, my only other idea is that the sorting is somehow breaking down 
when a processor has no objects, but I'm pretty sure we've tested the heck out
of that.

So I'd say first off try a simple barrier() before that function returns and 
report back…

-Ben



_______________________________________________
Libmesh-devel mailing list
Libmesh-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libmesh-devel
