> On March 9, 2016, 9:51 p.m., Dan Smith wrote:
> > geode-core/src/main/java/com/gemstone/gemfire/distributed/internal/InternalDistributedSystem.java, line 954
> > <https://reviews.apache.org/r/44587/diff/1/?file=1293564#file1293564line954>
> >
> >     This seems like a hack to have the product code look for a test class???
> >     
> >     It would be better to have the product code have a callback that can be
> >     installed by the test code. Some forceDisconnectListener maybe?

I understand where you're coming from, Dan, but the problem we're facing is that 
seemingly random regression tests and CI runs are encountering forced-disconnect 
exceptions due to JVMs pausing.  These failures are being assigned to 
"membership", though we suspect something else is generating lots of garbage. 
We need more artifacts to find the real source of the problem.

We need to have this code active in all of the tests so we can track down 
what's going wrong, and we don't have the manpower to modify all of the tests to 
install a test callback that generates the heap dump.  We also can't modify the 
regression framework to do this because the JVM in question has often already 
exited by the time post-mortem stack traces are generated.

Once we've figured this problem out, I will remove this heap-dump code from 
InternalDistributedSystem.
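
For reference, the callback approach suggested in the review could be sketched 
roughly as follows.  This is only an illustration under stated assumptions: 
ForceDisconnectListener, ForceDisconnectHooks and HeapDumpOnForcedDisconnect 
are hypothetical names that do not exist in Geode, and the heap dump relies on 
the HotSpot-specific HotSpotDiagnosticMXBean, so it would only work on 
HotSpot-based JVMs.

```java
import java.io.File;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import com.sun.management.HotSpotDiagnosticMXBean;

/** Hypothetical callback interface; not part of Geode. */
interface ForceDisconnectListener {
  void onForcedDisconnect(String reason, boolean jvmPauseDetected);
}

/** Hypothetical registry that product code would notify on a forced disconnect. */
final class ForceDisconnectHooks {
  private static final List<ForceDisconnectListener> LISTENERS =
      new CopyOnWriteArrayList<>();

  private ForceDisconnectHooks() {}

  static void add(ForceDisconnectListener listener) {
    LISTENERS.add(listener);
  }

  static void remove(ForceDisconnectListener listener) {
    LISTENERS.remove(listener);
  }

  static void notifyListeners(String reason, boolean jvmPauseDetected) {
    for (ForceDisconnectListener listener : LISTENERS) {
      try {
        listener.onForcedDisconnect(reason, jvmPauseDetected);
      } catch (RuntimeException e) {
        // A diagnostic hook should never interfere with the disconnect itself.
        System.err.println("force-disconnect listener failed: " + e);
      }
    }
  }
}

/** Test-side listener that writes a heap dump when JVM pauses were detected. */
final class HeapDumpOnForcedDisconnect implements ForceDisconnectListener {
  private final File directory;

  HeapDumpOnForcedDisconnect(File directory) {
    this.directory = directory;
  }

  @Override
  public void onForcedDisconnect(String reason, boolean jvmPauseDetected) {
    if (!jvmPauseDetected) {
      return; // only collect artifacts when a pause is the suspected cause
    }
    // Recent JDKs require the dump file name to end in ".hprof".
    File target = new File(directory,
        "forced-disconnect-" + System.currentTimeMillis() + ".hprof");
    try {
      HotSpotDiagnosticMXBean bean =
          ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
      bean.dumpHeap(target.getAbsolutePath(), true); // true = live objects only
    } catch (IOException | RuntimeException e) {
      System.err.println("heap dump failed: " + e);
    }
  }
}
```

A test would register the listener once, for example with 
ForceDisconnectHooks.add(new HeapDumpOnForcedDisconnect(new File("."))), and 
the product code would call ForceDisconnectHooks.notifyListeners(...) at the 
point where the forced disconnect is handled.  The sketch does not remove the 
need to install the callback in every test JVM.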


- Bruce


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44587/#review122816
-----------------------------------------------------------


On March 9, 2016, 7:27 p.m., Bruce Schuchardt wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44587/
> -----------------------------------------------------------
> 
> (Updated March 9, 2016, 7:27 p.m.)
> 
> 
> Review request for geode, Hitesh Khamesra, Jianxia Chen, and Udo Kohlmeyer.
> 
> 
> Repository: geode
> 
> 
> Description
> -------
> 
> We're seeing a number of similar failures that all seem to be caused by JVMs 
> pausing and being kicked out of the distributed system.   This change-set 
> enables creation of a heap dump if a member is forced out of the system and 
> JVM pauses have been detected.  This will give us artifacts that we can 
> analyze to help determine what's going on.
> 
> 
> Diffs
> -----
> 
>   geode-core/src/main/java/com/gemstone/gemfire/distributed/internal/InternalDistributedSystem.java a19369942ba691eef8d021fd7d8a537939cf2cc8 
> 
> Diff: https://reviews.apache.org/r/44587/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Bruce Schuchardt
> 
>
