Re: CI failures due to JMX client heartbeat thread

Kirk Lund Thu, 27 Oct 2016 13:02:43 -0700

The "JMX client heartbeat n" threads are created
by com.sun.jmx.remote.internal.ClientCommunicatorAdmin.

GFSH registers JMXConnectionListener which is a non-inner class hiding
inside 
./geode-core/src/main/java/org/apache/geode/management/internal/cli/shell/JmxOperationInvoker.java
-- we should extract all such classes to their own java file or make then
static inner classes. JMXConnectionListener
implements javax.management.NotificationListener and is registered
with JMXConnector.

"gfshFileLogger.severe" is the line in notifyDisconnect that is causing
these fatal level log messages.

It's not clear to me what would cause GFSH to keep the JDK's "JMX client
heartbeat n" threads around (I haven't actually seen this yet). Maybe the
way JmxOperationInvoker is implemented is incorrect and leaks or keeps some
JMX resources around. GFSH does keep a reference to JmxOperationInvoker and
I don't see that cleared anywhere (such as when it might disconnect from a
JMX server).

It seems reasonable to lower the log level for the log messages (I hate to
suppress it entirely though). Beyond that we should file a bug for the
leaking of "JMX client heartbeat n" threads. I suspect leaking these
threads would mostly occur in tests that are connecting/disconnecting
a GFSH instance repeatedly. The typical user of GFSH isn't going to
repeatedly connect/disconnect like this.

Maybe the tests that are hitting this are closing the manager 1st and then
GFSH's JMXConnectionListener detects this but then never frees the
JMXConnectionListener or JmxOperationInvoker.

-Kirk

On Thu, Oct 27, 2016 at 11:46 AM, Bruce Schuchardt <bschucha...@pivotal.io>
wrote:

> I think the code is in Gfsh.notifyDisconnect().  I think the test that's
> polluting others is ConnectToLocatorSSLDUnitTest.  I had a failure with
> this suspect string in a test that ran just after it. I added a thread dump
> in the @After of that test and see that it's leaving one of these threads
> behind along with some Gfsh Launcher threads.
>
>
> Le 10/27/2016 à 10:54 AM, Kirk Lund a écrit :
>
>> Bruce, can you please point me at the JDK code you think is generating the
>> fatal message? I looked in RMIConnector.java and can't find it in there.
>>
>> Maybe we can fix ordering of tearDown or something else in order to "fix"
>> this message.
>>
>> Thanks,
>> Kirk
>>
>>
>> On Wed, Oct 26, 2016 at 5:00 PM, Dan Smith <dsm...@pivotal.io> wrote:
>>
>> Just looking at a couple of these bugs, these are fatal level errors. Do
>>> you know what is causing them or what affect this might have?
>>>
>>> [fatal 2016/09/29 12:12:03.891 PDT <JMX client heartbeat 3> tid=0x18d]
>>> (tid=397 msgId=39) No longer connected to cc6-co6.gemstone.com[27162].
>>>
>>> -Dan
>>>
>>> On Wed, Oct 26, 2016 at 4:50 PM, Bruce Schuchardt <
>>> bschucha...@pivotal.io>
>>> wrote:
>>>
>>> We have 23 CI failure tickets concerning suspect strings from "JMX client
>>>> heartbeat" threads.  I think we should add this to ExpectedStrings.java
>>>>
>>> and
>>>
>>>> close these tickets.  The suspect strings are coming from
>>>> javax.management.remote.rmi.RMIConnector and I don't think there's
>>>> anything we can do about them.  Most, if not all, are assigned to
>>>>
>>> security
>>>
>>>> or JMX components.
>>>>
>>>>
>>>>     GEODE-1858
>>>>     GEODE-1955
>>>>     GEODE-1492
>>>>     GEODE-1539
>>>>     GEODE-1538
>>>>     GEODE-1773
>>>>     GEODE-1922
>>>>     GEODE-1482
>>>>     GEODE-1475
>>>>     GEODE-1480
>>>>     GEODE-1481
>>>>     GEODE-1826
>>>>     GEODE-1825
>>>>     GEODE-1820
>>>>     GEODE-1878
>>>>     GEODE-1879
>>>>     GEODE-1877
>>>>     GEODE-1875
>>>>     GEODE-1876
>>>>     GEODE-1499
>>>>     GEODE-1476
>>>>     GEODE-1869
>>>>     GEODE-1769
>>>>
>>>>
>>>>
>

Re: CI failures due to JMX client heartbeat thread

Reply via email to