Hi Developers,

I hit a ZooKeeper-related OOME in a Spring Boot + Dubbo + ZooKeeper Java EE application; it occurs rarely. I captured a heap.bin and analyzed it with Eclipse Memory Analyzer (MAT): there are almost 36 million nodes in ClientCnxn$EventThread.waitingEvents, and the nodes' items are all the same eventOfDeath object.
In ClientCnxn.EventThread there is:

    private Object eventOfDeath = new Object();

    public void queueEventOfDeath() {
        waitingEvents.add(eventOfDeath);
    }

queueEventOfDeath() is the only place that adds eventOfDeath, so it seems queueEventOfDeath() was still being invoked after wasKilled became true, and the addition is not filtered out.
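To illustrate the suspected failure mode, here is a minimal, self-contained sketch (hypothetical names; not the actual ZooKeeper code path): if queueEventOfDeath() keeps getting called after the consuming EventThread has exited, every call appends the same sentinel to a queue nobody drains, and the queue grows without bound.

```java
import java.util.concurrent.LinkedBlockingQueue;

public class EventOfDeathLeak {
    // Same shape as ClientCnxn.EventThread: a sentinel object and an unbounded queue.
    static final Object eventOfDeath = new Object();
    static final LinkedBlockingQueue<Object> waitingEvents = new LinkedBlockingQueue<>();

    // Mirrors queueEventOfDeath(): an unconditional add, with no check of
    // whether the consumer thread is still alive and draining the queue.
    static void queueEventOfDeath() {
        waitingEvents.add(eventOfDeath);
    }

    public static void main(String[] args) {
        // If the EventThread has already died, nothing ever polls the queue,
        // so each extra call leaks one more Node holding the same sentinel.
        for (int i = 0; i < 5; i++) {
            queueEventOfDeath();
        }
        System.out.println(waitingEvents.size()); // prints 5: one queue node per call
    }
}
```

At ~36 million such calls over the process lifetime, those 24-byte nodes alone would account for the heap usage seen in the dump below.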


The ZooKeeper version is 3.4.12 on both server and client.
The application that hit the OOME is a high-level service; there is also a low-level service that it depends on. Each is a single deployment with no duplicate instances, and they were deployed at almost the same time.


I searched the Jira issues but could not find a matching one. Any help would be appreciated.


Attachments:
==
Some MAT results:
1) From the dominator_tree:
Class Name                                                                     | Shallow Heap | Retained Heap | Percentage
--------------------------------------------------------------------------------------------------------------------------
org.apache.curator.CuratorZookeeperClient @ 0xc4312678                         |           40 |   868,942,136 |     88.54%
|- org.apache.curator.ConnectionState @ 0xc43126a0                             |           64 |   868,941,928 |     88.54%
|  |- org.apache.curator.HandleHolder @ 0xc43126e0                             |           40 |   868,941,080 |     88.54%
|  |  |- org.apache.curator.HandleHolder$1$1 @ 0xc82cfc60                      |           16 |   868,941,016 |     88.54%
|  |  |  '- org.apache.curator.HandleHolder$1 @ 0xc82cfc70                     |           24 |   868,941,000 |     88.54%
|  |  |     '- org.apache.zookeeper.ZooKeeper @ 0xc82cfc88                     |           24 |   868,940,976 |     88.54%
|  |  |        |- org.apache.zookeeper.ClientCnxn @ 0xc82cfca0                 |          104 |   868,940,680 |     88.54%
|  |  |        |  |- org.apache.zookeeper.ClientCnxn$EventThread @ 0xc82d2800  main-EventThread |  144 |   868,872,040 |     88.53%
|  |  |        |  |  |- java.util.concurrent.LinkedBlockingQueue @ 0xc82d29e0  |           48 |   868,871,816 |     88.53%
--------------------------------------------------------------------------------------------------------------------------
2) ClientCnxn$EventThread  -> list objects -> outgoing:
Class Name                                                                     | Shallow Heap | Retained Heap
--------------------------------------------------------------------------------------------------------------
org.apache.zookeeper.ClientCnxn$EventThread @ 0xc82d2800  main-EventThread     |          144 |   868,872,040
|- waitingEvents java.util.concurrent.LinkedBlockingQueue @ 0xc82d29e0         |           48 |   868,871,816
|  |- count java.util.concurrent.atomic.AtomicInteger @ 0xc82d2a10  36202972   |           16 |            16
|  |- head java.util.concurrent.LinkedBlockingQueue$Node @ 0xc82d2a20          |           24 |   868,871,552
|  |  |- next java.util.concurrent.LinkedBlockingQueue$Node @ 0xc82d2a38       |           24 |   868,871,528
|  |  |  |- item org.apache.zookeeper.ClientCnxn$WatcherSetEventPair @ 0xc82d2a50 |        24 |           224
|  |  |  |- next java.util.concurrent.LinkedBlockingQueue$Node @ 0xc82d2b30    |           24 |   868,871,280
|  |  |  |  |- item java.lang.Object @ 0xc82d2b48                              |           16 |            16
|  |  |  |  |- next java.util.concurrent.LinkedBlockingQueue$Node @ 0xc82d2b58 |           24 |   868,871,256
|  |  |  |  |  |- item java.lang.Object @ 0xc82d2b48                           |           16 |            16
|  |  |  |  |  |- next java.util.concurrent.LinkedBlockingQueue$Node @ 0xc82d2b70 |        24 |   868,871,232
|  |  |  |  |  |  |- item java.lang.Object @ 0xc82d2b48                        |           16 |            16
|  |  |  |  |  |  |- next java.util.concurrent.LinkedBlockingQueue$Node @ 0xc82d2b88 |     24 |   868,871,208
--------------------------------------------------------------------------------------------------------------
3) Some ClientCnxn instance fields:
Class Name                                                                     | Shallow Heap | Retained Heap
--------------------------------------------------------------------------------------------------------------
org.apache.zookeeper.ClientCnxn @ 0xc82cfca0                                   |          104 |   868,940,680
|- <class> class org.apache.zookeeper.ClientCnxn @ 0xc44deaa8                  |           24 |           128
|- eventThread org.apache.zookeeper.ClientCnxn$EventThread @ 0xc82d2800  main-EventThread |  144 |   868,872,040
|- eventOfDeath java.lang.Object @ 0xc82d2b48                                  |           16 |            16
--------------------------------------------------------------------------------------------------------------
4) Counting how many queue nodes reference eventOfDeath (OQL):
SELECT n.item, n.item.@objectId, n.item.hashCode() FROM java.util.concurrent.LinkedBlockingQueue$Node n
n.item                          | n.item.@objectId | n.item.hashCode()
-----------------------------------------------------------------------
java.lang.Object [id=0xc82d2b48]|        2,516,148 |         2,516,148
-----------------------------------------------------------------------
SELECT n.item FROM java.util.concurrent.LinkedBlockingQueue$Node n WHERE (n.item.@objectId = 2516148)
n.item
--------------------------------
java.lang.Object [id=0xc82d2b48]
--------------------------------
Total: 20 of 36,201,971 entries
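As a sanity check on the numbers (figures taken from the MAT output above, so this is just arithmetic, not ZooKeeper code): with the queue's AtomicInteger count at 36,202,972 and LinkedBlockingQueue$Node having a 24-byte shallow size, the nodes alone account for roughly 869 MB, which matches the retained heap MAT reports for the head node.

```java
public class QueueFootprint {
    // Figures taken from the MAT dump above.
    static final long NODE_COUNT = 36_202_972L;   // AtomicInteger count field of the queue
    static final long NODE_SHALLOW_BYTES = 24L;   // shallow size of LinkedBlockingQueue$Node

    static long nodeBytes() {
        return NODE_COUNT * NODE_SHALLOW_BYTES;
    }

    public static void main(String[] args) {
        // 868,871,328 bytes: close to the 868,871,552 retained heap MAT reports
        // for the head node, since the single shared eventOfDeath item adds
        // almost nothing on top of the per-node overhead.
        System.out.println(nodeBytes()); // prints 868871328
    }
}
```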


==
Some of the application (biz) machine log:


2019-08-07 16:04:22 [WARN] [.154:2181)] [o.a.z.ClientCnxn][1111]]  Client session timed out, have not heard from server in 69261ms for sessionid 0x100000048ef0156
2019-08-07 16:04:22 [INFO] [.154:2181)] [o.a.z.ClientCnxn][1159]]  Client session timed out, have not heard from server in 69261ms for sessionid 0x100000048ef0156, closing socket connection and attempting reconnect
2019-08-07 16:04:33 [INFO] [.154:2181)] [o.a.z.ClientCnxn][1028]]  Opening socket connection to server 10.111.196.154/10.111.196.154:2181. Will not attempt to authenticate using SASL (unknown error)
2019-08-07 16:04:33 [INFO] [.154:2181)] [o.a.z.ClientCnxn][878]]  Socket connection established to 10.111.196.154/10.111.196.154:2181, initiating session
2019-08-07 16:04:39 [WARN] [.154:2181)] [o.a.z.ClientCnxn][1288]]  Unable to reconnect to ZooKeeper service, session 0x100000048ef0156 has expired
2019-08-07 16:04:39 [INFO] [.154:2181)] [o.a.z.ClientCnxn][1157]]  Unable to reconnect to ZooKeeper service, session 0x100000048ef0156 has expired, closing socket connection
...
2019-08-07 16:04:44 [ERROR] [ventThread] [o.a.c.f.i.CuratorFrameworkImpl][685]]  Watcher exception
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /dubbo/mycom.auth.api.UserAccountService/providers
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:130)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
        at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1653)
        at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:242)
        at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:231)
        at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:64)
        at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100)
        at org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:228)
        at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:219)
        at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:41)
        at com.alibaba.dubbo.remoting.zookeeper.curator.CuratorZookeeperClient$CuratorWatcherImpl.process(CuratorZookeeperClient.java:159)
        at org.apache.curator.framework.imps.NamespaceWatcher.process(NamespaceWatcher.java:83)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:532)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:507)
...
2019-08-07 16:04:49 [ERROR] [ventThread] [o.a.z.ClientCnxn][534]]  Error while calling watcher
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.Arrays.copyOf(Arrays.java:3332)
        at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
        at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421)
        at java.lang.StringBuilder.append(StringBuilder.java:136)
        at com.alibaba.dubbo.registry.support.AbstractRegistry.notify(AbstractRegistry.java:380)
        at com.alibaba.dubbo.registry.support.FailbackRegistry.doNotify(FailbackRegistry.java:273)
        at com.alibaba.dubbo.registry.support.FailbackRegistry.notify(FailbackRegistry.java:259)
        at com.alibaba.dubbo.registry.zookeeper.ZookeeperRegistry.access$400(ZookeeperRegistry.java:43)
        at com.alibaba.dubbo.registry.zookeeper.ZookeeperRegistry$3.childChanged(ZookeeperRegistry.java:171)
        at com.alibaba.dubbo.remoting.zookeeper.curator.CuratorZookeeperClient$CuratorWatcherImpl.process(CuratorZookeeperClient.java:159)
        at org.apache.curator.framework.imps.NamespaceWatcher.process(NamespaceWatcher.java:83)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:532)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:507)


==
Some of the ZooKeeper server log (/zookeeper.out):
2019-08-07 16:05:21,434 [myid:] - INFO  [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@653] - Got user-level KeeperException when processing sessionid:0x100000048ef015a type:create cxid:0x1a zxid:0x14fa1 txntype:-1 reqpath:n/a Error Path:/dubbo/mycom.auth.api.EmailInfoService/routers Error:KeeperErrorCode = NodeExists for /dubbo/mycom.auth.api.EmailInfoService/routers
2019-08-07 16:05:31,024 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@376] - Unable to read additional data from client sessionid 0x100000048ef015a, likely client has closed socket
2019-08-07 16:05:31,025 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /10.111.196.155:45044 which had sessionid 0x100000048ef015a
2019-08-07 16:05:49,251 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@215] - Accepted socket connection from /10.111.196.155:45150
2019-08-07 16:05:49,262 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@948] - Client attempting to establish new session at /10.111.196.155:45150
2019-08-07 16:05:49,264 [myid:] - INFO  [SyncThread:0:ZooKeeperServer@693] - Established session 0x100000048ef015b with negotiated timeout 60000 for client /10.111.196.155:45150
2019-08-07 16:05:49,377 [myid:] - INFO  [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@653] - Got user-level KeeperException when processing sessionid:0x100000048ef015b type:create cxid:0x4 zxid:0x14fa4 txntype:-1 reqpath:n/a Error Path:/dubbo/mycom.auth.api.UserAccountService/providers Error:KeeperErrorCode = NodeExists for /dubbo/mycom.auth.api.UserAccountService/providers
2019-08-07 16:05:49,397 [myid:] - INFO  [ProcessThread(sid:0 cport:2181)::PrepRequestProcessor@653] - Got user-level KeeperException when processing sessionid:0x100000048ef015b type:create cxid:0x7 zxid:0x14fa5 txntype:-1 reqpath:n/a Error Path:/dubbo/mycom.auth.api.UserAccountService/configurators Error:KeeperErrorCode = NodeExists for /dubbo/mycom.auth.api.UserAccountService/configurators


