Hi Kishore, Please find below log and I think the issue is "Packet len4194362 is out of range!". Currently we have around 10000 unprocessed workflows. Will that be the reason for that?
2019-05-31 12:17:15,221 [Thread-0-EventThread] INFO o.a.h.m.zk.zookeeper.ZkClient - zookeeper state changed (SyncConnected) 2019-05-31 12:17:15,237 [Thread-0-SendThread(localhost:2181)] WARN o.apache.zookeeper.ClientCnxn - Session 0x16b0e59877f0005 for server localhost/0:0:0:0:0:0:0:1:2181, unexpected error, closing socket connection and attempting reconnect java.io.IOException: Packet len4194362 is out of range! at org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:112) at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:291) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1041) 2019-05-31 12:17:15,337 [Thread-0-EventThread] INFO o.a.h.m.zk.zookeeper.ZkClient - zookeeper state changed (Disconnected) 2019-05-31 12:17:15,523 [Thread-0-SendThread(localhost:2181)] INFO o.apache.zookeeper.ClientCnxn - Opening socket connection to server localhost/127.0.0.1:2181 2019-05-31 12:17:15,524 [Thread-0-SendThread(localhost:2181)] INFO o.apache.zookeeper.ClientCnxn - Socket connection established to localhost/127.0.0.1:2181, initiating session 2019-05-31 12:17:15,525 [Thread-0-SendThread(localhost:2181)] INFO o.apache.zookeeper.ClientCnxn - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x16b0e59877f0005, negotiated timeout = 30000 2019-05-31 12:17:15,526 [Thread-0-EventThread] INFO o.a.h.m.zk.zookeeper.ZkClient - zookeeper state changed (SyncConnected) 2019-05-31 12:17:15,540 [Thread-0-SendThread(localhost:2181)] WARN o.apache.zookeeper.ClientCnxn - Session 0x16b0e59877f0005 for server localhost/127.0.0.1:2181, unexpected error, closing socket connection and attempting reconnect java.io.IOException: Packet len4194362 is out of range! at org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:112) at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:291) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1041) 2019-05-31 12:17:15,641 [Thread-0-EventThread] INFO o.a.h.m.zk.zookeeper.ZkClient - zookeeper state changed (Disconnected) Thanks Dimuthu On Fri, May 31, 2019 at 11:55 AM kishore g <g.kish...@gmail.com> wrote: > can you grep for zookeeper state in controller log. > > On Fri, May 31, 2019 at 7:52 AM DImuthu Upeksha < > dimuthu.upeks...@gmail.com> > wrote: > > > Hi Folks, > > > > I'm getting following error in controller log and seems like controller > is > > not moving froward after that point > > > > 2019-05-31 10:47:37,084 [main] INFO o.a.a.h.i.c.HelixController - > > Starting helix controller > > 2019-05-31 10:47:37,089 [main] INFO o.a.a.c.u.ApplicationSettings - > > Settings loaded from > > > > > file:/home/airavata/staging-deployment/airavata-helix/apache-airavata-controller-0.18-SNAPSHOT/conf/airavata-server.properties > > 2019-05-31 10:47:37,091 [Thread-0] INFO o.a.a.h.i.c.HelixController - > > Connection to helix cluster : AiravataDemoCluster with name : > > helixcontroller2 > > 2019-05-31 10:47:37,092 [Thread-0] INFO o.a.a.h.i.c.HelixController - > > Zookeeper connection string localhost:2181 > > 2019-05-31 10:47:42,907 [GenericHelixController-event_process] ERROR > > o.a.h.c.GenericHelixController - Exception while executing > > DEFAULTpipeline: > org.apache.helix.controller.pipeline.Pipeline@408d6d26for > > cluster .AiravataDemoCluster. Will not continue to next pipeline > > org.apache.helix.api.exceptions.HelixMetaDataAccessException: Failed to > get > > full list of /AiravataDemoCluster/CONFIGS/PARTICIPANT > > at > > > > > org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:446) > > at > > > > > org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValues(ZKHelixDataAccessor.java:406) > > at > > > > > org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValuesMap(ZKHelixDataAccessor.java:467) > > at > > > > > org.apache.helix.controller.stages.ClusterDataCache.refresh(ClusterDataCache.java:176) > > at > > > > > org.apache.helix.controller.stages.ReadClusterDataStage.process(ReadClusterDataStage.java:62) > > at org.apache.helix.controller.pipeline.Pipeline.handle(Pipeline.java:63) > > at > > > > > org.apache.helix.controller.GenericHelixController.handleEvent(GenericHelixController.java:432) > > at > > > > > org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:928) > > Caused by: org.apache.helix.api.exceptions.HelixMetaDataAccessException: > > Fail to read nodes for > > [/AiravataDemoCluster/CONFIGS/PARTICIPANT/helixparticipant] > > at > > > > > org.apache.helix.manager.zk.ZkBaseDataAccessor.get(ZkBaseDataAccessor.java:414) > > at > > > > > org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:479) > > at > > > > > org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:442) > > ... 7 common frames omitted > > > > In the zookeeper log I can see following warning getting printed > > continuously. What could be the reason for that? I'm using helix 0.8.2 > and > > zookeeper 3.4.8 > > > > 2019-05-31 10:49:37,621 [myid:] - INFO [NIOServerCxn.Factory: > > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for > > client /0:0:0:0:0:0:0:1:59056 which had sessionid 0x16b0e59877f0000 > > 2019-05-31 10:49:37,773 [myid:] - INFO [NIOServerCxn.Factory: > > 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket > > connection > > from /127.0.0.1:57984 > > 2019-05-31 10:49:37,774 [myid:] - INFO [NIOServerCxn.Factory: > > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@893] - Client attempting to renew > > session 0x16b0e59877f0000 at /127.0.0.1:57984 > > 2019-05-31 10:49:37,774 [myid:] - INFO [NIOServerCxn.Factory: > > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@645] - Established session > > 0x16b0e59877f0000 with negotiated timeout 30000 for client / > > 127.0.0.1:57984 > > 2019-05-31 10:49:37,790 [myid:] - WARN [NIOServerCxn.Factory: > > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception > > EndOfStreamException: Unable to read additional data from client > sessionid > > 0x16b0e59877f0000, likely client has closed socket > > at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:230) > > at > > > > > org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203) > > at java.lang.Thread.run(Thread.java:748) > > > > Thanks > > Dimuthu > > >