Hi Kishore,

Adding -Djute.maxbuffer=49107800 fixed the issue, but now I see a whole
lot of log lines like the one below, and the participant only executes a
bulk of tasks once in a while, with roughly a five-minute delay in
between:

2019-05-31 12:45:58,804 [GenericHelixController-event_process] WARN
 o.a.h.t.a.AssignableInstance  - AssignableInstance does not have enough
capacity for quotaType: DEFAULT. Task:
Task_TASK_57f87021-bedb-415a-bf1f-ddfd17641f11, quotaType: DEFAULT,
Instance name: helixparticipant3. Current capacity: 40 capacity needed to
schedule: 40
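
If I understand the Task Framework correctly, this means all 40 task slots
that helixparticipant3 advertises for the DEFAULT quota type are occupied,
so new tasks have to wait until running ones release capacity, which would
explain the five-minute batching. The 40 appears to match the default task
thread pool size in TaskStateModelFactory. Would registering the factory
with a larger ScheduledExecutorService be the right way to raise that
limit, or is adding more participants the intended approach? A rough,
unverified sketch of what I mean, assuming the three-argument
TaskStateModelFactory constructor is available in 0.8.2:

    import java.util.Map;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;

    import org.apache.helix.HelixManager;
    import org.apache.helix.task.TaskFactory;
    import org.apache.helix.task.TaskStateModelFactory;

    public class ParticipantTaskPoolSketch {
      // taskRegistry is assumed to be the existing map of task command
      // name -> TaskFactory that the participant already builds before
      // registering itself.
      public static TaskStateModelFactory withLargerPool(
          HelixManager manager, Map<String, TaskFactory> taskRegistry) {
        // The default pool size is 40, matching the capacity in the
        // warning above; 100 here is just an illustrative number.
        ScheduledExecutorService taskExecutor =
            Executors.newScheduledThreadPool(100);
        return new TaskStateModelFactory(manager, taskRegistry, taskExecutor);
      }
    }

The factory would then be registered as usual via
manager.getStateMachineEngine().registerStateModelFactory("Task", factory).
I'm not sure whether the 0.8.2 controller actually picks the larger pool
up for its quota accounting, so please correct me if I'm off track here.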

Thanks
Dimuthu
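
P.S. The earlier "Packet len4194362 is out of range!" makes sense given
ZooKeeper's client-side default: the client reads the jute.maxbuffer
system property once at startup, with a default of 4096 * 1024 = 4194304
bytes, so a response only slightly over 4 MB (which thousands of
workflow-related znodes can easily produce) gets rejected. A minimal check
of the effective value in any of our JVMs, using nothing beyond the JDK:

    public class JuteMaxBufferCheck {
      public static void main(String[] args) {
        // ZooKeeper reads this system property with a 4 MB default, which
        // is why a 4194362-byte packet was just over the limit.
        int effective = Integer.getInteger("jute.maxbuffer", 4096 * 1024);
        System.out.println("jute.maxbuffer in this JVM = " + effective);
      }
    }

I assume the same value also needs to be set for the ZooKeeper server
process itself, not only the controller and participants.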

On Fri, May 31, 2019 at 12:25 PM DImuthu Upeksha <dimuthu.upeks...@gmail.com>
wrote:

> Hi Kishore,
>
> Please find the log below; I think the issue is "Packet len4194362 is out
> of range!". We currently have around 10,000 unprocessed workflows. Could
> that be the reason?
>
> 2019-05-31 12:17:15,221 [Thread-0-EventThread] INFO
>  o.a.h.m.zk.zookeeper.ZkClient  - zookeeper state changed (SyncConnected)
> 2019-05-31 12:17:15,237 [Thread-0-SendThread(localhost:2181)] WARN
>  o.apache.zookeeper.ClientCnxn  - Session 0x16b0e59877f0005 for server
> localhost/0:0:0:0:0:0:0:1:2181, unexpected error, closing socket connection
> and attempting reconnect
> java.io.IOException: Packet len4194362 is out of range!
> at
> org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:112)
> at
> org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77)
> at
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:291)
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1041)
> 2019-05-31 12:17:15,337 [Thread-0-EventThread] INFO
>  o.a.h.m.zk.zookeeper.ZkClient  - zookeeper state changed (Disconnected)
> 2019-05-31 12:17:15,523 [Thread-0-SendThread(localhost:2181)] INFO
>  o.apache.zookeeper.ClientCnxn  - Opening socket connection to server
> localhost/127.0.0.1:2181
> 2019-05-31 12:17:15,524 [Thread-0-SendThread(localhost:2181)] INFO
>  o.apache.zookeeper.ClientCnxn  - Socket connection established to
> localhost/127.0.0.1:2181, initiating session
> 2019-05-31 12:17:15,525 [Thread-0-SendThread(localhost:2181)] INFO
>  o.apache.zookeeper.ClientCnxn  - Session establishment complete on server
> localhost/127.0.0.1:2181, sessionid = 0x16b0e59877f0005, negotiated
> timeout = 30000
> 2019-05-31 12:17:15,526 [Thread-0-EventThread] INFO
>  o.a.h.m.zk.zookeeper.ZkClient  - zookeeper state changed (SyncConnected)
> 2019-05-31 12:17:15,540 [Thread-0-SendThread(localhost:2181)] WARN
>  o.apache.zookeeper.ClientCnxn  - Session 0x16b0e59877f0005 for server
> localhost/127.0.0.1:2181, unexpected error, closing socket connection and
> attempting reconnect
> java.io.IOException: Packet len4194362 is out of range!
> at
> org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:112)
> at
> org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77)
> at
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:291)
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1041)
> 2019-05-31 12:17:15,641 [Thread-0-EventThread] INFO
>  o.a.h.m.zk.zookeeper.ZkClient  - zookeeper state changed (Disconnected)
>
> Thanks
> Dimuthu
>
> On Fri, May 31, 2019 at 11:55 AM kishore g <g.kish...@gmail.com> wrote:
>
>> Can you grep for "zookeeper state" in the controller log?
>>
>> On Fri, May 31, 2019 at 7:52 AM DImuthu Upeksha <
>> dimuthu.upeks...@gmail.com>
>> wrote:
>>
>> > Hi Folks,
>> >
>> > I'm getting the following error in the controller log, and it seems like
>> > the controller is not moving forward after that point:
>> >
>> > 2019-05-31 10:47:37,084 [main] INFO  o.a.a.h.i.c.HelixController  -
>> > Starting helix controller
>> > 2019-05-31 10:47:37,089 [main] INFO  o.a.a.c.u.ApplicationSettings  -
>> > Settings loaded from
>> >
>> >
>> file:/home/airavata/staging-deployment/airavata-helix/apache-airavata-controller-0.18-SNAPSHOT/conf/airavata-server.properties
>> > 2019-05-31 10:47:37,091 [Thread-0] INFO  o.a.a.h.i.c.HelixController  -
>> > Connection to helix cluster : AiravataDemoCluster with name :
>> > helixcontroller2
>> > 2019-05-31 10:47:37,092 [Thread-0] INFO  o.a.a.h.i.c.HelixController  -
>> > Zookeeper connection string localhost:2181
>> > 2019-05-31 10:47:42,907 [GenericHelixController-event_process] ERROR
>> > o.a.h.c.GenericHelixController  - Exception while executing
>> > DEFAULTpipeline:
>> org.apache.helix.controller.pipeline.Pipeline@408d6d26for
>> > cluster .AiravataDemoCluster. Will not continue to next pipeline
>> > org.apache.helix.api.exceptions.HelixMetaDataAccessException: Failed to
>> get
>> > full list of /AiravataDemoCluster/CONFIGS/PARTICIPANT
>> > at
>> >
>> >
>> org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:446)
>> > at
>> >
>> >
>> org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValues(ZKHelixDataAccessor.java:406)
>> > at
>> >
>> >
>> org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValuesMap(ZKHelixDataAccessor.java:467)
>> > at
>> >
>> >
>> org.apache.helix.controller.stages.ClusterDataCache.refresh(ClusterDataCache.java:176)
>> > at
>> >
>> >
>> org.apache.helix.controller.stages.ReadClusterDataStage.process(ReadClusterDataStage.java:62)
>> > at
>> org.apache.helix.controller.pipeline.Pipeline.handle(Pipeline.java:63)
>> > at
>> >
>> >
>> org.apache.helix.controller.GenericHelixController.handleEvent(GenericHelixController.java:432)
>> > at
>> >
>> >
>> org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:928)
>> > Caused by: org.apache.helix.api.exceptions.HelixMetaDataAccessException:
>> > Fail to read nodes for
>> > [/AiravataDemoCluster/CONFIGS/PARTICIPANT/helixparticipant]
>> > at
>> >
>> >
>> org.apache.helix.manager.zk.ZkBaseDataAccessor.get(ZkBaseDataAccessor.java:414)
>> > at
>> >
>> >
>> org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:479)
>> > at
>> >
>> >
>> org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:442)
>> > ... 7 common frames omitted
>> >
>> > In the ZooKeeper log I can see the following warning being printed
>> > continuously. What could be the reason for that? I'm using Helix 0.8.2
>> > and ZooKeeper 3.4.8.
>> >
>> > 2019-05-31 10:49:37,621 [myid:] - INFO  [NIOServerCxn.Factory:
>> > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for
>> > client /0:0:0:0:0:0:0:1:59056 which had sessionid 0x16b0e59877f0000
>> > 2019-05-31 10:49:37,773 [myid:] - INFO  [NIOServerCxn.Factory:
>> > 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket
>> > connection
>> > from /127.0.0.1:57984
>> > 2019-05-31 10:49:37,774 [myid:] - INFO  [NIOServerCxn.Factory:
>> > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@893] - Client attempting to renew
>> > session 0x16b0e59877f0000 at /127.0.0.1:57984
>> > 2019-05-31 10:49:37,774 [myid:] - INFO  [NIOServerCxn.Factory:
>> > 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@645] - Established session
>> > 0x16b0e59877f0000 with negotiated timeout 30000 for client /
>> > 127.0.0.1:57984
>> > 2019-05-31 10:49:37,790 [myid:] - WARN  [NIOServerCxn.Factory:
>> > 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream
>> exception
>> > EndOfStreamException: Unable to read additional data from client
>> sessionid
>> > 0x16b0e59877f0000, likely client has closed socket
>> > at
>> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:230)
>> > at
>> >
>> >
>> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
>> > at java.lang.Thread.run(Thread.java:748)
>> >
>> > Thanks
>> > Dimuthu
>> >
>>
>
