I'm now seeing the following error in the controller log. Restarting the
controller fixed the issue. We see this in the controller from time to time,
along with zk connection issues. Could this also be related to the zk client
version?

2019-05-31 13:21:46,669 [Thread-0-SendThread(localhost:2181)] WARN
 o.apache.zookeeper.ClientCnxn  - Session 0x16b0ebbee1d000e for server
localhost/127.0.0.1:2181, unexpected error, closing socket connection and
attempting reconnect
java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:102)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:291)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1041)

Thanks
Dimuthu

On Fri, May 31, 2019 at 1:14 PM DImuthu Upeksha <dimuthu.upeks...@gmail.com>
wrote:

> Hi Lei,
>
> We use 0.8.2. We initially had 0.8.4, but it contains an issue with the task
> retry logic, so we downgraded to 0.8.2. We are planning to go into
> production with 0.8.2 next week, so can you please advise on a better way to
> solve this without upgrading to 0.8.4?
>
> Thanks
> Dimuthu
>
> On Fri, May 31, 2019 at 1:04 PM Lei Xia <xiax...@gmail.com> wrote:
>
>> Which Helix version do you use?  This may be caused by this ZooKeeper bug (
>> https://issues.apache.org/jira/browse/ZOOKEEPER-706).  We have upgraded
>> ZkClient in later Helix versions.
>>
>>
>> Lei
>>
>> On Fri, May 31, 2019 at 7:52 AM DImuthu Upeksha <
>> dimuthu.upeks...@gmail.com> wrote:
>>
>>> Hi Folks,
>>>
>>> I'm getting the following error in the controller log, and it seems the
>>> controller is not moving forward after that point:
>>>
>>> 2019-05-31 10:47:37,084 [main] INFO  o.a.a.h.i.c.HelixController  -
>>> Starting helix controller
>>> 2019-05-31 10:47:37,089 [main] INFO  o.a.a.c.u.ApplicationSettings  -
>>> Settings loaded from
>>>
>>> file:/home/airavata/staging-deployment/airavata-helix/apache-airavata-controller-0.18-SNAPSHOT/conf/airavata-server.properties
>>> 2019-05-31 10:47:37,091 [Thread-0] INFO  o.a.a.h.i.c.HelixController  -
>>> Connection to helix cluster : AiravataDemoCluster with name :
>>> helixcontroller2
>>> 2019-05-31 10:47:37,092 [Thread-0] INFO  o.a.a.h.i.c.HelixController  -
>>> Zookeeper connection string localhost:2181
>>> 2019-05-31 10:47:42,907 [GenericHelixController-event_process] ERROR
>>> o.a.h.c.GenericHelixController  - Exception while executing
>>> DEFAULTpipeline:
>>> org.apache.helix.controller.pipeline.Pipeline@408d6d26for
>>> cluster .AiravataDemoCluster. Will not continue to next pipeline
>>> org.apache.helix.api.exceptions.HelixMetaDataAccessException: Failed to get full list of /AiravataDemoCluster/CONFIGS/PARTICIPANT
>>> at org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:446)
>>> at org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValues(ZKHelixDataAccessor.java:406)
>>> at org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildValuesMap(ZKHelixDataAccessor.java:467)
>>> at org.apache.helix.controller.stages.ClusterDataCache.refresh(ClusterDataCache.java:176)
>>> at org.apache.helix.controller.stages.ReadClusterDataStage.process(ReadClusterDataStage.java:62)
>>> at org.apache.helix.controller.pipeline.Pipeline.handle(Pipeline.java:63)
>>> at org.apache.helix.controller.GenericHelixController.handleEvent(GenericHelixController.java:432)
>>> at org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:928)
>>> Caused by: org.apache.helix.api.exceptions.HelixMetaDataAccessException: Fail to read nodes for [/AiravataDemoCluster/CONFIGS/PARTICIPANT/helixparticipant]
>>> at org.apache.helix.manager.zk.ZkBaseDataAccessor.get(ZkBaseDataAccessor.java:414)
>>> at org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:479)
>>> at org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildren(ZkBaseDataAccessor.java:442)
>>> ... 7 common frames omitted
>>>
>>> In the ZooKeeper log I can see the following warning being printed
>>> continuously. What could be the reason for that? I'm using Helix 0.8.2
>>> and ZooKeeper 3.4.8.
>>>
>>> 2019-05-31 10:49:37,621 [myid:] - INFO  [NIOServerCxn.Factory:
>>> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for
>>> client /0:0:0:0:0:0:0:1:59056 which had sessionid 0x16b0e59877f0000
>>> 2019-05-31 10:49:37,773 [myid:] - INFO  [NIOServerCxn.Factory:
>>> 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket
>>> connection
>>> from /127.0.0.1:57984
>>> 2019-05-31 10:49:37,774 [myid:] - INFO  [NIOServerCxn.Factory:
>>> 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@893] - Client attempting to renew
>>> session 0x16b0e59877f0000 at /127.0.0.1:57984
>>> 2019-05-31 10:49:37,774 [myid:] - INFO  [NIOServerCxn.Factory:
>>> 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@645] - Established session
>>> 0x16b0e59877f0000 with negotiated timeout 30000 for client /
>>> 127.0.0.1:57984
>>> 2019-05-31 10:49:37,790 [myid:] - WARN  [NIOServerCxn.Factory:
>>> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
>>> EndOfStreamException: Unable to read additional data from client sessionid 0x16b0e59877f0000, likely client has closed socket
>>> at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:230)
>>> at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
>>> at java.lang.Thread.run(Thread.java:748)
>>>
>>> Thanks
>>> Dimuthu
>>>
>>
>>
>> --
>> Lei Xia
>>
>
