[ 
https://issues.apache.org/jira/browse/HELIX-96?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13663753#comment-13663753
 ] 

kishore gopalakrishna commented on HELIX-96:
--------------------------------------------

Do you have the full stack trace, and is this easy to reproduce? I want to
understand which Helix API you are invoking from your code, and whether the
call is made on the main thread or a separate thread.

   java.lang.Thread.State: TIMED_WAITING (parking)
      at sun.misc.Unsafe.park(Native Method)
      - parking to wait for <0x187c1f10> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      at java.util.concurrent.locks.LockSupport.parkUntil(LockSupport.java:237)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUntil(AbstractQueuedSynchronizer.java:2072)
      at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:636)
      at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:619)
      at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:615)
      at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:679)
      at org.apache.helix.manager.zk.ZkClient.readData(ZkClient.java:254)
      at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
      at org.apache.helix.manager.zk.ZkBaseDataAccessor.get(ZkBaseDataAccessor.java:315)
      at org.apache.helix.manager.zk.ZkCacheBaseDataAccessor.get(ZkCacheBaseDataAccessor.java:461)


I can provide a patch that will throw an exception, but I don't know how you
plan to handle the exception.
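
For illustration only, here is a caller-side sketch of how a blocking read could
be bounded so that it fails with an exception instead of waiting indefinitely.
This is not the proposed patch and not Helix or zkclient code: the class
BoundedRead and the method callWithTimeout are hypothetical names, and the
lambdas in main stand in for a call such as ZkBaseDataAccessor.get(). It also
assumes the wrapped call reacts to thread interruption, which would need to be
checked for the real client.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper; not part of Helix or zkclient.
class BoundedRead {

    // Runs blockingCall on a separate daemon thread and waits at most
    // `timeout` for its result. Throws TimeoutException if the call does
    // not finish in time, so the caller decides how to handle the failure.
    static <T> T callWithTimeout(Callable<T> blockingCall, long timeout, TimeUnit unit)
            throws TimeoutException, ExecutionException, InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-read");
            t.setDaemon(true); // do not keep the JVM alive if the call never returns
            return t;
        });
        try {
            Future<T> future = executor.submit(blockingCall);
            try {
                return future.get(timeout, unit);
            } catch (TimeoutException e) {
                future.cancel(true); // interrupt the worker thread
                throw e;
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // A call that returns promptly: the value is passed through.
        String value = callWithTimeout(() -> "data", 1, TimeUnit.SECONDS);
        System.out.println("read: " + value);

        // A call that blocks (stand-in for a read while ZooKeeper is down):
        // the caller gets a TimeoutException instead of hanging.
        try {
            callWithTimeout(() -> {
                Thread.sleep(60_000);
                return "never";
            }, 200, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            System.out.println("read timed out");
        }
    }
}
```

With a wrapper like this the application has to choose what to do on timeout
(retry, fail over, or shut down), which is the open question above.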


                
> ZkBaseDataAccessor.get() hangs during Zookeeper failure
> -------------------------------------------------------
>
>                 Key: HELIX-96
>                 URL: https://issues.apache.org/jira/browse/HELIX-96
>             Project: Apache Helix
>          Issue Type: Bug
>          Components: helix-core
>    Affects Versions: 0.6.0-incubating
>            Reporter: Ming Fang
>            Assignee: Shi Lu
>
> During our failure testing with Zookeeper running in standalone mode, we 
> sometimes see our application hanging in the callstack below...
>    java.lang.Thread.State: TIMED_WAITING (parking)
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for <0x187c1f10> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>       at java.util.concurrent.locks.LockSupport.parkUntil(LockSupport.java:237)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUntil(AbstractQueuedSynchronizer.java:2072)
>       at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:636)
>       at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:619)
>       at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:615)
>       at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:679)
>       at org.apache.helix.manager.zk.ZkClient.readData(ZkClient.java:254)
>       at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
>       at org.apache.helix.manager.zk.ZkBaseDataAccessor.get(ZkBaseDataAccessor.java:315)
>       at org.apache.helix.manager.zk.ZkCacheBaseDataAccessor.get(ZkCacheBaseDataAccessor.java:461)
> The comment in ZKClient.java line 677 seems to say that eventually it would 
> get a Disconnected event and then throw an exception, but we waited for many 
> minutes.
> Also we were able to resume by simply restarting Zookeeper.
