[ https://issues.apache.org/jira/browse/SAMZA-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shanthoosh Venkataraman updated SAMZA-1568:
-------------------------------------------
    Description: 
When ZooKeeper session failures occur in a stream processor, it leaves the
group (zkClient is closed) and joins the group again.

The last step in that shutdown sequence is zkClient.close(). In some scenarios, 
it throws the following exception:
{code:java}
org.I0Itec.zkclient.exception.ZkInterruptedException: java.lang.InterruptedException
    at org.I0Itec.zkclient.ZkClient.close(ZkClient.java:1278)
    at org.apache.samza.zk.ZkControllerImpl.stop(ZkControllerImpl.java:92)
    at org.apache.samza.zk.ZkJobCoordinator.stop(ZkJobCoordinator.java:141)
{code}
In the existing implementation this exception is not handled, thereby killing
the stream processor. The following codepath triggers this exception:
{code:java}
StreamProcessor.stop() -> ZkJobCoordinator.stop() -> zkController.stop() -> zkUtils.close()
{code}
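The codepath above can be illustrated with a minimal sketch. All class and method names here are hypothetical stand-ins, not the real Samza or zkclient types, and it assumes (as the stack trace suggests) that ZkInterruptedException is unchecked and wraps an InterruptedException. Because nothing catches it, the exception unwinds every stop() frame, so any step placed after zkController.stop() in the coordinator (modelled here as a hypothetical "shutdown callback") never runs.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the classes on the codepath; not the real Samza or zkclient types.
class ShutdownChain {
  // Stand-in for org.I0Itec.zkclient.exception.ZkInterruptedException (assumed unchecked).
  static class FakeZkInterruptedException extends RuntimeException {
    FakeZkInterruptedException(InterruptedException cause) {
      super(cause);
    }
  }

  // Records which shutdown steps actually ran, in order.
  static final List<String> steps = new ArrayList<>();

  // Stand-in for zkUtils.close(): the client close fails with an interrupt.
  static void zkUtilsClose() {
    steps.add("zkUtils.close");
    throw new FakeZkInterruptedException(new InterruptedException());
  }

  static void zkControllerStop() {
    steps.add("zkController.stop");
    zkUtilsClose();
  }

  static void zkJobCoordinatorStop() {
    steps.add("ZkJobCoordinator.stop");
    zkControllerStop();
    // Never reached when close() throws, so the shutdown callback is skipped.
    steps.add("shutdown callback");
  }

  static void streamProcessorStop() {
    steps.add("StreamProcessor.stop");
    zkJobCoordinatorStop();
  }

  public static void main(String[] args) {
    try {
      streamProcessorStop();
    } catch (FakeZkInterruptedException e) {
      System.out.println("stop() failed: " + e.getCause());
    }
    // The recorded steps end at zkUtils.close; "shutdown callback" is missing.
    System.out.println(steps);
  }
}
{code}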
This exception occasionally causes the integration test to fail and can make
the LocalApplicationRunner.waitForFinish call block indefinitely (because the
latch that waitForFinish waits on is only updated when this callback event
succeeds).
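One way to handle the exception is sketched below under stated assumptions; it is not Samza's actual fix. The names (GuardedShutdown, closeClient, shutdownLatch) are hypothetical, and the latch models the state that waitForFinish depends on. The sketch catches the close failure, restores the thread's interrupt status, and counts the latch down in a finally block so a waitForFinish-style caller is always released.

{code:java}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not Samza's code: stop() tolerates an interrupted client close
// and always fires the shutdown callback.
class GuardedShutdown {
  // Stand-in for ZkInterruptedException (assumed to be an unchecked wrapper of InterruptedException).
  static class FakeZkInterruptedException extends RuntimeException {
    FakeZkInterruptedException(InterruptedException cause) {
      super(cause);
    }
  }

  // Counted down by the shutdown callback; a waitForFinish-style caller awaits it.
  final CountDownLatch shutdownLatch = new CountDownLatch(1);

  // Stand-in for zkUtils.close() that fails the same way as in the stack trace.
  void closeClient() {
    throw new FakeZkInterruptedException(new InterruptedException());
  }

  void stop() {
    try {
      closeClient();
    } catch (FakeZkInterruptedException e) {
      // Log and preserve the interrupt status instead of letting the exception kill the processor.
      System.err.println("Ignoring interrupted close: " + e.getCause());
      Thread.currentThread().interrupt();
    } finally {
      // Fire the shutdown callback whether or not close() succeeded.
      shutdownLatch.countDown();
    }
  }

  // Bounded wait: returns false on timeout instead of hanging forever.
  boolean waitForFinish(long timeoutMs) throws InterruptedException {
    return shutdownLatch.await(timeoutMs, TimeUnit.MILLISECONDS);
  }

  public static void main(String[] args) throws InterruptedException {
    GuardedShutdown coordinator = new GuardedShutdown();
    // stop() runs on a different thread from the one blocked in waitForFinish.
    Thread stopper = new Thread(coordinator::stop);
    stopper.start();
    System.out.println("finished: " + coordinator.waitForFinish(5000));
  }
}
{code}

Running stop() on a separate thread matters in this sketch: CountDownLatch.await throws InterruptedException immediately if the calling thread's interrupt flag is already set, so the thread that restored the flag should not be the one that waits.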

  was:
When ZooKeeper session failures occur in a stream processor, it leaves the
group (zkClient is closed) and joins the group again.

The last step in that shutdown sequence is zkClient.close(). In some scenarios, 
it throws the following exception:
{code:java}
org.I0Itec.zkclient.exception.ZkInterruptedException: java.lang.InterruptedException
    at org.I0Itec.zkclient.ZkClient.close(ZkClient.java:1278)
    at org.apache.samza.zk.ZkControllerImpl.stop(ZkControllerImpl.java:92)
    at org.apache.samza.zk.ZkJobCoordinator.stop(ZkJobCoordinator.java:141)
{code}

In the existing implementation this exception is not handled, thereby killing
the stream processor. The following codepath triggers this exception:
{code:java}
StreamProcessor.stop() -> ZkJobCoordinator.stop() -> zkController.stop() -> zkUtils.close()
{code}

This exception occasionally causes the integration test to fail and can make
the LocalApplicationRunner.waitForFinish call block indefinitely (since the
success of this callback event updates the latch state required for the call
to finish).


> Handle ZkInterruptedException in zkclient.close.
> ------------------------------------------------
>
>                 Key: SAMZA-1568
>                 URL: https://issues.apache.org/jira/browse/SAMZA-1568
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Shanthoosh Venkataraman
>            Assignee: Shanthoosh Venkataraman
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
