[ 
https://issues.apache.org/jira/browse/SAMZA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shanthoosh Venkataraman updated SAMZA-1607:
-------------------------------------------
    Description: 
The existing implementation reads the data of ephemeral processor nodes in 
ZooKeeper in two steps:

   A. Fetch the list of ephemeral processor nodes.

   B. Read the data of each processor node from the list.

An ephemeral ZooKeeper node listed in step A might no longer exist by the 
time step B reads it, for example because the owning processor's session 
ended in between. The resulting exception is currently unhandled and can kill 
the leader processor unnecessarily. Here's the related exception observed in 
a dev setup.
{code:java}
org.apache.samza.SamzaException: Cannot read ZK node: /app-test-app-name-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-test-app-id-9fba7675-36e3-4a6e-8934-4cad6a8ebab0/test-app-name-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-test-app-id-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-coordinationData/processors/0000000001
at org.apache.samza.zk.ZkUtils.readProcessorData(ZkUtils.java:232)
at org.apache.samza.zk.ZkUtils.getActiveProcessorsIDs(ZkUtils.java:255)
at org.apache.samza.zk.ZkJobCoordinator.getActualProcessorIds(ZkJobCoordinator.java:292)
at org.apache.samza.zk.ZkJobCoordinator.doOnProcessorChange(ZkJobCoordinator.java:194)
at org.apache.samza.zk.ZkJobCoordinator.lambda$onProcessorChange$1(ZkJobCoordinator.java:188)
at org.apache.samza.zk.ScheduleAfterDebounceTime.lambda$getScheduleableAction$0(ScheduleAfterDebounceTime.java:134)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /app-test-app-name-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-test-app-id-9fba7675-36e3-4a6e-8934-4cad6a8ebab0/test-app-name-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-test-app-id-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-coordinationData/processors/0000000001
at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:1001)
at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:1100)
at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:1095)
at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:1084)
at org.apache.samza.zk.ZkUtils.readProcessorData(ZkUtils.java:226)
{code}
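
One possible way to tolerate the race between step A and step B is sketched 
here. This is not the actual Samza fix: NodeStore, NoNodeException, RacyStore 
and readActiveProcessorIds are hypothetical names made up for illustration, 
and the in-memory store only simulates a node that disappears after it has 
been listed. The idea is to treat a missing node in step B as "this processor 
has left" and skip it rather than let the exception propagate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ProcessorNodeReader {

    /** Hypothetical stand-in for the client's "no such node" error. */
    static class NoNodeException extends RuntimeException {
        NoNodeException(String path) {
            super("NoNode for " + path);
        }
    }

    /** Hypothetical minimal view of the two operations used in steps A and B. */
    interface NodeStore {
        List<String> getChildren(String parentPath);

        String readData(String path); // throws NoNodeException if the node is gone
    }

    /** In-memory store that deletes one child right after it has been listed. */
    static class RacyStore implements NodeStore {
        private final Map<String, String> data;
        private final String vanishingChild;

        RacyStore(Map<String, String> data, String vanishingChild) {
            this.data = new TreeMap<>(data);
            this.vanishingChild = vanishingChild;
        }

        @Override
        public List<String> getChildren(String parentPath) {
            List<String> snapshot = new ArrayList<>(data.keySet());
            data.remove(vanishingChild); // simulate the ephemeral node expiring
            return snapshot;
        }

        @Override
        public String readData(String path) {
            String child = path.substring(path.lastIndexOf('/') + 1);
            String value = data.get(child);
            if (value == null) {
                throw new NoNodeException(path);
            }
            return value;
        }
    }

    /** Steps A and B, skipping nodes that vanished in between. */
    static List<String> readActiveProcessorIds(NodeStore store, String processorsPath) {
        List<String> ids = new ArrayList<>();
        for (String child : store.getChildren(processorsPath)) { // step A
            try {
                ids.add(store.readData(processorsPath + "/" + child)); // step B
            } catch (NoNodeException e) {
                // The node was listed but is gone now: the processor left, so skip it.
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        Map<String, String> nodes = new TreeMap<>();
        nodes.put("0000000001", "processor-1");
        nodes.put("0000000002", "processor-2");
        NodeStore store = new RacyStore(nodes, "0000000001");
        System.out.println(readActiveProcessorIds(store, "/processors")); // prints [processor-2]
    }
}
```

Skipping the missing node should be safe in this scenario because the 
deletion of an ephemeral node also shows up as a change to the parent's 
children, so a later processor-change notification can recompute the list.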

  was:
Existing implementation of reading the data of ephemeral processor nodes in 
zookeeper happens in two steps.

   A. Fetch the list of ephemeral processor nodes.

   B. Read the data of each processor node from the list. 

Some zookeeper nodes present in step A might be unavailable in step B. This 
exception is unhandled currently and can kill the leader processor 
unnecessarily. Here's the related exception observed in a dev setup.
{code:java}
org.apache.samza.SamzaException: Cannot read ZK node: /app-test-app-name-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-test-app-id-9fba7675-36e3-4a6e-8934-4cad6a8ebab0/test-app-name-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-test-app-id-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-coordinationData/processors/0000000001
at org.apache.samza.zk.ZkUtils.readProcessorData(ZkUtils.java:232)
at org.apache.samza.zk.ZkUtils.getActiveProcessorsIDs(ZkUtils.java:255)
at org.apache.samza.zk.ZkJobCoordinator.getActualProcessorIds(ZkJobCoordinator.java:292)
at org.apache.samza.zk.ZkJobCoordinator.doOnProcessorChange(ZkJobCoordinator.java:194)
at org.apache.samza.zk.ZkJobCoordinator.lambda$onProcessorChange$1(ZkJobCoordinator.java:188)
at org.apache.samza.zk.ScheduleAfterDebounceTime.lambda$getScheduleableAction$0(ScheduleAfterDebounceTime.java:134)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /app-test-app-name-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-test-app-id-9fba7675-36e3-4a6e-8934-4cad6a8ebab0/test-app-name-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-test-app-id-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-coordinationData/processors/0000000001
at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:1001)
at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:1100)
at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:1095)
at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:1084)
at org.apache.samza.zk.ZkUtils.readProcessorData(ZkUtils.java:226)
{code}


> Fix bug in reading the ephemeral processor nodes from zookeeper.
> ----------------------------------------------------------------
>
>                 Key: SAMZA-1607
>                 URL: https://issues.apache.org/jira/browse/SAMZA-1607
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Shanthoosh Venkataraman
>            Assignee: Shanthoosh Venkataraman
>            Priority: Major
>
> The existing implementation reads the data of ephemeral processor nodes in 
> ZooKeeper in two steps:
>    A. Fetch the list of ephemeral processor nodes.
>    B. Read the data of each processor node from the list.
> An ephemeral ZooKeeper node listed in step A might no longer exist by the 
> time step B reads it, for example because the owning processor's session 
> ended in between. The resulting exception is currently unhandled and can kill 
> the leader processor unnecessarily. Here's the related exception observed in 
> a dev setup.
> {code:java}
> org.apache.samza.SamzaException: Cannot read ZK node: /app-test-app-name-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-test-app-id-9fba7675-36e3-4a6e-8934-4cad6a8ebab0/test-app-name-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-test-app-id-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-coordinationData/processors/0000000001
> at org.apache.samza.zk.ZkUtils.readProcessorData(ZkUtils.java:232)
> at org.apache.samza.zk.ZkUtils.getActiveProcessorsIDs(ZkUtils.java:255)
> at org.apache.samza.zk.ZkJobCoordinator.getActualProcessorIds(ZkJobCoordinator.java:292)
> at org.apache.samza.zk.ZkJobCoordinator.doOnProcessorChange(ZkJobCoordinator.java:194)
> at org.apache.samza.zk.ZkJobCoordinator.lambda$onProcessorChange$1(ZkJobCoordinator.java:188)
> at org.apache.samza.zk.ScheduleAfterDebounceTime.lambda$getScheduleableAction$0(ScheduleAfterDebounceTime.java:134)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /app-test-app-name-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-test-app-id-9fba7675-36e3-4a6e-8934-4cad6a8ebab0/test-app-name-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-test-app-id-9fba7675-36e3-4a6e-8934-4cad6a8ebab0-coordinationData/processors/0000000001
> at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
> at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:1001)
> at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:1100)
> at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:1095)
> at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:1084)
> at org.apache.samza.zk.ZkUtils.readProcessorData(ZkUtils.java:226)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
