Navina Ramesh created SAMZA-1523:
------------------------------------

             Summary: Cleanup table entries before shutting down the processor
                 Key: SAMZA-1523
                 URL: https://issues.apache.org/jira/browse/SAMZA-1523
             Project: Samza
          Issue Type: Bug
            Reporter: Navina Ramesh
            Assignee: Navina Ramesh


We want to remove a processor's stale entries from the Azure Table when the
processor is shutting down. The Azure Table service uses optimistic locking by
default: the delete carries the entity's ETag, so any write to that entity
after it was read makes the delete fail. Hence, while the coordinator thread
is cleaning up during shutdown, the heartbeat thread can update the same entry.
The cleanup then fails with a "Precondition Failed" exception in the log and,
obviously, the entries are never cleared :)
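To make the failure mode concrete, here is a minimal, self-contained sketch that replays the interleaving from the log against a toy in-memory table. It assumes, as the Azure Table service does for conditional operations, that every write assigns a new ETag and that a delete only succeeds while the caller's ETag still matches. `EtagRaceDemo`, `InMemoryTable`, `upsert` and `replayRace` are hypothetical names, not Samza or Azure SDK APIs.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical toy model of ETag-based optimistic concurrency.
 * This is NOT the Azure SDK; it only mimics the conditional-delete behaviour.
 */
public class EtagRaceDemo {

    /** Stands in for the service's "Precondition Failed" (HTTP 412) error. */
    static class PreconditionFailedException extends RuntimeException {
        PreconditionFailedException(String msg) { super(msg); }
    }

    /** A processor row: the heartbeat payload plus the ETag assigned on write. */
    static final class Entity {
        final String rowKey;
        final long heartbeat;
        final String etag;

        Entity(String rowKey, long heartbeat, String etag) {
            this.rowKey = rowKey;
            this.heartbeat = heartbeat;
            this.etag = etag;
        }
    }

    static final class InMemoryTable {
        private final Map<String, Entity> rows = new HashMap<>();
        private long version = 0;

        /** Every write produces a fresh, opaque ETag. */
        Entity upsert(String rowKey, long heartbeat) {
            Entity e = new Entity(rowKey, heartbeat, "etag-" + (++version));
            rows.put(rowKey, e);
            return e;
        }

        Entity get(String rowKey) { return rows.get(rowKey); }

        /** Conditional delete: succeeds only if ifMatch equals the stored ETag. */
        void delete(String rowKey, String ifMatch) {
            Entity current = rows.get(rowKey);
            if (current == null) {
                throw new IllegalStateException("Not found: " + rowKey);
            }
            if (!current.etag.equals(ifMatch)) {
                throw new PreconditionFailedException("Precondition Failed");
            }
            rows.remove(rowKey);
        }

        boolean contains(String rowKey) { return rows.containsKey(rowKey); }
    }

    /** Replays the read / heartbeat / delete interleaving; true if cleanup failed. */
    static boolean replayRace(InMemoryTable table, String pid) {
        table.upsert(pid, 1L);                        // processor row exists
        Entity seenByCoordinator = table.get(pid);    // coordinator reads it (old ETag)
        table.upsert(pid, 2L);                        // heartbeat refreshes it (new ETag)
        try {
            table.delete(pid, seenByCoordinator.etag); // cleanup with the stale ETag
            return false;
        } catch (PreconditionFailedException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        InMemoryTable table = new InMemoryTable();
        String pid = "05133d0a-dd85-4178-a97c-2c98dc617308";
        boolean failed = replayRace(table, pid);
        System.out.println("cleanup failed: " + failed
                + ", row still present: " + table.contains(pid));
    }
}
```

Running `main` prints `cleanup failed: true, row still present: true`, mirroring the `Precondition Failed` in the log: the delete is rejected and the stale entry stays in the table.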

`
2017-11-30 15:23:32.804 [JMVersionUpgradeScheduler-0] AzureJobCoordinator [INFO] pid=05133d0a-dd85-4178-a97c-2c98dc617308new version 5 of the job model got confirmed
2017-11-30 15:23:32.833 [HeartbeatScheduler-0] HeartbeatScheduler [INFO] Updating heartbeat for processor ID: 05133d0a-dd85-4178-a97c-2c98dc617308 and job model version: 4
2017-11-30 15:23:32.905 [JMVersionUpgradeScheduler-0] TableUtils [ERROR] Azure storage exception while deleting processor entity with job model version: 4and pid: 05133d0a-dd85-4178-a97c-2c98dc617308
com.microsoft.azure.storage.table.TableServiceException: Precondition Failed
        at com.microsoft.azure.storage.table.TableServiceException.generateTableServiceException(TableServiceException.java:52)
        at com.microsoft.azure.storage.table.TableOperation$1.preProcessResponse(TableOperation.java:319)
        at com.microsoft.azure.storage.table.TableOperation$1.preProcessResponse(TableOperation.java:299)
        at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:139)
        at com.microsoft.azure.storage.table.TableOperation.performDelete(TableOperation.java:281)
        at com.microsoft.azure.storage.table.TableOperation.execute(TableOperation.java:685)
        at com.microsoft.azure.storage.table.CloudTable.execute(CloudTable.java:529)
        at com.microsoft.azure.storage.table.CloudTable.execute(CloudTable.java:496)
        at org.apache.samza.util.TableUtils.deleteProcessorEntity(TableUtils.java:157)
        at org.apache.samza.coordinator.AzureJobCoordinator.onNewJobModelConfirmed(AzureJobCoordinator.java:448)
        at org.apache.samza.coordinator.AzureJobCoordinator.onNewJobModelAvailable(AzureJobCoordinator.java:419)
        at org.apache.samza.coordinator.AzureJobCoordinator.lambda$createJMVersionUpgradeListener$3(AzureJobCoordinator.java:248)
        at org.apache.samza.coordinator.scheduler.JMVersionUpgradeScheduler.lambda$scheduleTask$0(JMVersionUpgradeScheduler.java:81)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
2017-11-30 15:23:32.906 [JMVersionUpgradeScheduler-0] AzureJobCoordinator [ERROR] Exception in Job Model Version Upgrade Scheduler. Stopping the processor...
`

We should disable optimistic locking during the cleanup phase of shutdown.
The ideal solution is perhaps to have more control over the various
schedulers, but that is beyond the scope of this JIRA :)
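One way to disable optimistic locking for the cleanup is an unconditional delete using the wildcard ETag `*` (`If-Match: *`), which the Azure Table service accepts for deletes. With the Java storage SDK seen in the stack trace, this would presumably mean calling `setEtag("*")` on the entity before passing it to `TableOperation.delete(...)`; please verify this against the SDK version in use. The sketch below only models that behaviour with a toy in-memory table; `WildcardDeleteDemo`, `InMemoryTable` and `cleanupOnShutdown` are hypothetical names, not Samza or Azure SDK APIs.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical toy model: a conditional delete that also honours the
 * wildcard ETag "*". NOT the Azure SDK.
 */
public class WildcardDeleteDemo {

    static final String ANY_ETAG = "*"; // wildcard: matches whatever ETag is stored

    /** Stands in for the service's "Precondition Failed" (HTTP 412) error. */
    static class PreconditionFailedException extends RuntimeException {
        PreconditionFailedException(String msg) { super(msg); }
    }

    static final class InMemoryTable {
        private final Map<String, String> etags = new HashMap<>(); // rowKey -> current ETag
        private long version = 0;

        /** Every write assigns a fresh ETag and returns it. */
        String upsert(String rowKey) {
            String etag = "etag-" + (++version);
            etags.put(rowKey, etag);
            return etag;
        }

        /** Deletes if ifMatch is "*" or equals the stored ETag. */
        void delete(String rowKey, String ifMatch) {
            String current = etags.get(rowKey);
            if (current == null) {
                throw new IllegalStateException("Not found: " + rowKey);
            }
            if (!ANY_ETAG.equals(ifMatch) && !current.equals(ifMatch)) {
                throw new PreconditionFailedException("Precondition Failed");
            }
            etags.remove(rowKey);
        }

        boolean contains(String rowKey) { return etags.containsKey(rowKey); }
    }

    /** Shutdown cleanup: unconditional delete, so concurrent heartbeats cannot block it. */
    static void cleanupOnShutdown(InMemoryTable table, String pid) {
        table.delete(pid, ANY_ETAG);
    }

    public static void main(String[] args) {
        InMemoryTable table = new InMemoryTable();
        String pid = "05133d0a-dd85-4178-a97c-2c98dc617308";
        String staleEtag = table.upsert(pid); // the coordinator's view of the row
        table.upsert(pid);                     // heartbeat bumps the ETag
        try {
            table.delete(pid, staleEtag);      // conditional delete: rejected
        } catch (PreconditionFailedException e) {
            System.out.println("conditional delete: " + e.getMessage());
        }
        cleanupOnShutdown(table, pid);         // wildcard delete: succeeds
        System.out.println("row still present: " + table.contains(pid));
    }
}
```

Running `main` prints `conditional delete: Precondition Failed` followed by `row still present: false`. Note that the wildcard only removes the conflict on the delete itself: if the heartbeat thread writes with an upsert after the row is gone, it would recreate the entry, which is one reason more control over the schedulers would be the more complete fix.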





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
