[ 
https://issues.apache.org/jira/browse/SAMZA-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanil Jain resolved SAMZA-2475.
-------------------------------
    Resolution: Fixed

> Add an allocated resource expiry timeout in samza yarn type of apps
> -------------------------------------------------------------------
>
>                 Key: SAMZA-2475
>                 URL: https://issues.apache.org/jira/browse/SAMZA-2475
>             Project: Samza
>          Issue Type: New Feature
>            Reporter: Sanil Jain
>            Assignee: Sanil Jain
>            Priority: Major
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> Today, if a Samza app cannot use an allocated resource within
> yarn.resourcemanager.rm.container-allocation.expiry-interval-ms,
> starting the container fails with the exception below. This can be
> avoided by setting an allocated-resource timeout that is shorter than
> that config value.
>  
> {code:java}
> // 2020-02-21 00:45:28.033 [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #48] YarnClusterResourceManager [INFO] Got start error notification for Container ID: container_e05_1563223715359_0384_01_000833 for Processor ID: 34-0
> org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
> This token is expired. current time is 1582245928028 found 1582245749262
> Note: System times on machines may be out of sync. Check system time and time zones.
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>       at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>       at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
>       at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
>       at org.apache.hadoop.yarn.client.api.impl.NMClientImpl.startContainer(NMClientImpl.java:207)
>       at org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer$StartContainerTransition.transition(NMClientAsyncImpl.java:378)
>       at org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer$StartContainerTransition.transition(NMClientAsyncImpl.java:363)
>       at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>       at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>       at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>       at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>       at org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer.handle(NMClientAsyncImpl.java:498)
>       at org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$ContainerEventProcessor.run(NMClientAsyncImpl.java:557)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
> 2020-02-21 00:45:28.034 [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #61] YarnClusterResourceManager [INFO] Got stop notification for Container ID: container_e05_1563223715359_0384_01_000183 for Processor ID: 35-0
> 2020-02-21 00:45:28.034 [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #48] ContainerProcessManager [INFO] Container ID: container_e05_1563223715359_0384_01_000833 matched pending Processor ID: 34-0 on host: lva1-app1115.corp.linkedin.com
> 2020-02-21 00:45:28.034 [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #48] ContainerProcessManager [ERROR] Launch failed for pending Processor ID: 34-0 on Container ID: container_e05_1563223715359_0384_01_000833 on host: lva1-app1115.corp.linkedin.com with exception: {}
> org.apache.samza.clustermanager.ProcessorLaunchException: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
> This token is expired. current time is 1582245928028 found 1582245749262
> Note: System times on machines may be out of sync. Check system time and time zones.
>       at org.apache.samza.job.yarn.YarnClusterResourceManager.onStartContainerError(YarnClusterResourceManager.java:552)
>       at org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer$StartContainerTransition.onExceptionRaised(NMClientAsyncImpl.java:401)
>       at org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer$StartContainerTransition.transition(NMClientAsyncImpl.java:390)
>       at org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer$StartContainerTransition.transition(NMClientAsyncImpl.java:363)
>       at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>       at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>       at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>       at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>       at org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer.handle(NMClientAsyncImpl.java:498)
>       at org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$ContainerEventProcessor.run(NMClientAsyncImpl.java:557)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
> This token is expired. current time is 1582245928028 found 1582245749262
> Note: System times on machines may be out of sync. Check system time and time zones.
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>       at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>       at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
>       at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
>       at org.apache.hadoop.yarn.client.api.impl.NMClientImpl.startContainer(NMClientImpl.java:207)
>       at org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer$StartContainerTransition.transition(NMClientAsyncImpl.java:378)
>       ... 10 more
> {code}
>  
>  
>  
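For illustration only: in the log above, the two timestamps in "This token is expired" differ by 1582245928028 - 1582245749262 = 178766 ms, so the start was attempted roughly three minutes after the second value (presumably the token's expiry time). The sketch below shows one way an application master could give up on allocated containers before that point. It is not Samza's actual implementation; the class name, method names, and the 540000/600000 ms values are hypothetical and chosen only for the example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not a Samza class): remembers when each container was
 * allocated and reports the ones that have been held longer than a timeout,
 * so they can be released before YARN rejects their start request.
 */
final class AllocatedResourceExpiryTracker {
  private final long expiryTimeoutMs;
  // Insertion-ordered map: container ID -> allocation time in epoch millis.
  private final Map<String, Long> allocatedAtMs = new LinkedHashMap<>();

  AllocatedResourceExpiryTracker(long expiryTimeoutMs, long yarnAllocationExpiryMs) {
    // The timeout only helps if it fires before the YARN-side expiry interval.
    if (expiryTimeoutMs <= 0 || expiryTimeoutMs >= yarnAllocationExpiryMs) {
      throw new IllegalArgumentException(
          "timeout must be positive and less than the YARN allocation expiry interval");
    }
    this.expiryTimeoutMs = expiryTimeoutMs;
  }

  /** Records that a container was allocated at the given time. */
  void onAllocated(String containerId, long nowMs) {
    allocatedAtMs.put(containerId, nowMs);
  }

  /** Stops tracking a container once it has been launched. */
  void onLaunched(String containerId) {
    allocatedAtMs.remove(containerId);
  }

  /** Removes and returns IDs of containers allocated at least expiryTimeoutMs ago. */
  List<String> removeExpired(long nowMs) {
    List<String> expired = new ArrayList<>();
    Iterator<Map.Entry<String, Long>> it = allocatedAtMs.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Long> entry = it.next();
      if (nowMs - entry.getValue() >= expiryTimeoutMs) {
        expired.add(entry.getKey());
        it.remove();
      }
    }
    return expired;
  }

  public static void main(String[] args) {
    // Example values only: a 9-minute timeout under an assumed 10-minute YARN interval.
    AllocatedResourceExpiryTracker tracker =
        new AllocatedResourceExpiryTracker(540_000L, 600_000L);
    tracker.onAllocated("container_a", 0L);
    tracker.onAllocated("container_b", 100_000L);
    tracker.onLaunched("container_b"); // launched in time, no longer tracked
    System.out.println(tracker.removeExpired(540_000L)); // prints [container_a]
  }
}
```

A caller would release the returned containers and request fresh ones instead of attempting to start them. Taking the current time as a parameter keeps the sketch deterministic and easy to test.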



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
