[
https://issues.apache.org/jira/browse/SAMZA-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sanil Jain resolved SAMZA-2475.
-------------------------------
Resolution: Fixed
> Add an allocated resource expiry timeout in samza yarn type of apps
> -------------------------------------------------------------------
>
> Key: SAMZA-2475
> URL: https://issues.apache.org/jira/browse/SAMZA-2475
> Project: Samza
> Issue Type: New Feature
> Reporter: Sanil Jain
> Assignee: Sanil Jain
> Priority: Major
> Time Spent: 3h
> Remaining Estimate: 0h
>
> Today, if a Samza app cannot use an allocated resource within
> yarn.resourcemanager.rm.container-allocation.expiry-interval-ms,
> starting the container fails with the exception below. This can be avoided
> by setting an allocated-resource timeout shorter than that config value.
>
> {code:java}
> // 2020-02-21 00:45:28.033
> [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #48]
> YarnClusterResourceManager [INFO] Got start error notification for Container
> ID: container_e05_1563223715359_0384_01_000833 for Processor ID: 34-0
> org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to
> start container.
> This token is expired. current time is 1582245928028 found 1582245749262
> Note: System times on machines may be out of sync. Check system time and time
> zones.
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at
> org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
> at
> org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
> at
> org.apache.hadoop.yarn.client.api.impl.NMClientImpl.startContainer(NMClientImpl.java:207)
> at
> org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer$StartContainerTransition.transition(NMClientAsyncImpl.java:378)
> at
> org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer$StartContainerTransition.transition(NMClientAsyncImpl.java:363)
> at
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at
> org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer.handle(NMClientAsyncImpl.java:498)
> at
> org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$ContainerEventProcessor.run(NMClientAsyncImpl.java:557)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2020-02-21 00:45:28.034
> [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #61]
> YarnClusterResourceManager [INFO] Got stop notification for Container ID:
> container_e05_1563223715359_0384_01_000183 for Processor ID: 35-0
> 2020-02-21 00:45:28.034
> [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #48]
> ContainerProcessManager [INFO] Container ID:
> container_e05_1563223715359_0384_01_000833 matched pending Processor ID: 34-0
> on host: lva1-app1115.corp.linkedin.com
> 2020-02-21 00:45:28.034
> [org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #48]
> ContainerProcessManager [ERROR] Launch failed for pending Processor ID: 34-0
> on Container ID: container_e05_1563223715359_0384_01_000833 on host:
> lva1-app1115.corp.linkedin.com with exception: {}
> org.apache.samza.clustermanager.ProcessorLaunchException:
> org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to
> start container.
> This token is expired. current time is 1582245928028 found 1582245749262
> Note: System times on machines may be out of sync. Check system time and time
> zones.
> at
> org.apache.samza.job.yarn.YarnClusterResourceManager.onStartContainerError(YarnClusterResourceManager.java:552)
> at
> org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer$StartContainerTransition.onExceptionRaised(NMClientAsyncImpl.java:401)
> at
> org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer$StartContainerTransition.transition(NMClientAsyncImpl.java:390)
> at
> org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer$StartContainerTransition.transition(NMClientAsyncImpl.java:363)
> at
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at
> org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer.handle(NMClientAsyncImpl.java:498)
> at
> org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$ContainerEventProcessor.run(NMClientAsyncImpl.java:557)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized
> request to start container.
> This token is expired. current time is 1582245928028 found 1582245749262
> Note: System times on machines may be out of sync. Check system time and time
> zones.
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at
> org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
> at
> org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
> at
> org.apache.hadoop.yarn.client.api.impl.NMClientImpl.startContainer(NMClientImpl.java:207)
> at
> org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl$StatefulContainer$StartContainerTransition.transition(NMClientAsyncImpl.java:378)
> ... 10 more
> {code}
>
>
>
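The check described in the issue can be sketched roughly as follows. This is a minimal illustration, not the change that was merged for SAMZA-2475: the class `AllocatedResourceExpiry` and its methods are hypothetical names, the 10-minute YARN interval and the 8-minute Samza-side timeout are assumed values, and the "found" value in the log is assumed to be the token's expiry timestamp.

```java
import java.time.Duration;

/** Hypothetical helper for illustration only; not part of Samza's API. */
final class AllocatedResourceExpiry {
    private AllocatedResourceExpiry() {}

    /** True once a resource allocated at allocatedAtMs has sat unused for at least the timeout. */
    static boolean isExpired(long allocatedAtMs, long nowMs, Duration timeout) {
        return nowMs - allocatedAtMs >= timeout.toMillis();
    }

    /** The app-side timeout only helps if it is positive and strictly shorter than YARN's interval. */
    static boolean isSafeTimeout(Duration timeout, Duration yarnExpiryInterval) {
        return !timeout.isNegative() && !timeout.isZero()
                && timeout.compareTo(yarnExpiryInterval) < 0;
    }

    /** Milliseconds by which a start request arrived after the token expired (negative if still valid). */
    static long millisPastTokenExpiry(long currentTimeMs, long tokenExpiryMs) {
        return currentTimeMs - tokenExpiryMs;
    }

    public static void main(String[] args) {
        // Assumed value for yarn.resourcemanager.rm.container-allocation.expiry-interval-ms.
        Duration yarnExpiryInterval = Duration.ofMinutes(10);
        // Hypothetical app-side timeout, chosen to be shorter than the YARN interval.
        Duration allocatedResourceTimeout = Duration.ofMinutes(8);

        System.out.println(isSafeTimeout(allocatedResourceTimeout, yarnExpiryInterval)); // true

        // A resource held unused for 9 minutes would be released by the app
        // before YARN's token expires, instead of failing at container start.
        long allocatedAtMs = 0L;
        long nowMs = Duration.ofMinutes(9).toMillis();
        System.out.println(isExpired(allocatedAtMs, nowMs, allocatedResourceTimeout)); // true

        // Timestamps copied from the exception message in the log above.
        System.out.println(millisPastTokenExpiry(1582245928028L, 1582245749262L)); // 178766
    }
}
```

With the timestamps from the log, the start request reached the NodeManager about 179 seconds after the token expired, which is the situation an app-side timeout shorter than the YARN interval is meant to prevent.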