Jake Maes created SAMZA-922:
-------------------------------
Summary: Host Affinity - Bug in SamzaContainerRequest causes
(recoverable) exceptions in YARN
Key: SAMZA-922
URL: https://issues.apache.org/jira/browse/SAMZA-922
Project: Samza
Issue Type: Bug
Reporter: Jake Maes
Assignee: Jake Maes
Fix For: 0.10.1
The constructor for SamzaContainerRequest builds the YARN container request
differently depending on whether a preferred host is set.
Unfortunately, it checks only for preferredHost == null and not for
preferredHost.equals(ANY_HOST), even though ANY_HOST is the string passed when
there is no preferred host.
As a result, the YARN container request actually asks for a container on
the host name "ANY_HOST", which causes the following exception:
2016-03-29 21:25:53.892 [main] ScriptBasedMapping [WARN] Exception running /OMITTED/sbin/yarn-topology.py ANY_HOST
java.io.IOException: Cannot run program "/OMITTED/application_1452292535523_0047/container_1452292535523_0047_02_000001"): error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1042)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:485)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.net.ScriptBasedMapping$RawScriptBasedMapping.runResolveCommand(ScriptBasedMapping.java:251)
at org.apache.hadoop.net.ScriptBasedMapping$RawScriptBasedMapping.resolve(ScriptBasedMapping.java:188)
at org.apache.hadoop.net.CachedDNSToSwitchMapping.resolve(CachedDNSToSwitchMapping.java:119)
at org.apache.hadoop.yarn.util.RackResolver.coreResolve(RackResolver.java:101)
at org.apache.hadoop.yarn.util.RackResolver.resolve(RackResolver.java:95)
at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.resolveRacks(AMRMClientImpl.java:551)
at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.addContainerRequest(AMRMClientImpl.java:411)
at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.addContainerRequest(AMRMClientAsyncImpl.java:166)
at org.apache.samza.job.yarn.ContainerRequestState.updateRequestState(ContainerRequestState.java:82)
at org.apache.samza.job.yarn.AbstractContainerAllocator.requestContainer(AbstractContainerAllocator.java:102)
at org.apache.samza.job.yarn.AbstractContainerAllocator.requestContainers(AbstractContainerAllocator.java:85)
at org.apache.samza.job.yarn.SamzaTaskManager.onInit(SamzaTaskManager.java:112)
at org.apache.samza.job.yarn.SamzaAppMaster$$anonfun$run$1.apply(SamzaAppMaster.scala:117)
at org.apache.samza.job.yarn.SamzaAppMaster$$anonfun$run$1.apply(SamzaAppMaster.scala:117)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.samza.job.yarn.SamzaAppMaster$.run(SamzaAppMaster.scala:117)
at org.apache.samza.job.yarn.SamzaAppMaster$.main(SamzaAppMaster.scala:104)
at org.apache.samza.job.yarn.SamzaAppMaster.main(SamzaAppMaster.scala)
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:187)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1023)
The exception is recoverable when relaxed locality = true because YARN simply
falls back to a random host on the default rack, which is what the ANY_HOST
request was meant to achieve. However, the behavior is incorrect, and the stack
traces tend to fill the log.
The string "ANY_HOST" is internal to Samza, and YARN should never see it.
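A possible shape for the fix, as a minimal sketch rather than the actual patch:
treat both null and ANY_HOST as "no preference" and pass a null node list to the
YARN container request in that case. The class HostPreference and its methods
hasNoPreference and nodesFor are hypothetical names used only for illustration,
and the value of the ANY_HOST constant is assumed from the description above.
The sketch also assumes that the nodes argument of AMRMClient.ContainerRequest
may be null to mean "any node".

```java
import java.util.Arrays;

// Hypothetical helper, not part of Samza: illustrates the missing ANY_HOST check.
public final class HostPreference {
    // Assumed value of Samza's internal "no preferred host" sentinel.
    static final String ANY_HOST = "ANY_HOST";

    private HostPreference() {}

    // True when the caller expressed no host preference, either as null
    // or as the internal ANY_HOST sentinel.
    static boolean hasNoPreference(String preferredHost) {
        return preferredHost == null || ANY_HOST.equals(preferredHost);
    }

    // Node list to pass to the YARN container request: null when there is
    // no preference, otherwise a single-element array with the preferred host.
    // The sentinel string itself is never returned.
    static String[] nodesFor(String preferredHost) {
        return hasNoPreference(preferredHost) ? null : new String[] {preferredHost};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(nodesFor(null)));
        System.out.println(Arrays.toString(nodesFor(ANY_HOST)));
        System.out.println(Arrays.toString(nodesFor("host-1.example.com")));
    }
}
```

With a check like this in the constructor, the sentinel would stay inside Samza
and the rack resolver would never be asked to look up "ANY_HOST".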