Jake Maes created SAMZA-922:
-------------------------------

             Summary: Host Affinity - Bug in SamzaContainerRequest causes (recoverable) exceptions in YARN
                 Key: SAMZA-922
                 URL: https://issues.apache.org/jira/browse/SAMZA-922
             Project: Samza
          Issue Type: Bug
            Reporter: Jake Maes
            Assignee: Jake Maes
             Fix For: 0.10.1


The constructor for SamzaContainerRequest creates the Yarn container request 
differently depending on whether there is a preferred host. Unfortunately, it 
only checks for preferredHost == null and not for 
preferredHost.equals(ANY_HOST), even though ANY_HOST is the string passed when 
there is no preferred host. 

As a result, the Yarn container request actually asks for a container on a 
host named "ANY_HOST", which causes the following exception:

2016-03-29 21:25:53.892 [main] ScriptBasedMapping [WARN] Exception running /OMITTED/sbin/yarn-topology.py ANY_HOST 
java.io.IOException: Cannot run program "/OMITTED/application_1452292535523_0047/container_1452292535523_0047_02_000001"): error=2, No such file or directory
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1042)
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:485)
        at org.apache.hadoop.util.Shell.run(Shell.java:455)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
        at org.apache.hadoop.net.ScriptBasedMapping$RawScriptBasedMapping.runResolveCommand(ScriptBasedMapping.java:251)
        at org.apache.hadoop.net.ScriptBasedMapping$RawScriptBasedMapping.resolve(ScriptBasedMapping.java:188)
        at org.apache.hadoop.net.CachedDNSToSwitchMapping.resolve(CachedDNSToSwitchMapping.java:119)
        at org.apache.hadoop.yarn.util.RackResolver.coreResolve(RackResolver.java:101)
        at org.apache.hadoop.yarn.util.RackResolver.resolve(RackResolver.java:95)
        at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.resolveRacks(AMRMClientImpl.java:551)
        at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.addContainerRequest(AMRMClientImpl.java:411)
        at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.addContainerRequest(AMRMClientAsyncImpl.java:166)
        at org.apache.samza.job.yarn.ContainerRequestState.updateRequestState(ContainerRequestState.java:82)
        at org.apache.samza.job.yarn.AbstractContainerAllocator.requestContainer(AbstractContainerAllocator.java:102)
        at org.apache.samza.job.yarn.AbstractContainerAllocator.requestContainers(AbstractContainerAllocator.java:85)
        at org.apache.samza.job.yarn.SamzaTaskManager.onInit(SamzaTaskManager.java:112)
        at org.apache.samza.job.yarn.SamzaAppMaster$$anonfun$run$1.apply(SamzaAppMaster.scala:117)
        at org.apache.samza.job.yarn.SamzaAppMaster$$anonfun$run$1.apply(SamzaAppMaster.scala:117)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at org.apache.samza.job.yarn.SamzaAppMaster$.run(SamzaAppMaster.scala:117)
        at org.apache.samza.job.yarn.SamzaAppMaster$.main(SamzaAppMaster.scala:104)
        at org.apache.samza.job.yarn.SamzaAppMaster.main(SamzaAppMaster.scala)
Caused by: java.io.IOException: error=2, No such file or directory
        at java.lang.UNIXProcess.forkAndExec(Native Method)
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:187)
        at java.lang.ProcessImpl.start(ProcessImpl.java:134)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1023)

The exception is recoverable when relaxed locality = true because Yarn simply 
falls back to a random host on the default rack, which was the desired result 
of the ANY_HOST request. However, the behavior is incorrect and the stack 
traces tend to fill the log.

The string "ANY_HOST" is internal to Samza, and Yarn should never see it.
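
For illustration only, here is a minimal sketch of the kind of check that would keep the placeholder from reaching Yarn. It is not the actual SamzaContainerRequest code: the class PreferredHostSketch and the method toNodes are hypothetical names, and the constant value is assumed from the description above. The sketch also assumes that a null node list is how a caller tells Yarn that any host is acceptable.

```java
/** Hypothetical helper for illustration; not part of Samza. */
final class PreferredHostSketch {
    // Assumed to match the internal placeholder Samza passes when there is
    // no preferred host (value taken from the description in this issue).
    static final String ANY_HOST = "ANY_HOST";

    private PreferredHostSketch() {}

    /**
     * Converts a preferred host into the node list for a container request.
     * Returns null when there is no real host preference, so that the
     * placeholder string is never sent to Yarn as a host name.
     */
    static String[] toNodes(String preferredHost) {
        // Treat both null and the ANY_HOST placeholder as "no preference".
        if (preferredHost == null || ANY_HOST.equals(preferredHost)) {
            return null;
        }
        return new String[] {preferredHost};
    }
}
```

With a check like this, toNodes("ANY_HOST") and toNodes(null) both yield null, while a real host name is passed through as a single-element array.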



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
