Hello,

I have built a Spark on HDFS/YARN cluster in Docker containers (managed by Kubernetes).

* Spark on YARN versions
  - Spark 1.6.0
  - Hadoop 2.6.0 (CDH 5.6.0)
  - Oracle Java 1.8.0_74

There is one HDFS/YARN master and one HDFS/YARN worker, each running in its own container.

The spark-yarn-master container has the following hostnames and IP addresses:

hostname: spark-yarn-master-1-sxegt (pod name)
IP addr.: 172.17.0.11

hostname: spark-yarn-master (alias DNS name)
IP addr.: 172.30.242.57 (alias IP addr.)

bash-4.2$ cat /etc/hosts
# Kubernetes-managed hosts file.
127.0.0.1       localhost
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
172.17.0.11     spark-yarn-master-1-sxegt
bash-4.2$
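
Note that the alias name is not in /etc/hosts; it resolves only through the cluster DNS service. A quick way to confirm this (a check with the system resolver; I show only the command here) is:

bash-4.2$ getent hosts spark-yarn-master

which returns the alias address 172.30.242.57, matching the curl trace below. The same holds on the worker.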

bash-4.2$ ip -4 addr show dev eth0
50: eth0@if51: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state 
UP  link-netnsid 0
    inet 172.17.0.11/16 scope global eth0
       valid_lft forever preferred_lft forever
bash-4.2$

bash-4.2$ hostname -f
spark-yarn-master-1-sxegt
bash-4.2$ 

bash-4.2$ curl -v spark-yarn-master:8020
* About to connect() to spark-yarn-master port 8020 (#0)
*   Trying 172.30.242.57...
* Connected to spark-yarn-master (172.30.242.57) port 8020 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: spark-yarn-master:8020
> Accept: */*
> 
< HTTP/1.1 404 Not Found
< Content-type: text/plain
* no chunk, no close, no size. Assume close to signal end
< 
It looks like you are making an HTTP request to a Hadoop IPC port. This is not 
the correct port for the web interface on this daemon.
* Closing connection 0
bash-4.2$ 

The spark-yarn-worker container has the following hostnames and IP addresses:

hostname: spark-yarn-worker-1-pshqi (pod name)
IP addr.: 172.17.0.12

hostname: spark-yarn-worker (alias DNS name)
IP addr.: 172.30.1.53 (alias IP addr.)

bash-4.2$ cat /etc/hosts
# Kubernetes-managed hosts file.
127.0.0.1       localhost
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
172.17.0.12     spark-yarn-worker-1-pshqi
bash-4.2$

bash-4.2$ ip -4 addr show dev eth0
52: eth0@if53: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state 
UP  link-netnsid 0
    inet 172.17.0.12/16 scope global eth0
       valid_lft forever preferred_lft forever
bash-4.2$ 

bash-4.2$ hostname -f
spark-yarn-worker-1-pshqi
bash-4.2$ 

bash-4.2$ curl -v spark-yarn-worker:8040
* About to connect() to spark-yarn-worker port 8040 (#0)
*   Trying 172.30.1.53...
* Connected to spark-yarn-worker (172.30.1.53) port 8040 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: spark-yarn-worker:8040
> Accept: */*
> 
< HTTP/1.1 404 Not Found
< Content-type: text/plain
* no chunk, no close, no size. Assume close to signal end
< 
It looks like you are making an HTTP request to a Hadoop IPC port. This is not 
the correct port for the web interface on this daemon.
* Closing connection 0
bash-4.2$ 

The HDFS/YARN master and worker can connect to each other by alias DNS name.

On the master, to the worker (alias DNS name):

bash-4.2$ hostname -f ; curl spark-yarn-worker:8040
spark-yarn-master-1-sxegt
It looks like you are making an HTTP request to a Hadoop IPC port. This is not 
the correct port for the web interface on this daemon.
bash-4.2$ 

On the worker, to the master (alias DNS name):

bash-4.2$ hostname -f ; curl spark-yarn-master:8020
spark-yarn-worker-1-pshqi
It looks like you are making an HTTP request to a Hadoop IPC port. This is not 
the correct port for the web interface on this daemon.
bash-4.2$ 

They cannot connect to each other by pod hostname.

On the worker, to the master (pod hostname):

bash-4.2$ hostname -f ; curl spark-yarn-master-1-sxegt:8020
spark-yarn-worker-1-pshqi
curl: (6) Could not resolve host: spark-yarn-master-1-sxegt; Name or service 
not known
bash-4.2$ 

On the master, to the worker (pod hostname):

bash-4.2$ hostname -f ; curl spark-yarn-worker-1-pshqi:8040                    
spark-yarn-master-1-sxegt
curl: (6) Could not resolve host: spark-yarn-worker-1-pshqi; Name or service 
not known
bash-4.2$ 

So I want HDFS/YARN to use the alias DNS names instead of the pod hostnames.
But the YARN NodeManager always registers with the pod hostname,
even with yarn.nodemanager.hostname configured to the alias DNS name.
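
For reference, this is the entry I set in yarn-site.xml on the worker (a minimal
sketch; only the relevant property is shown):

  <property>
    <name>yarn.nodemanager.hostname</name>
    <value>spark-yarn-worker</value>
  </property>

Even with this in place, the NodeManager registers under the pod hostname, as
the logs below show.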


HDFS/YARN worker log:

16/03/07 10:04:38 INFO datanode.DataNode: Configured hostname is 
spark-yarn-worker
        :
16/03/07 10:04:42 INFO security.NMContainerTokenSecretManager: Updating node 
address : spark-yarn-worker-1-pshqi:39352
        :
16/03/07 10:04:42 INFO containermanager.ContainerManagerImpl: ContainerManager 
started at spark-yarn-worker-1-pshqi/172.17.0.12:39352
        :
16/03/07 10:04:44 INFO nodemanager.NodeStatusUpdaterImpl: Registered with 
ResourceManager as spark-yarn-worker-1-pshqi:39352 with total resource of 
<memory:8192, vCores:8>
        :
16/03/07 10:04:44 INFO common.Storage: Lock on 
/var/lib/hadoop-hdfs/cache/hdfs/dfs/data/in_use.lock acquired by nodename 
16@spark-yarn-worker-1-pshqi
        :


HDFS/YARN master log:

16/03/07 10:04:44 INFO resourcemanager.ResourceTrackerService: NodeManager from 
node spark-yarn-worker-1-pshqi(cmPort: 39352 httpPort: 8042) registered with 
capability: <memory:8192, vCores:8>, assigned nodeId 
spark-yarn-worker-1-pshqi:39352
        :
16/03/07 10:04:45 INFO rmnode.RMNodeImpl: spark-yarn-worker-1-pshqi:39352 Node 
Transitioned from NEW to RUNNING
16/03/07 10:04:45 INFO fair.FairScheduler: Added node 
spark-yarn-worker-1-pshqi:39352 cluster capacity: <memory:8192, vCores:8>


When I submit an application, the master assigns a container on the worker,
but it uses the worker's unresolvable pod hostname instead of the alias DNS
name, and so it throws java.net.UnknownHostException:


16/03/07 11:58:51 INFO scheduler.SchedulerNode: Assigned container 
container_1457312681433_0001_01_000001 of capacity <memory:1024, vCores:1> on 
host spark-yarn-worker-1-pshqi:39352, which has 1 containers, <memory:1024, 
vCores:1> used and <memory:7168, vCores:7> available after allocation
16/03/07 11:58:51 ERROR scheduler.SchedulerApplicationAttempt: Error trying to 
assign container token and NM token to an allocated container 
container_1457312681433_0001_01_000001
java.lang.IllegalArgumentException: java.net.UnknownHostException: 
spark-yarn-worker-1-pshqi
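
The failing step is a plain forward DNS lookup of the registered node ID; it
can be reproduced with the system resolver on the master (a quick check; on
this image getent should go through the same NSS lookup path that Java's
InetAddress uses):

bash-4.2$ getent hosts spark-yarn-worker-1-pshqi || echo "no such host"
no such host

The alias name, by contrast, resolves fine, as the curl traces above show.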


Can I configure the YARN NodeManager to register with an arbitrary hostname?
Unfortunately, there is no way for me to modify the container's hostname,
/etc/hosts, or DNS.

Regards,
        dai
-- 
HIGUCHI Daisuke <d-higu...@creationline.com>
