[
https://issues.apache.org/jira/browse/MAPREDUCE-7227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914318#comment-16914318
]
Yuan LUO commented on MAPREDUCE-7227:
-------------------------------------
To explain this problem more clearly, I ran the following test.
1. First step: set up the test environment.
I set up two HDFS clusters, one named 'test-hdfs' and the other named 'alg-hdfs';
test-hdfs also runs YARN.
For convenience of testing, I started just one NodeManager, on a node named node1 in
test-hdfs, and configured the nameservices 'test-hdfs,alg-hdfs' in hdfs-site.xml as
below, so tasks on node1 can access data from both test-hdfs and alg-hdfs. I also
set 'default.FS=test-hdfs' on node1 in core-site.xml.
<property>
  <name>dfs.nameservices</name>
  <value>test-hdfs,alg-hdfs</value>
</property>
...
<property>
  <name>dfs.ha.namenodes.test-hdfs</name>
  <value>namenodexx,namenodeyy</value>
</property>
<property>
  <name>dfs.ha.namenodes.alg-hdfs</name>
  <value>namenodexx,namenodeyy</value>
</property>
....
Then I chose another node in test-hdfs, named node2, as my Hadoop client (note:
no NodeManager runs on this node). It is also configured with the nameservices
'test-hdfs,alg-hdfs' in hdfs-site.xml, but with 'default.FS=alg-hdfs' in
core-site.xml.
2. Second step: run the test example.
On node2, I ran this command: hadoop jar
hadoop-mapreduce-examples-2.6.0-cdh5.14.4.jar wordcount
hdfs://test-hdfs/test/input/datas hdfs://test-hdfs/test/output/myout
(1) While the job was running, I listed the staging directories on both clusters:
hdfs dfs -ls
hdfs://test-hdfs/tmp/hadoop-yarn/staging/hdfs/.staging/job_1566565404948_0011
Found 3 items
-rw-r--r-- 3 hdfs supergroup 300 2019-08-23 22:18
hdfs://test-hdfs/tmp/hadoop-yarn/staging/hdfs/.staging/job_1566565404948_0011/job_1566565404948_0011.summary
-rw-r--r-- 3 hdfs supergroup 51080 2019-08-23 22:18
hdfs://test-hdfs/tmp/hadoop-yarn/staging/hdfs/.staging/job_1566565404948_0011/job_1566565404948_0011_1.jhist
-rw-r--r-- 3 hdfs supergroup 156359 2019-08-23 22:18
hdfs://test-hdfs/tmp/hadoop-yarn/staging/hdfs/.staging/job_1566565404948_0011/job_1566565404948_0011_1_conf.xml
hdfs dfs -ls
hdfs://alg-hdfs/tmp/hadoop-yarn/staging/hdfs/.staging/job_1566565404948_0011
Found 4 items
-rw-r--r-- 10 hdfs supergroup 276388 2019-08-23 22:18
hdfs://alg-hdfs/tmp/hadoop-yarn/staging/hdfs/.staging/job_1566565404948_0011/job.jar
-rw-r--r-- 10 hdfs supergroup 109 2019-08-23 22:18
hdfs://alg-hdfs/tmp/hadoop-yarn/staging/hdfs/.staging/job_1566565404948_0011/job.split
-rw-r--r-- 3 hdfs supergroup 13 2019-08-23 22:18
hdfs://alg-hdfs/tmp/hadoop-yarn/staging/hdfs/.staging/job_1566565404948_0011/job.splitmetainfo
-rw-r--r-- 3 hdfs supergroup 134404 2019-08-23 22:18
hdfs://alg-hdfs/tmp/hadoop-yarn/staging/hdfs/.staging/job_1566565404948_0011/job.xml
(2) When the job finished, I listed the same directories again:
hdfs dfs -ls
hdfs://test-hdfs/tmp/hadoop-yarn/staging/hdfs/.staging/job_1566565404948_0011
Found 3 items
-rw-r--r-- 3 hdfs supergroup 300 2019-08-23 22:18
hdfs://test-hdfs/tmp/hadoop-yarn/staging/hdfs/.staging/job_1566565404948_0011/job_1566565404948_0011.summary
-rw-r--r-- 3 hdfs supergroup 51080 2019-08-23 22:18
hdfs://test-hdfs/tmp/hadoop-yarn/staging/hdfs/.staging/job_1566565404948_0011/job_1566565404948_0011_1.jhist
-rw-r--r-- 3 hdfs supergroup 156359 2019-08-23 22:18
hdfs://test-hdfs/tmp/hadoop-yarn/staging/hdfs/.staging/job_1566565404948_0011/job_1566565404948_0011_1_conf.xml
hdfs dfs -ls
hdfs://alg-hdfs/tmp/hadoop-yarn/staging/hdfs/.staging/job_1566565404948_0011
ls:
`hdfs://alg-hdfs/tmp/hadoop-yarn/staging/hdfs/.staging/job_1566565404948_0011':
No such file or directory
So the job staging directory residual problem occurs: the staging directory under
hdfs://test-hdfs remains after the job finishes. This happens because the client
and the AppMaster use different default.FS values to create the job staging dir.
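The divergence above can be sketched conceptually as follows. This is not Hadoop code; it only mimics how a scheme-less path is qualified against a node's fs.defaultFS (in the spirit of Hadoop's Path.makeQualified), using the staging path from the listings above.

```python
from urllib.parse import urlparse

def make_qualified(path: str, default_fs: str) -> str:
    """Prefix a scheme-less path with the node's default filesystem URI."""
    if urlparse(path).scheme:      # already fully qualified: leave untouched
        return path
    return default_fs.rstrip("/") + path

staging = "/tmp/hadoop-yarn/staging/hdfs/.staging/job_1566565404948_0011"

# node2 (the client) has default.FS = alg-hdfs, while node1 (where the
# AppMaster runs) has default.FS = test-hdfs, so the same relative staging
# path resolves to directories on two different clusters:
client_dir = make_qualified(staging, "hdfs://alg-hdfs")
am_dir = make_qualified(staging, "hdfs://test-hdfs")
assert client_dir != am_dir  # two staging dirs; cleanup removes only one
```

With two distinct URIs in play, the AppMaster's cleanup can only ever delete one of them, which matches the listings above.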
> Fix job staging directory residual problem in a big yarn cluster composed of
> multiple independent hdfs clusters
> ---------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-7227
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7227
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: applicationmaster, mrv2
> Affects Versions: 2.6.0, 2.7.0, 3.1.2
> Reporter: Yuan LUO
> Assignee: Yuan LUO
> Priority: Major
> Attachments: HADOOP-MAPREDUCE-7227.001.patch,
> HADOOP-MAPREDUCE-7227.002.patch, HADOOP-MAPREDUCE-7227.003.patch,
> HADOOP-MAPREDUCE-7227.004.patch
>
>
> Our YARN cluster is made up of several independent HDFS clusters, and the
> 'default.FS' differs in each. When a user submits a job to the YARN cluster
> and the 'default.FS' of the client and the NodeManager are inconsistent, the
> job staging dir cannot be cleaned up by the AppMaster, because under these
> conditions two job staging dirs are produced, one by the client and one by
> the AppMaster. So we can modify the AppMaster to use the client's
> 'default.FS' to create the job staging dir.
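The fix direction described in the summary can be sketched as follows. This is a conceptual illustration, not the actual patch: the idea is that the client qualifies the staging dir with its own default.FS before submission, so the AppMaster resolves the identical URI regardless of its local default.FS.

```python
from urllib.parse import urlparse

def qualify(path: str, default_fs: str) -> str:
    """Attach a default filesystem only if the path has no scheme yet."""
    return path if urlparse(path).scheme else default_fs.rstrip("/") + path

staging_suffix = "/tmp/hadoop-yarn/staging/hdfs/.staging/job_1566565404948_0011"

# Client side (node2, default.FS = alg-hdfs): qualify once at submit time.
submitted_dir = qualify(staging_suffix, "hdfs://alg-hdfs")

# AppMaster side (node1, default.FS = test-hdfs): the submitted URI already
# carries a scheme and authority, so the AM's default.FS no longer matters.
am_dir = qualify(submitted_dir, "hdfs://test-hdfs")
assert am_dir == submitted_dir  # one staging dir; cleanup finds and removes it
```

Because both sides agree on a single fully qualified URI, there is only one staging directory to create and one to clean up.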
--
This message was sent by Atlassian Jira
(v8.3.2#803003)