[
https://issues.apache.org/jira/browse/MAPREDUCE-7227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914318#comment-16914318
]
Yuan LUO commented on MAPREDUCE-7227:
-------------------------------------
To explain this problem more clearly, I ran the following test.
1. First step: set up the test environment.
I set up two HDFS clusters, one named 'test-hdfs' and the other named 'alg-hdfs';
test-hdfs also runs YARN.
For convenience of testing, I started just one NodeManager, on a node named node1 in
test-hdfs, and configured the nameservices 'test-hdfs,alg-hdfs' in hdfs-site.xml as
below, so tasks on node1 can access data from both test-hdfs and alg-hdfs. I also
set 'default.FS=test-hdfs' on node1 in core-site.xml.
<property>
  <name>dfs.nameservices</name>
  <value>test-hdfs,alg-hdfs</value>
</property>
...
<property>
  <name>dfs.ha.namenodes.test-hdfs</name>
  <value>namenodexx,namenodeyy</value>
</property>
<property>
  <name>dfs.ha.namenodes.alg-hdfs</name>
  <value>namenodexx,namenodeyy</value>
</property>
....
Then I chose another node in test-hdfs, named node2, as my Hadoop client (note:
no NodeManager runs on this node). It is also configured with the nameservices
'test-hdfs,alg-hdfs' in hdfs-site.xml, but with 'default.FS=alg-hdfs' in
core-site.xml.
2. Second step: run the test example.
On node2, I ran this command: hadoop jar
hadoop-mapreduce-examples-2.6.0-cdh5.14.4.jar wordcount
hdfs://test-hdfs/test/input/datas hdfs://test-hdfs/test/output/myout
(1) While the job was running, I listed the staging directories on both clusters:
hdfs dfs -ls
hdfs://test-hdfs/tmp/hadoop-yarn/staging/hdfs/.staging/job_1566565404948_0011
Found 3 items
-rw-r--r-- 3 hdfs supergroup 300 2019-08-23 22:18
hdfs://test-hdfs/tmp/hadoop-yarn/staging/hdfs/.staging/job_1566565404948_0011/job_1566565404948_0011.summary
-rw-r--r-- 3 hdfs supergroup 51080 2019-08-23 22:18
hdfs://test-hdfs/tmp/hadoop-yarn/staging/hdfs/.staging/job_1566565404948_0011/job_1566565404948_0011_1.jhist
-rw-r--r-- 3 hdfs supergroup 156359 2019-08-23 22:18
hdfs://test-hdfs/tmp/hadoop-yarn/staging/hdfs/.staging/job_1566565404948_0011/job_1566565404948_0011_1_conf.xml
hdfs dfs -ls
hdfs://alg-hdfs/tmp/hadoop-yarn/staging/hdfs/.staging/job_1566565404948_0011
Found 4 items
-rw-r--r-- 10 hdfs supergroup 276388 2019-08-23 22:18
hdfs://alg-hdfs/tmp/hadoop-yarn/staging/hdfs/.staging/job_1566565404948_0011/job.jar
-rw-r--r-- 10 hdfs supergroup 109 2019-08-23 22:18
hdfs://alg-hdfs/tmp/hadoop-yarn/staging/hdfs/.staging/job_1566565404948_0011/job.split
-rw-r--r-- 3 hdfs supergroup 13 2019-08-23 22:18
hdfs://alg-hdfs/tmp/hadoop-yarn/staging/hdfs/.staging/job_1566565404948_0011/job.splitmetainfo
-rw-r--r-- 3 hdfs supergroup 134404 2019-08-23 22:18
hdfs://alg-hdfs/tmp/hadoop-yarn/staging/hdfs/.staging/job_1566565404948_0011/job.xml
(2) When the job finished, I listed the same directories again:
hdfs dfs -ls
hdfs://test-hdfs/tmp/hadoop-yarn/staging/hdfs/.staging/job_1566565404948_0011
Found 3 items
-rw-r--r-- 3 hdfs supergroup 300 2019-08-23 22:18
hdfs://test-hdfs/tmp/hadoop-yarn/staging/hdfs/.staging/job_1566565404948_0011/job_1566565404948_0011.summary
-rw-r--r-- 3 hdfs supergroup 51080 2019-08-23 22:18
hdfs://test-hdfs/tmp/hadoop-yarn/staging/hdfs/.staging/job_1566565404948_0011/job_1566565404948_0011_1.jhist
-rw-r--r-- 3 hdfs supergroup 156359 2019-08-23 22:18
hdfs://test-hdfs/tmp/hadoop-yarn/staging/hdfs/.staging/job_1566565404948_0011/job_1566565404948_0011_1_conf.xml
hdfs dfs -ls
hdfs://alg-hdfs/tmp/hadoop-yarn/staging/hdfs/.staging/job_1566565404948_0011
ls:
`hdfs://alg-hdfs/tmp/hadoop-yarn/staging/hdfs/.staging/job_1566565404948_0011':
No such file or directory
So the job staging directory residual problem occurs: the staging directory under
hdfs://test-hdfs remains after the job finishes. This happens because the client
and the AppMaster use different default.FS values to create the job staging dir.
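The divergence above can be sketched conceptually as follows. This is not Hadoop code; it only mimics how a scheme-less path is qualified against a node's fs.defaultFS (in the spirit of Hadoop's Path.makeQualified), using the staging path from the listings above.

```python
from urllib.parse import urlparse

def make_qualified(path: str, default_fs: str) -> str:
    """Prefix a scheme-less path with the node's default filesystem URI."""
    if urlparse(path).scheme:      # already fully qualified: leave untouched
        return path
    return default_fs.rstrip("/") + path

staging = "/tmp/hadoop-yarn/staging/hdfs/.staging/job_1566565404948_0011"

# node2 (the client) has default.FS = alg-hdfs, while node1 (where the
# AppMaster runs) has default.FS = test-hdfs, so the same relative staging
# path resolves to directories on two different clusters:
client_dir = make_qualified(staging, "hdfs://alg-hdfs")
am_dir = make_qualified(staging, "hdfs://test-hdfs")
assert client_dir != am_dir  # two staging dirs; cleanup removes only one
```

With two distinct URIs in play, the AppMaster's cleanup can only ever delete one of them, which matches the listings above.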
> Fix job staging directory residual problem in a big yarn cluster composed of
> multiple independent hdfs clusters
> ---------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-7227
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7227
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: applicationmaster, mrv2
> Affects Versions: 2.6.0, 2.7.0, 3.1.2
> Reporter: Yuan LUO
> Assignee: Yuan LUO
> Priority: Major
> Attachments: HADOOP-MAPREDUCE-7227.001.patch,
> HADOOP-MAPREDUCE-7227.002.patch, HADOOP-MAPREDUCE-7227.003.patch,
> HADOOP-MAPREDUCE-7227.004.patch
>
>
> Our YARN cluster is made up of several independent HDFS clusters, and the
> 'default.FS' differs in each. When a user submits a job to the YARN cluster
> and the 'default.FS' of the client and the NodeManager are inconsistent, the
> job staging dir cannot be cleaned up by the AppMaster, because under these
> conditions two job staging dirs are produced, one by the client and one by
> the AppMaster. So we can modify the AppMaster to use the client's
> 'default.FS' to create the job staging dir.
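The fix direction described in the summary can be sketched as follows. This is a conceptual illustration, not the actual patch: the idea is that the client qualifies the staging dir with its own default.FS before submission, so the AppMaster resolves the identical URI regardless of its local default.FS.

```python
from urllib.parse import urlparse

def qualify(path: str, default_fs: str) -> str:
    """Attach a default filesystem only if the path has no scheme yet."""
    return path if urlparse(path).scheme else default_fs.rstrip("/") + path

staging_suffix = "/tmp/hadoop-yarn/staging/hdfs/.staging/job_1566565404948_0011"

# Client side (node2, default.FS = alg-hdfs): qualify once at submit time.
submitted_dir = qualify(staging_suffix, "hdfs://alg-hdfs")

# AppMaster side (node1, default.FS = test-hdfs): the submitted URI already
# carries a scheme and authority, so the AM's default.FS no longer matters.
am_dir = qualify(submitted_dir, "hdfs://test-hdfs")
assert am_dir == submitted_dir  # one staging dir; cleanup finds and removes it
```

Because both sides agree on a single fully qualified URI, there is only one staging directory to create and one to clean up.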
--
This message was sent by Atlassian Jira
(v8.3.2#803003)