[ https://issues.apache.org/jira/browse/HADOOP-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12572885#action_12572885 ]

yhemanth edited comment on HADOOP-2899 at 2/27/08 3:59 AM:
-------------------------------------------------------------------

Hodring does not create the dfs directory, though it supplies the name to the 
jobtracker. The name is /mapredsystem/hostname. The JobTracker creates this at 
startup time, but doesn't delete it when killed. When another jobtracker later 
starts on the same hostname, it tries to clean the directory up at startup, and 
the permissions set by the JobTracker code deny the new user write access 
(access=WRITE in the log below), which is why there's a failure.
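
To make the failure concrete, here is a rough sketch (my own illustration using the public FileSystem API, not the actual JobTracker code; the exists/delete/mkdirs sequence and the 0700 permission are my assumptions) of what a startup-time cleanup of /mapredsystem/hostname amounts to, and where a second user's startup trips over the leftover directory:

    // Illustrative sketch only, not the real JobTracker code. It uses the
    // public org.apache.hadoop.fs.FileSystem API; the path and permissions
    // follow the log quoted in the bug description.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;

    public class SystemDirStartupSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Per-hostname system directory, e.g. /mapredsystem/svrX
        Path systemDir = new Path("/mapredsystem/"
            + java.net.InetAddress.getLocalHost().getHostName());

        // Deleting or recreating this directory needs WRITE permission on the
        // parent /mapredsystem. If user A's cluster left the directory behind
        // and /mapredsystem is hadoop:supergroup:rwxr-xr-x, a JobTracker
        // started by user B gets an AccessControlException (surfaced as a
        // RemoteException in hadoop.log) at this point.
        if (fs.exists(systemDir)) {
          fs.delete(systemDir, true);
        }
        fs.mkdirs(systemDir, new FsPermission((short) 0700));
      }
    }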

Possible solutions:

1. Make the directory world-readable. In HADOOP-1873, Doug commented that it 
should not be world-readable, so I guess this is not an option.
2. The JobTracker can delete the directory when it is shut down. This seems, 
IMO, the best solution: whoever creates, deletes (see the sketch after this 
list).
3. HOD can create a directory that is specific to the user, something like 
/mapredsystem/user-name.clusterid, where clusterid could be something like a 
Torque job id. This may create a huge number of directories in HDFS; I don't 
know whether that's an issue. Otherwise, this is probably the *easiest* 
solution to implement.
4. HOD can delete the mapredsystem directory at deallocation. This seems wrong, 
because HOD doesn't create the directory, and, given our current design, it is 
also very hard to implement.
5. The JobTracker can avoid cleaning up the directory at startup. This may not 
be safe, for example if the previous JobTracker crashed.
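
For option 2, here is a minimal sketch of the "whoever creates, deletes" idea. This is not existing Hadoop code; the class and method names (ShutdownCleanupSketch, cleanupSystemDir) are made up for illustration:

    // Illustrative sketch of option 2 only. The class and method names
    // (ShutdownCleanupSketch, cleanupSystemDir) are invented for this comment,
    // not part of the existing JobTracker.
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ShutdownCleanupSketch {
      private final FileSystem fs;
      private final Path systemDir;

      public ShutdownCleanupSketch(Configuration conf, String hostname)
          throws IOException {
        this.fs = FileSystem.get(conf);
        this.systemDir = new Path("/mapredsystem/" + hostname);
      }

      // Call this from the JobTracker's normal shutdown path, and optionally
      // from a JVM shutdown hook to cover kills. A hard crash still skips it,
      // which is also why option 5 by itself is not safe.
      public void cleanupSystemDir() {
        try {
          fs.delete(systemDir, true);  // recursive delete of the per-host dir
        } catch (IOException ioe) {
          // Best effort: the JobTracker is going down anyway, so just log it.
          System.err.println("Could not delete " + systemDir + ": " + ioe);
        }
      }
    }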

Comments? 

> hdfs:///mapredsystem directory not cleaned up after deallocation 
> -----------------------------------------------------------------
>
>                 Key: HADOOP-2899
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2899
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/hod
>    Affects Versions: 0.16.0
>            Reporter: Luca Telloli
>
> Each submitted job creates an hdfs:///mapredsystem directory, created (I 
> guess) by the hodring process. The problem is that it is not cleaned up at 
> the end of the process. A use case would be:
> - user A allocates a cluster, the hodring node is svrX, so a 
> /mapredsystem/svrX directory is created
> - user A deallocates the cluster, but that directory is not cleaned up
> - user B allocates a cluster, and the first node chosen as the hodring is 
> svrX, so the hodring tries to write to hdfs:///mapredsystem but it fails
> - allocation succeeds, but there's no hodring running; looking at
> 0-jobtracker/logdir/hadoop.log under the temporary directory I can read:
> 2008-02-26 17:28:42,567 WARN org.apache.hadoop.mapred.JobTracker: Error 
> starting tracker: org.apache.hadoop.ipc.RemoteException: 
> org.apache.hadoop.fs.permission.AccessControlException: Permission denied: 
> user=B, access=WRITE, inode="mapredsystem":hadoop:supergroup:rwxr-xr-x
> I guess a possible solution would be to clean up those directories during the 
> deallocation process. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
