[jira] Commented: (MAPREDUCE-181) Secure job submission

Devaraj Das (JIRA) Tue, 08 Sep 2009 20:10:23 -0700

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752876#action_12752876
 ]


Devaraj Das commented on MAPREDUCE-181:
---------------------------------------

bq. Instead of storing the UGI with the submitted job, please store the user as 
a string. That will be forward compatible when we move to server-side groups. I 
think it makes sense to do as part of this patch, if it isn't already being 
done.

The jobconf already has the username. Are you saying that the JT should 
maintain a mapping from the jobID to the user who was given that jobID 
(step 1 in the job submission protocol), so that in the following RPC the JT 
can efficiently look up the username based on the jobID, rather than having 
to parse the conf to get it?
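
To make the question concrete, here is a minimal sketch of such a mapping. It is not taken from the patch: the class name JobOwnerRegistry, its methods, and the "job_N" ID format are all hypothetical, and a plain string stands in for the username.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: remembers which user was handed which job ID,
// so a later RPC can check ownership without parsing the job conf.
public class JobOwnerRegistry {
    private final Map<String, String> ownerByJobId = new ConcurrentHashMap<>();
    private int nextId = 0;

    // Step 1 of submission: allocate an ID and record who asked for it.
    public synchronized String allocateJobId(String username) {
        String jobId = "job_" + (++nextId);   // illustrative ID format only
        ownerByJobId.put(jobId, username);
        return jobId;
    }

    // Later RPCs: constant-time lookup keyed by job ID.
    public boolean isOwner(String jobId, String username) {
        return username.equals(ownerByJobId.get(jobId));
    }

    // Drop the entry once the job is retired.
    public void remove(String jobId) {
        ownerByJobId.remove(jobId);
    }
}
```

Storing the user as a string, as the review comment asks, keeps the entry independent of how groups are resolved later.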

bq. The meta information should only include the offset, since the length is 
redundant with the following split's start.

Hmm.. right.
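
As a rough illustration of why the length is redundant, the sketch below derives each split's length from the start offsets alone. The class and method names are hypothetical, and it assumes the total size of the split file is known so that the last split has an end.

```java
// Hypothetical sketch: split lengths recovered from start offsets only.
public class SplitIndex {
    // offsets[i] is the start of split i; fileLength closes the last split.
    public static long length(long[] offsets, long fileLength, int i) {
        long end = (i + 1 < offsets.length) ? offsets[i + 1] : fileLength;
        return end - offsets[i];
    }
}
```

Note that the last split has no following start, so the reader needs either the file length or a trailing sentinel offset.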

bq. We use the binary format instead of xml to store the jobconf. However, when 
loading the binary format, we need to handle the final parameters.

The conf is serialized using Configuration's write(DataOutput), which writes 
every property out as a string. The JobTracker then writes the configuration 
it read to mapred.system.dir using Configuration.writeXml. The JobInProgress 
constructor loads the conf in the normal way (as it does today), so final 
parameters defined in the JobTracker will be taken care of in the usual way. 
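
For readers unfamiliar with final parameters, the following is a simplified model of the behaviour being relied on: a property marked final by an earlier resource is not overridden by a later one. It deliberately does not use Hadoop's Configuration class; FinalAwareConf and its methods are hypothetical stand-ins.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of layered config loading with final parameters.
public class FinalAwareConf {
    private final Map<String, String> values = new HashMap<>();
    private final Set<String> finals = new HashSet<>();

    // Load one resource; keys already marked final keep their old value.
    public void load(Map<String, String> resource, Set<String> finalKeys) {
        for (Map.Entry<String, String> e : resource.entrySet()) {
            if (finals.contains(e.getKey())) {
                continue; // an earlier resource declared this key final
            }
            values.put(e.getKey(), e.getValue());
        }
        finals.addAll(finalKeys);
    }

    public String get(String key) {
        return values.get(key);
    }
}
```

In this model the JobTracker's site settings would be loaded first and the submitted job conf second, so the order of loading is what protects the final values.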

bq. I'm not very happy with half of the job information being saved in the 
system directory and half of it in the staging directory. I assume that the 
staging directory is required to be on the same file system as the system 
directory? Having the job's definition split into two directories with two 
different owners seems bad. That is especially true since the data in the 
system directory will point to particular byte offsets in the staging 
directory. I think we will be in for some really nasty bugs involving

The way I see it, the JobTracker is given only the information that is 
required to launch the job. Things like job.jar, the split bytes, the 
distributed cache files, and anything else the users want to use in the job 
are required by the tasks, and the JT doesn't care about them. Every piece of 
information is generated by the client. If the client generates wrong byte 
offsets, only that client's job is affected. 
Your sentence about the "nasty bugs" is incomplete.

bq. I assume the cleanup of the staging directory is done by the JobTracker.

Done as part of the job cleanup task.

bq. I guess I would be happier, if as part of JobSubmission, we moved the files 
from the user's staging area into the system dir. The JobTracker would read 
(possibly with a cache) the bytes for the task and send them to the user as 
part of the task definition.

The split bytes file has a high replication factor of 10 (and it could be 
something like what Doug suggested). So do we really want the JT to copy the 
bytes to the system dir? I am trying to weigh the options of letting the tasks 
read the split bytes from the split file directly versus the JT passing them 
in the task definition. The former reduces load on the JT (it doesn't have to 
load the split bytes in memory at all). 
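
A minimal sketch of the first option, where a task reads its own split bytes given only an offset and a length. This is an illustration, not code from the patch: SplitReader is a hypothetical name, and a local file opened with java.io.RandomAccessFile stands in for the split file on the distributed file system.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical sketch: the task seeks to its split and reads only those bytes.
public class SplitReader {
    public static byte[] readSplit(String path, long offset, int length)
            throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            file.seek(offset);            // jump to the start of this task's split
            byte[] buf = new byte[length];
            file.readFully(buf);          // throws EOFException if the file is too short
            return buf;
        }
    }
}
```

With this shape the task definition carries only a small (offset, length) pair and the JT never holds the split bytes in memory.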

> Secure job submission 
> ----------------------
>
>                 Key: MAPREDUCE-181
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-181
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Amar Kamat
>            Assignee: Amar Kamat
>         Attachments: hadoop-3578-branch-20-example-2.patch, 
> hadoop-3578-branch-20-example.patch, HADOOP-3578-v2.6.patch, 
> HADOOP-3578-v2.7.patch, MAPRED-181-v3.8.patch
>
>
> Currently the jobclient accesses the {{mapred.system.dir}} to add job 
> details. Hence the {{mapred.system.dir}} has the permissions of 
> {{rwx-wx-wx}}. This could be a security loophole where the job files might 
> get overwritten/tampered after the job submission. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
