[ https://issues.apache.org/jira/browse/MAPREDUCE-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751438#action_12751438 ]

Amar Kamat commented on MAPREDUCE-181:
--------------------------------------

Here is the final proposal:
# Here is how the handshake happens for job submission
 ## the jobclient asks the jobtracker for a new job-id (the jobtracker maintains 
a mapping from job-id to user-name [ugi]; this user is the owner of the job and 
is the only one allowed to submit it)
 ## using the input-splits, the jobclient constructs split _meta-info_ that lets 
the jobtracker build its task->node locality cache (a sketch of these records as 
Writables appears after this list).
  {code}
   job-split-meta-info :
       - split-location (location of the actual split/raw-bytes)
       - split-class (used to re-instantiate the split objects)
       - split-info (array of individual split meta-info)

   split-meta-info :
       - locations (hostnames where this split is local)
       - start-offset (start in the raw-bytes)
       - length (total bytes in the corresponding raw-bytes)
       - data-size (total data that will be processed in this split)
  {code}
 ## with this new id, the jobclient uploads job.xml, job.split, job.jar and 
archives/libs to a staging area (/user/_user-name_/.staging/_jobid_/). job.xml 
is staged to support the jobtracker.getJobFile() api.
 ## after the upload is done, the jobclient submits the job by passing the 
job-id, job-conf and job-split-meta-info via rpc (a client-side sketch of this 
handshake appears after this list).
 ## the jobtracker does the following upon a submitjob request
  ### validate the conf (queue checks, acl checks etc., along with a user-name 
check [the conf user-name must match the job owner] and an ownership check [the 
caller of getNewJobId() and submitjob() must be the same user])
  ### serialize the conf to mapred.system.dir/jobid/job.xml (for restarts)
  ### serialize the split-meta-info to mapred.system.dir/jobid/job.split
  ### start the job i.e. create the jobinprogress
 ## when a tasktracker comes asking for a task, the jobtracker passes the split 
meta-info (along with the split-location and split class-name). The tasktracker 
uses this meta-info to read the split raw-bytes.
 ## the tasktracker now localizes the job.jar from 
/user/_user-name_/.staging/_job-id_/job.jar and then unjars it. This is done 
using the job-conf (which carries the user credentials)
 ## mapred.system.dir can now be 700 and accessible only to the mapred daemons 
 ## readFields() in the job-conf caps the total number of characters in the 
job-conf. This prevents users from passing huge job-confs. For now the limit is 
3*1024*1024 chars.
 ## the job-split meta-info is similarly capped in readFields() to accept only 
meta-info smaller than 10 MB (a sketch of such a size cap appears after this 
list).
 ## since jobtracker.getNewJobId() maintains a mapping from job-id to user-name, 
the jobtracker needs to clean up this mapping after some timeout. One way is to 
use a thread which periodically expires stale entries (a sketch appears after 
this list).
 ## upon job completion, the job-cleanup code deletes the staging folder i.e. 
/user/_user-name_/.staging/_job-id_/.
 ## if the jobclient crashes or fails to submit the job, the temp files under 
/user/_user-name_/.staging/_job-id_/ are not deleted, as they can be used for 
debugging purposes.

# Upon restart, mapred.system.dir can be completely trusted and hence no 
checking is done there.
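
A minimal client-side sketch of the handshake above (new job-id, staging upload, 
rpc submit). JobSubmitter, writeSplitFiles() and the three-argument submitJob() 
are illustrative names for this proposal, not existing Hadoop apis; FileSystem, 
Path and JobConf are the usual Hadoop classes.
{code}
// Client-side sketch only -- method and class names below are assumptions
// based on the proposal, not the committed API.
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

class JobSubmitter {
  void submit(JobConf conf, FileSystem fs, String userName, String jobId)
      throws Exception {
    // step 1 happened earlier: jobId came from jobtracker.getNewJobId(),
    // which also recorded the job-id -> user-name mapping on the jobtracker.

    // step 3: upload the job files to the per-user staging area
    Path staging = new Path("/user/" + userName + "/.staging/" + jobId);
    fs.copyFromLocalFile(new Path(conf.getJar()), new Path(staging, "job.jar"));
    writeSplitFiles(fs, staging, conf);          // job.split (raw split bytes)
    FSDataOutputStream xml = fs.create(new Path(staging, "job.xml"));
    conf.writeXml(xml);                          // staged to serve getJobFile()
    xml.close();

    // step 4: only the id, the conf and the split meta-info go over rpc; the
    // jobtracker then re-serializes them under mapred.system.dir/<jobid>/
    // jobTracker.submitJob(jobId, conf, splitMetaInfo);
  }

  void writeSplitFiles(FileSystem fs, Path staging, JobConf conf) {
    // sketch only: serialize the input-splits and their meta-info here
  }
}
{code}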
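
The two records in the split meta-info block could look roughly like the 
following as Writables. Class and field names are illustrative; only the fields 
listed above are modelled.
{code}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableUtils;

// Per-split meta-info: where the split is local and where its raw bytes live.
class SplitMetaInfo implements Writable {
  String[] locations;   // hostnames where this split is local
  long startOffset;     // start offset in the raw-bytes file (job.split)
  long length;          // bytes of the serialized split in job.split
  long dataSize;        // total data that will be processed in this split

  public void write(DataOutput out) throws IOException {
    WritableUtils.writeVInt(out, locations.length);
    for (String loc : locations) Text.writeString(out, loc);
    WritableUtils.writeVLong(out, startOffset);
    WritableUtils.writeVLong(out, length);
    WritableUtils.writeVLong(out, dataSize);
  }

  public void readFields(DataInput in) throws IOException {
    int n = WritableUtils.readVInt(in);
    locations = new String[n];
    for (int i = 0; i < n; i++) locations[i] = Text.readString(in);
    startOffset = WritableUtils.readVLong(in);
    length = WritableUtils.readVLong(in);
    dataSize = WritableUtils.readVLong(in);
  }
}

// Job-level meta-info sent to the jobtracker at submit time.
class JobSplitMetaInfo implements Writable {
  String splitLocation;        // path of the actual split/raw-bytes
  String splitClass;           // class used to re-instantiate the split objects
  SplitMetaInfo[] splitInfo;   // array of individual split meta-info

  public void write(DataOutput out) throws IOException { /* analogous to above */ }
  public void readFields(DataInput in) throws IOException { /* analogous to above */ }
}
{code}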
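
A sketch of how the readFields() size caps could be enforced. The limits are the 
ones quoted above; the class name and serialization layout are illustrative.
{code}
import java.io.DataInput;
import java.io.IOException;

class SubmissionLimits {
  // limits quoted in the proposal
  static final int MAX_JOBCONF_CHARS = 3 * 1024 * 1024;
  static final long MAX_SPLIT_META_INFO_BYTES = 10 * 1024 * 1024;

  // called from readFields(): refuse to deserialize anything bigger than the cap
  static String readCappedConf(DataInput in) throws IOException {
    int len = in.readInt();                    // length written by the jobclient
    if (len < 0 || len > MAX_JOBCONF_CHARS) {
      throw new IOException("job-conf too large: " + len + " chars");
    }
    byte[] bytes = new byte[len];
    in.readFully(bytes);
    return new String(bytes, "UTF-8");
  }
}
{code}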
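
And a sketch of the job-id -> user-name mapping kept by getNewJobId() together 
with the periodic cleanup thread mentioned above; names and the timeout handling 
are illustrative.
{code}
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class JobIdOwnerCache {
  private static class Entry {
    final String userName;
    final long createdAt;
    Entry(String userName, long createdAt) {
      this.userName = userName;
      this.createdAt = createdAt;
    }
  }

  private final Map<String, Entry> owners = new ConcurrentHashMap<String, Entry>();
  private final long expiryMillis;

  JobIdOwnerCache(long expiryMillis) { this.expiryMillis = expiryMillis; }

  // called from getNewJobId(): remember who asked for this job-id
  void recordOwner(String jobId, String userName) {
    owners.put(jobId, new Entry(userName, System.currentTimeMillis()));
  }

  // called from submitjob(): the submitter must match the recorded owner
  String removeOwner(String jobId) {
    Entry e = owners.remove(jobId);
    return e == null ? null : e.userName;
  }

  // run from a daemon thread every few minutes: drop ids that were handed out
  // but never submitted
  void expireStaleEntries() {
    long now = System.currentTimeMillis();
    for (Iterator<Map.Entry<String, Entry>> it = owners.entrySet().iterator();
         it.hasNext();) {
      if (now - it.next().getValue().createdAt > expiryMillis) {
        it.remove();
      }
    }
  }
}
{code}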

> Secure job submission 
> ----------------------
>
>                 Key: MAPREDUCE-181
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-181
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Amar Kamat
>            Assignee: Amar Kamat
>         Attachments: hadoop-3578-branch-20-example-2.patch, 
> hadoop-3578-branch-20-example.patch, HADOOP-3578-v2.6.patch, 
> HADOOP-3578-v2.7.patch, MAPRED-181-v3.8.patch
>
>
> Currently the jobclient accesses the {{mapred.system.dir}} to add job 
> details. Hence the {{mapred.system.dir}} has the permissions of 
> {{rwx-wx-wx}}. This could be a security loophole where the job files might 
> get overwritten/tampered after the job submission. 
