[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759337#action_12759337
 ] 

Devaraj Das commented on MAPREDUCE-1026:
----------------------------------------

Summarizing some offline discussions:
1. HTTP Digest authentication needs 1.5 extra round trips to the TaskTracker, 
which could be a significant performance cost when the map outputs are small.
2. Instead of that, can we do the following:
   2.1. Tasks authenticate to the TaskTrackers by simply passing the key in the 
URL. This adds no extra round trips.
   2.2. Map tasks encrypt the final spill file on the map side when it is 
written to disk (and reducers decrypt it). This could be done using a key 
different from the shuffle key used in 2.1.
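
To make 2.1 concrete, here is a rough sketch, not a proposed patch. It assumes 
a fetch URL of the form /mapOutput?job=...&map=...&reduce=... and adds a 
hypothetical "key" query parameter; the class name, method names and key format 
are all made up for illustration. The reducer appends the job's shuffle key to 
the URL, and the TaskTracker compares it with the key it holds for that job.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;

// Illustrative only: none of these names come from Hadoop.
final class ShuffleUrlKeySketch {

    /** Generates a random per-job shuffle key (assumed format: URL-safe Base64). */
    static String newJobKey() {
        byte[] raw = new byte[20];
        new SecureRandom().nextBytes(raw);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
    }

    /** Reducer side: builds the fetch URL with the key as a query parameter. */
    static String buildFetchUrl(String host, int port, String jobId,
                                String mapId, int reduce, String key) {
        return "http://" + host + ":" + port + "/mapOutput"
                + "?job=" + enc(jobId)
                + "&map=" + enc(mapId)
                + "&reduce=" + reduce
                + "&key=" + enc(key);   // "key" is a hypothetical parameter name
    }

    /** TaskTracker side: accepts the request only if the presented key matches. */
    static boolean isAuthorized(String presentedKey, String expectedKey) {
        if (presentedKey == null || expectedKey == null) {
            return false;
        }
        // MessageDigest.isEqual is used so the comparison does not stop at the
        // first differing byte.
        return MessageDigest.isEqual(
                presentedKey.getBytes(StandardCharsets.UTF_8),
                expectedKey.getBytes(StandardCharsets.UTF_8));
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

Note that the key travels in the clear in this scheme, which is why 2.2 uses a 
separate key for the data itself.
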
The idea is that at some point we should encrypt map outputs anyway, to get 
maximum security for the intermediate outputs. We can do that on the wire via 
https, or by encrypting the files. The latter should be much less costly than 
the former. The point of having both 2.1 and 2.2 is to make the transfer very 
secure without the overhead of the extra round trips needed for (digest) 
authentication.
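
For 2.2, a minimal sketch of encrypting the spill data on the map side and 
decrypting it on the reduce side could look like the following. The choice of 
AES-GCM, the 128-bit key size, the IV-prefixed layout and all names are 
assumptions for illustration; nothing here has been decided. A real spill file 
would be streamed rather than held in a byte array, and how the key gets to the 
reducers is left out.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Illustrative only: none of these names come from Hadoop.
final class SpillEncryptionSketch {
    private static final int IV_BYTES = 12;   // IV length used for GCM here
    private static final int TAG_BITS = 128;  // GCM authentication tag length

    /** Generates a data key that is separate from the shuffle key in 2.1. */
    static SecretKey newSpillKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Map side: returns IV followed by ciphertext (which includes the tag). */
    static byte[] encryptSpill(byte[] plain, SecretKey key) {
        try {
            byte[] iv = new byte[IV_BYTES];
            new SecureRandom().nextBytes(iv);   // fresh IV for every spill
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = c.doFinal(plain);
            byte[] out = new byte[IV_BYTES + ct.length];
            System.arraycopy(iv, 0, out, 0, IV_BYTES);
            System.arraycopy(ct, 0, out, IV_BYTES, ct.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reduce side: reads the IV prefix, then decrypts and verifies the rest. */
    static byte[] decryptSpill(byte[] stored, SecretKey key) {
        try {
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
            return c.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption or tag check failed", e);
        }
    }
}
```

With this layout the TaskTracker can serve the stored bytes as they are, so the 
only extra work is the encryption in the map task and the decryption in the 
reducer.
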

Thoughts?

> Shuffle should be secure
> ------------------------
>
>                 Key: MAPREDUCE-1026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: security
>            Reporter: Owen O'Malley
>            Assignee: Devaraj Das
>
> Since the user's data is available via http from the TaskTrackers, we should 
> require a job-specific secret to access it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
