[ https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759337#action_12759337 ]
Devaraj Das commented on MAPREDUCE-1026:
----------------------------------------
Summarizing some offline discussions:
1. The 1.5 extra round trips to the TaskTracker that HTTP Digest
authentication requires could be a significant cost when the map outputs are
small.
2. Instead, can we do the following:
2.1. Tasks authenticate to the TaskTrackers by simply passing the key in the
URL. This adds no extra round trips.
2.2. Map tasks encrypt their final spill files on the map side as they are
written to disk (and reducers decrypt them). This could be done with a key
different from the shuffle key used in 2.1.
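To make 2.1 concrete, here is a rough sketch of what passing the key in the URL might look like. Everything named here is an assumption for illustration only: the class UrlKeySketch, the /mapOutput path, and the map, reduce and key query parameters are hypothetical, not the actual TaskTracker servlet interface. The reducer side builds the fetch URL; the TaskTracker side compares the presented key against the job's key in constant time.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;

// Hypothetical sketch of idea 2.1; names and URL layout are assumptions.
final class UrlKeySketch {

    // Generate a random, URL-safe job-specific shuffle key (128 bits).
    static String newShuffleKey() {
        byte[] raw = new byte[16];
        new SecureRandom().nextBytes(raw);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
    }

    // Reducer side: build the fetch URL with the key as a query parameter.
    // mapId is assumed to contain only URL-safe characters.
    static String fetchUrl(String trackerHostPort, String mapId, int reduce, String key) {
        return "http://" + trackerHostPort + "/mapOutput?map=" + mapId
                + "&reduce=" + reduce + "&key=" + key;
    }

    // Extract a single query parameter, or null if it is absent.
    static String queryParam(String url, String name) {
        int q = url.indexOf('?');
        if (q < 0) {
            return null;
        }
        for (String pair : url.substring(q + 1).split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals(name)) {
                return pair.substring(eq + 1);
            }
        }
        return null;
    }

    // TaskTracker side: accept the request only if the presented key matches.
    // MessageDigest.isEqual is used for the comparison instead of String.equals.
    static boolean authorized(String url, String expectedKey) {
        String presented = queryParam(url, "key");
        if (presented == null) {
            return false;
        }
        return MessageDigest.isEqual(
                presented.getBytes(StandardCharsets.UTF_8),
                expectedKey.getBytes(StandardCharsets.UTF_8));
    }
}
```

Note that in this sketch the key travels in the clear over plain http, which is one reason to use a separate key for the file encryption in 2.2.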
The idea is that at some point we should encrypt map outputs anyway, to get
maximum security for the intermediate outputs. We can do that on the wire
via https, or by encrypting the files. The latter should be much less costly
than the former. The point of having both 2.1 and 2.2 is to make the
transfer very secure without the overhead of the extra round trips needed
for (digest) authentication.
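As a rough illustration of the encrypted-files option, the sketch below encrypts a spill buffer on the map side and decrypts it on the reduce side using the JDK's javax.crypto API. The choice of AES-GCM, the 128-bit key size, the layout (IV followed by ciphertext) and the class name SpillCryptoSketch are all assumptions made for the example; none of them has been decided in this discussion, and a real implementation would stream the file rather than hold it in memory.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical sketch of idea 2.2; cipher choice and layout are assumptions.
final class SpillCryptoSketch {
    private static final int IV_BYTES = 12;   // IV length commonly used with GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    // Per-job encryption key, distinct from the shuffle key used in the URL.
    static SecretKey newKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Map side: returns IV || ciphertext (the ciphertext includes the GCM tag).
    static byte[] encrypt(SecretKey key, byte[] plain) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv); // fresh IV for every spill file
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(plain);
            byte[] out = new byte[IV_BYTES + ct.length];
            System.arraycopy(iv, 0, out, 0, IV_BYTES);
            System.arraycopy(ct, 0, out, IV_BYTES, ct.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reduce side: reads the IV from the front of the blob, then decrypts.
    // Throws IllegalStateException if the key is wrong or the data was altered.
    static byte[] decrypt(SecretKey key, byte[] blob) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, blob, 0, IV_BYTES));
            return cipher.doFinal(blob, IV_BYTES, blob.length - IV_BYTES);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this layout the TaskTracker can serve the encrypted bytes as they sit on disk, so the cost is paid once by the map task and once by the reducer rather than per connection.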
Thoughts?
> Shuffle should be secure
> ------------------------
>
> Key: MAPREDUCE-1026
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
> Project: Hadoop Map/Reduce
> Issue Type: Sub-task
> Components: security
> Reporter: Owen O'Malley
> Assignee: Devaraj Das
>
> Since the user's data is available via http from the TaskTrackers, we should
> require a job-specific secret to access it.