[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413498#comment-13413498
 ] 

Alejandro Abdelnur commented on MAPREDUCE-4417:
-----------------------------------------------

When looking at encryption on the wire for the shuffle, the alternatives that
came up were transport encryption (HTTPS) and data/spill encryption (doable
via a codec).

Using HTTPS requires changing the Fetcher and the ShuffleHandler
(Netty/JDK-URL) to use HTTPS, and configuring certificates. It is a well
understood, standard, proven technology and gives you end-to-end
confidentiality, integrity and server authentication (and optionally client
authentication) out of the box, without room to get things wrong. The private
keys of the server certificates are out of reach of job tasks (they are used
by the NM, similar to Kerberos keytabs).
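
As a rough illustration of the client side of the HTTPS option, here is a
minimal sketch using only the JDK (javax.net.ssl). The class HttpsFetchSketch
and its methods are hypothetical names for this example and are not the actual
Fetcher code; the trust store stands for the "configuring certificates" step.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

// Hypothetical helper for illustration only; not part of Hadoop.
class HttpsFetchSketch {

    // Builds a TLS context that trusts the certificates in trustStore.
    // Passing null falls back to the JVM's default trust store.
    static SSLContext sslContextFor(KeyStore trustStore) {
        try {
            TrustManagerFactory tmf = TrustManagerFactory.getInstance(
                TrustManagerFactory.getDefaultAlgorithm());
            tmf.init(trustStore);
            SSLContext ctx = SSLContext.getInstance("TLS");
            // No key managers: the client does not authenticate itself here.
            ctx.init(null, tmf.getTrustManagers(), null);
            return ctx;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Prepares (but does not yet connect) an HTTPS connection that will
    // verify the server against the given context. Expects an https:// URL.
    static HttpsURLConnection open(String url, SSLContext ctx) {
        try {
            HttpsURLConnection conn =
                (HttpsURLConnection) URI.create(url).toURL().openConnection();
            conn.setSSLSocketFactory(ctx.getSocketFactory());
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point of the sketch is that cipher selection, integrity checks and server
authentication are all handled by the TLS implementation; the caller only
supplies the trusted certificates.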

Using a codec requires (leveraging an existing plugin point) a compression
codec implementation that adds cipher-stream wrappers to the original streams
and, in addition, could delegate to a real compression codec (so that
compression is not lost when encrypting). This requires us to choose a cipher
implementation by hand. I'm not an expert on this, and I'm not sure which one
would be the best choice or what the weaknesses of each are
(http://en.wikipedia.org/wiki/Stream_cipher#Comparison_Of_Stream_Ciphers).
Using a cipher on its own provides confidentiality, but it does not provide
integrity or man-in-the-middle protection (unless we end up implementing
something like TLS). Also, both ends are controlled by job tasks, so it
becomes the responsibility of the user to create/distribute/protect the
secrets that are the basis of confidentiality. Finally, with the codec
approach the HTTP shuffle request/response headers go in the clear, which
could enable a man-in-the-middle attack.
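
To make the codec idea concrete, here is a minimal JDK-only sketch of the
stream wrapping it would involve. Everything in it is an assumption made for
illustration: the class name CipherStreamSketch, the choice of
AES/CTR/NoPadding, the IV prefix on the wire, and GZIP standing in for the
"real compression codec". It does not implement Hadoop's CompressionCodec
interface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch for illustration only; not part of Hadoop.
class CipherStreamSketch {
    // Assumed cipher choice. CTR mode gives confidentiality only: there is
    // no authentication tag, so tampering is not detected cryptographically.
    private static final String TRANSFORMATION = "AES/CTR/NoPadding";
    private static final int IV_LENGTH = 16;

    // Data written to the returned stream is compressed, then encrypted.
    static OutputStream wrapOutput(OutputStream raw, byte[] key)
            throws IOException, GeneralSecurityException {
        byte[] iv = new byte[IV_LENGTH];
        new SecureRandom().nextBytes(iv);
        raw.write(iv); // the IV is not secret; send it ahead of the data
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
            new IvParameterSpec(iv));
        return new GZIPOutputStream(new CipherOutputStream(raw, cipher));
    }

    // Data read from the returned stream is decrypted, then decompressed.
    static InputStream wrapInput(InputStream raw, byte[] key)
            throws IOException, GeneralSecurityException {
        byte[] iv = new byte[IV_LENGTH];
        new DataInputStream(raw).readFully(iv);
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
            new IvParameterSpec(iv));
        return new GZIPInputStream(new CipherInputStream(raw, cipher));
    }

    // Writes data through wrapOutput and reads it back through wrapInput.
    static byte[] roundTrip(byte[] key, byte[] data) {
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            try (OutputStream out = wrapOutput(wire, key)) {
                out.write(data);
            }
            ByteArrayOutputStream result = new ByteArrayOutputStream();
            try (InputStream in = wrapInput(
                    new ByteArrayInputStream(wire.toByteArray()), key)) {
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    result.write(buf, 0, n);
                }
            }
            return result.toByteArray();
        } catch (IOException | GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note what the sketch leaves open: the key has to be created and handed to both
the map and reduce side by the job itself, and nothing here authenticates the
peer or protects the HTTP headers, which are the concerns raised above.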

                
> add support for encrypted shuffle
> ---------------------------------
>
>                 Key: MAPREDUCE-4417
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4417
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2, security
>    Affects Versions: 2.0.0-alpha
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>             Fix For: 2.0.1-alpha
>
>
> Currently Shuffle fetches go in the clear. While Kerberos provides
> comprehensive authentication for the cluster, it does not provide
> confidentiality.
> When processing sensitive data, confidentiality may be desired (at the
> expense of job performance and resource utilization for doing encryption).


