[
https://issues.apache.org/jira/browse/MAPREDUCE-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413498#comment-13413498
]
Alejandro Abdelnur commented on MAPREDUCE-4417:
-----------------------------------------------
When looking at encryption on the wire for the shuffle, the alternatives that
came up were transport encryption (HTTPS) and data/spill encryption (doable
via a codec).
Using HTTPS requires improving the Fetcher/ShuffleHandler (Netty/JDK-URL) to
use HTTPS and configuring certificates. It is a well-understood, standard,
proven technology and gives you end-to-end confidentiality, integrity, and
server authentication (and optionally client authentication) out of the box,
without room to get things wrong. The private keys of the server certificates
are out of reach of job tasks (they are used by the NM, similar to Kerberos
keytabs).
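
As a rough illustration of the client side of the HTTPS option, here is a
minimal Java sketch using only standard JSSE classes. It is not the actual
Fetcher code: the class name HttpsFetchSketch, its methods, and the host and
path in the URL are all hypothetical. It builds an SSLContext from a trust
store and attaches it to a connection without opening the socket.

```java
import java.io.IOException;
import java.net.URL;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

// Hypothetical sketch; names are illustrative and not part of Hadoop.
class HttpsFetchSketch {

    // Build an SSLContext that trusts the certificates in trustStore.
    // Passing null falls back to the JDK default trust store.
    static SSLContext contextFrom(KeyStore trustStore) throws GeneralSecurityException {
        TrustManagerFactory tmf =
            TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);
        SSLContext ctx = SSLContext.getInstance("TLS");
        // No KeyManagers: server authentication only, no client certificate.
        ctx.init(null, tmf.getTrustManagers(), null);
        return ctx;
    }

    // Prepare (but do not connect) an HTTPS connection for a shuffle fetch.
    static HttpsURLConnection open(URL url, SSLContext ctx) throws IOException {
        if (!"https".equals(url.getProtocol())) {
            throw new IllegalArgumentException("shuffle URL must use https: " + url);
        }
        HttpsURLConnection conn = (HttpsURLConnection) url.openConnection();
        conn.setSSLSocketFactory(ctx.getSocketFactory());
        return conn;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical NM address and path, for illustration only.
        URL url = java.net.URI.create("https://nm.example.invalid/mapOutput").toURL();
        HttpsURLConnection conn = open(url, contextFrom(null));
        System.out.println("prepared " + conn.getURL());
    }
}
```

The point of the sketch is that the trust material lives in the process that
opens the connection, and the TLS stack handles confidentiality, integrity,
and server authentication once the connection is made.
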
Using a codec requires (leveraging an existing plugin point) a compression
codec implementation that adds cipher-stream wrappers to the original streams
and, in addition, could delegate to a real compression codec (in order not to
lose compression when doing encryption). This requires us to choose a Cipher
implementation by hand (which I'm not an expert on), and I'm not sure which one
would be the best choice or what the weaknesses of each of them are
(http://en.wikipedia.org/wiki/Stream_cipher#Comparison_Of_Stream_Ciphers).
Using a cipher on its own will provide confidentiality, but it would not
provide integrity or man-in-the-middle protection (unless we end up
implementing something like TLS). In addition, both ends are controlled by job
tasks, so it becomes the responsibility of the user to create, distribute, and
protect the secrets that are the basis of confidentiality. Finally, with the
codec approach the HTTP shuffle request/response headers go in the clear, which
could enable a man-in-the-middle attack.
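
To make the codec option and its integrity gap concrete, here is a minimal
Java sketch. It is not the Hadoop CompressionCodec API: CipherStreamSketch,
seal, and open are hypothetical names, and AES/CTR is an arbitrary pick for
illustration, not a recommendation. It wraps the stream in a cipher stream,
optionally delegating to gzip first, and shows that a flipped ciphertext bit
decrypts without any error.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch; names are illustrative and not part of Hadoop.
class CipherStreamSketch {

    // Assumption: 16-byte AES key and 16-byte IV supplied by the caller.
    // The IV must never be reused with the same key across streams.
    static Cipher newCipher(int mode, byte[] key, byte[] iv) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return c;
    }

    // Write path: plaintext -> (optional gzip) -> cipher stream -> bytes.
    static byte[] seal(byte[] plain, byte[] key, byte[] iv, boolean compress)
            throws IOException, GeneralSecurityException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        OutputStream out = new CipherOutputStream(sink, newCipher(Cipher.ENCRYPT_MODE, key, iv));
        if (compress) {
            out = new GZIPOutputStream(out); // delegate to a real compressor
        }
        try (OutputStream o = out) {
            o.write(plain);
        }
        return sink.toByteArray();
    }

    // Read path: bytes -> cipher stream -> (optional gunzip) -> plaintext.
    static byte[] open(byte[] data, byte[] key, byte[] iv, boolean compress)
            throws IOException, GeneralSecurityException {
        InputStream in = new CipherInputStream(
            new ByteArrayInputStream(data), newCipher(Cipher.DECRYPT_MODE, key, iv));
        if (compress) {
            in = new GZIPInputStream(in);
        }
        try (InputStream i = in) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = i.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return buf.toByteArray();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] key = new byte[16]; // all-zero demo key; never do this for real data
        byte[] iv = new byte[16];
        byte[] plain = "map output segment".getBytes(StandardCharsets.UTF_8);

        byte[] wire = seal(plain, key, iv, false);
        wire[0] ^= 1; // an attacker flips one bit in transit
        byte[] received = open(wire, key, iv, false);
        // Decryption succeeds silently; only the content differs.
        System.out.println(new String(received, StandardCharsets.UTF_8));
    }
}
```

With compression enabled, the gzip trailer checksum would catch accidental
corruption, but that is not a keyed integrity check. Detecting deliberate
tampering needs a MAC or an authenticated cipher mode, which is part of what
TLS already provides.
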
> add support for encrypted shuffle
> ---------------------------------
>
> Key: MAPREDUCE-4417
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4417
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Components: mrv2, security
> Affects Versions: 2.0.0-alpha
> Reporter: Alejandro Abdelnur
> Assignee: Alejandro Abdelnur
> Fix For: 2.0.1-alpha
>
>
> Currently Shuffle fetches go in the clear. While Kerberos provides
> comprehensive authentication for the cluster, it does not provide
> confidentiality.
> When processing sensitive data, confidentiality may be desired (at the
> expense of job performance and resource utilization for doing encryption).