[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044144#comment-14044144
 ] 

Chris Douglas commented on MAPREDUCE-5890:
------------------------------------------

bq. I am trying to trade off that complexity in software with an admin 
prerequisite to install one or a few disks/partitions that selective users can 
choose to use via their job-configuration.

This would also work, but (Alejandro/Arun, correct me if this is mistaken) 
encrypted intermediate data is probably motivated by compliance regimes that 
require it. An audit would need to verify that every job used the encrypted 
local dirs, that those mounts were configured to encrypt when the job ran, etc. 
One would also need to do capacity planning for encrypted vs. unencrypted space 
across nodes, possibly even federating jobs. It's workable, but kind of ad hoc. 
In contrast, verifying that the MR job set this switch is straightforward and 
has no ops overhead. I have no idea whether it's common to run encrypted and 
unencrypted workloads on the same cluster, but a per-job switch would make it 
easier.
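
To illustrate why checking the per-job switch is simple to audit, here is a 
minimal sketch. Everything in it is an assumption for illustration: the 
{{EncryptionAudit}} class is hypothetical, the property name 
{{mapreduce.job.encrypted-intermediate-data}} is assumed rather than taken from 
the patch, and job configurations are modeled as plain maps instead of Hadoop 
{{Configuration}} objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical audit helper; not part of Hadoop or of the MAPREDUCE-5890 patch.
final class EncryptionAudit {
    // Assumed name of the per-job switch; check the patch for the real key.
    static final String ENCRYPT_KEY = "mapreduce.job.encrypted-intermediate-data";

    /** Returns the IDs of jobs whose recorded configuration did not enable the switch. */
    static List<String> nonCompliantJobs(Map<String, Map<String, String>> jobConfs) {
        List<String> failures = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> job : jobConfs.entrySet()) {
            String value = job.getValue().get(ENCRYPT_KEY);
            // A missing key and an explicit "false" both count as unencrypted.
            if (!Boolean.parseBoolean(value)) {
                failures.add(job.getKey());
            }
        }
        return failures;
    }
}
```

The audit only needs each job's archived configuration; it does not need to 
know which mounts were encrypted on which node at the time the job ran.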

Adding this to MapReduce isn't so inconsistent: frameworks are currently 
responsible for intra-application security, particularly RPC. If a 
general-purpose mechanism existed, we'd want MapReduce to use it instead of its 
own custom encryption. Today, the alternative is to develop that 
general-purpose layer.

To reduce the overhead, this could use the plugin mechanism in MAPREDUCE-2454, 
since the change no longer requires any modifications to the {{ShuffleHandler}} 
or index formats. I haven't looked at the latest patch, but if the {{IFile}} 
format omits the 16-byte IV for each spill, then the only overhead it adds is 
the checks in the config (most of which can be pulled into the buffer init and 
cached).
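
A rough sketch of that idea, reading the paragraph above as being about jobs 
that leave encryption off: the config is consulted once at init, the result is 
cached, and the 16-byte IV is only written when encryption is enabled. The 
{{SpillWriter}} class, the property name, and the choice of AES/CTR are all 
assumptions made for illustration; none of them is taken from the patch.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch; class, method, and property names are illustrative only.
final class SpillWriter {
    // Assumed name of the per-job switch; check the patch for the real key.
    static final String ENCRYPT_KEY = "mapreduce.job.encrypted-intermediate-data";
    static final int IV_LENGTH = 16; // AES block size, i.e. the 16-byte IV per spill

    private final boolean encrypt;   // config checked once, at init, then cached
    private final SecretKeySpec key;
    private final SecureRandom random = new SecureRandom();

    SpillWriter(Map<String, String> conf, byte[] rawKey) {
        this.encrypt = Boolean.parseBoolean(conf.get(ENCRYPT_KEY));
        this.key = new SecretKeySpec(rawKey, "AES");
    }

    /** Wraps a spill stream; when encrypting, a fresh IV is written first, in the clear. */
    OutputStream wrap(OutputStream spill) throws IOException, GeneralSecurityException {
        if (!encrypt) {
            return spill; // no IV and no per-spill config lookup for unencrypted jobs
        }
        byte[] iv = new byte[IV_LENGTH];
        random.nextBytes(iv);
        spill.write(iv);
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        return new CipherOutputStream(spill, cipher);
    }
}
```

With this shape, an unencrypted job pays only for one boolean test per spill, 
while an encrypted spill grows by exactly the IV length. A reader would have to 
consume the same 16 bytes before decrypting the rest of the stream.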

Has this been tested in a cluster? Would the perf hit be simple to measure?

> Support for encrypting Intermediate data and spills in local filesystem
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5890
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: security
>    Affects Versions: 2.4.0
>            Reporter: Alejandro Abdelnur
>            Assignee: Arun Suresh
>              Labels: encryption
>         Attachments: MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, 
> MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, MAPREDUCE-5890.7.patch, 
> MAPREDUCE-5890.8.patch, MAPREDUCE-5890.9.patch, 
> org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, 
> syslog.tar.gz
>
>
> For some sensitive data, encryption while in flight (network) is not 
> sufficient, it is required that while at rest it should be encrypted. 
> HADOOP-10150 & HDFS-6134 bring encryption at rest for data in filesystem 
> using Hadoop FileSystem API. MapReduce intermediate data and spills should 
> also be encrypted while at rest.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
