[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044144#comment-14044144 ]
Chris Douglas commented on MAPREDUCE-5890:
------------------------------------------

bq. I am trying to trade off that complexity in software with an admin prerequisite to install one or a few disks/partitions that selected users can choose to use via their job configuration.

This would also work, but (Alejandro/Arun, correct me if this is mistaken) encrypted intermediate data is probably motivated by compliance regimes that require it. An audit would need to verify that every job used the encrypted local dirs, that those mounts were configured to encrypt when the job ran, etc. One would also need to do capacity planning for encrypted vs. unencrypted space across nodes, possibly even federating jobs. It's workable, but kind of ad hoc. In contrast, verifying that the MR job set this switch is straightforward and has no ops overhead. I have no idea whether it's common to combine these workloads, but this would make it easier.

It's not so inconsistent to add this to MapReduce... frameworks are currently responsible for intra-application security, particularly RPC. If there's a general mechanism, then this should use it. If that layer were developed, we'd want MapReduce to use it instead of its own custom encryption. Today, the alternative is to develop that general-purpose layer.

To reduce the overhead, this could use the plugin mechanism in MAPREDUCE-2454, since it no longer requires any changes to the {{ShuffleHandler}} or index formats. I haven't looked at the latest patch, but if the {{IFile}} format omits the 16-byte IV for each spill, then the only overhead it adds is for the checks in the config (most of which can be pulled into the buffer init and cached). Has this been tested in a cluster? Would the perf hit be simple to measure?
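To illustrate why auditing a per-job switch is simpler than auditing mounts, here is a minimal sketch that scans a snapshot of job configurations and reports every job that did not turn encryption on. The class `EncryptionAudit`, its method, and the property name `mapreduce.job.encrypted-intermediate-data` are assumptions for illustration only; they are not taken from the attached patches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical audit helper: given each job's configuration
 * (job ID -> key/value settings), list the jobs that did NOT
 * enable intermediate-data encryption.
 */
final class EncryptionAudit {
    // Assumed name of the per-job encryption switch (illustrative only).
    static final String ENCRYPT_KEY = "mapreduce.job.encrypted-intermediate-data";

    private EncryptionAudit() {}

    /** Returns IDs of jobs whose config lacks the switch or sets it to anything other than "true". */
    static List<String> noncompliantJobs(Map<String, Map<String, String>> jobConfs) {
        List<String> violations = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> job : jobConfs.entrySet()) {
            // A missing key counts as "encryption off".
            String value = job.getValue().getOrDefault(ENCRYPT_KEY, "false");
            // Boolean.parseBoolean accepts "true" case-insensitively; anything else is false.
            if (!Boolean.parseBoolean(value.trim())) {
                violations.add(job.getKey());
            }
        }
        return violations;
    }
}
```

The check needs only the job's own configuration, so there is no per-node state (mount options, disk layout) to reconcile at audit time.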
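The overhead argument can be sketched with the JDK's `javax.crypto` alone (not Hadoop's own crypto codecs). The encryption switch is read once when the writer is constructed, so the hot path only tests a cached boolean. When enabled, each spill gets a fresh random 16-byte IV header followed by AES/CTR ciphertext. The class `SpillCrypto`, its constructor, and the way the key is supplied are hypothetical; this is not the code in the attached patches.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

/** Hypothetical spill wrapper: optional AES/CTR encryption with a 16-byte IV header per spill. */
final class SpillCrypto {
    static final int IV_LEN = 16; // AES block size; one IV is stored per spill

    private final boolean encrypt;  // config switch, read once at init and cached
    private final SecretKey key;    // per-job key (distribution is out of scope here)
    private final SecureRandom rng = new SecureRandom();

    SpillCrypto(boolean encryptSwitch, SecretKey key) {
        this.encrypt = encryptSwitch;
        this.key = key;
    }

    /** Wraps a spill stream. When enabled, writes a fresh IV, then returns a ciphertext stream. */
    OutputStream wrap(OutputStream raw) throws IOException, GeneralSecurityException {
        if (!encrypt) {
            return raw; // disabled: no header, and the only cost is the cached check above
        }
        byte[] iv = new byte[IV_LEN];
        rng.nextBytes(iv); // CTR must never reuse an IV under the same key
        raw.write(iv);
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        return new CipherOutputStream(raw, c); // close() flushes the cipher and closes raw
    }

    /** Decrypts a complete spill produced by wrap(), reading the IV header first. */
    byte[] unwrap(byte[] spill) throws GeneralSecurityException {
        if (!encrypt) {
            return spill;
        }
        byte[] iv = Arrays.copyOfRange(spill, 0, IV_LEN);
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
        return c.doFinal(spill, IV_LEN, spill.length - IV_LEN);
    }
}
```

CTR mode suits a streamed spill because it needs no padding: the ciphertext is exactly as long as the plaintext, so the per-spill storage overhead is just the 16-byte IV.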
> Support for encrypting Intermediate data and spills in local filesystem
> -----------------------------------------------------------------------
>
> Key: MAPREDUCE-5890
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Components: security
> Affects Versions: 2.4.0
> Reporter: Alejandro Abdelnur
> Assignee: Arun Suresh
> Labels: encryption
> Attachments: MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, MAPREDUCE-5890.7.patch, MAPREDUCE-5890.8.patch, MAPREDUCE-5890.9.patch, org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, syslog.tar.gz
>
>
> For some sensitive data, encryption while in flight (network) is not sufficient; the data must also be encrypted while at rest.
> HADOOP-10150 & HDFS-6134 bring encryption at rest for data in the filesystem using the Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest.

--
This message was sent by Atlassian JIRA
(v6.2#6252)