[
https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Arun Suresh updated MAPREDUCE-5890:
-----------------------------------
Attachment: MAPREDUCE-5890.5.patch
Hi [~chris.douglas]
Thank you for the feedback. Updating the patch to address most of your nits :
bq. There are many counterexamples, but running a MR job is a heavy way to test
this
Agreed. But a MR Job will ensure all code paths are handled. I will be adding
testcases to existing classs (for eg. TestMerger) to validate that the merging
works fine with Shuffle turned on. But I don't see too many tests cases to
validate that mapOutput spillfiles are correctly being partitioned and sent to
the correct reduces.
bq. Has this been tested on spills with intermediate merges? With more than a
single reduce? Looking at the patch, it looks like it creates the stream with
the IV, it doesn't reset the IV for each segment (apologies, I haven't tried
applying it, so I might just be misreading the context).
Modifying the TestMerger class to use the CryptoShuffle will ensure the former.
The current Test case Included with the patch tests with multiple reducers.. I
will refactor it a bit to explicitly test these scenarios
bq. To make it backwards compatible, the IV can be part of each IFile segment
(requiring no changes to ShuffleHandler or the SpillRecord/IndexRecord format),
or the IVs can be added to the end of the SpillRecord. In the latter case, the
Fetcher will need to request that the alternate interpretation by including a
header; old versions will get the existing interpretation of the SpillRecord.
As per your suggestion, I was actually able to get the end to end flow working
without having to touch {{ShuffleHandler}}, {{ShuffleHeader}} or
{{IndexRecord}}. Although, what I did was add the IV to the prefix of an
{{IFile}} before it is written.. and during {{Segment::init()}} when it is read
from disk. Only nit is I have to do some amount of book-keeping on the
{{MapTask}} and {{Fetcher}} to add/remove the 16 bytes.
bq. Since the IV size is hard-coded in CryptoUtils to 16 bytes (and part of the
IndexRecord format), it should probably fail if the
CryptoCodec::getAlgorithmBlockSize returns anything else.
Yup.. this would have been an issue had I had to modify the
{{IndexRecord}}/{{ShuffleHeader}}. But now we don't, so this is not an issue
anymore
> Support for encrypting Intermediate data and spills in local filesystem
> -----------------------------------------------------------------------
>
> Key: MAPREDUCE-5890
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Components: security
> Affects Versions: 2.4.0
> Reporter: Alejandro Abdelnur
> Assignee: Arun Suresh
> Labels: encryption
> Attachments: MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch,
> MAPREDUCE-5890.5.patch,
> org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt,
> syslog.tar.gz
>
>
> For some sensitive data, encryption while in flight (network) is not
> sufficient, it is required that while at rest it should be encrypted.
> HADOOP-10150 & HDFS-6134 bring encryption at rest for data in filesystem
> using Hadoop FileSystem API. MapReduce intermediate data and spills should
> also be encrypted while at rest.
--
This message was sent by Atlassian JIRA
(v6.2#6252)