[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun Suresh updated MAPREDUCE-5890:
-----------------------------------

    Attachment: MAPREDUCE-5890.5.patch


Hi [~chris.douglas]
Thank you for the feedback. I am updating the patch to address most of your nits:

bq. There are many counterexamples, but running a MR job is a heavy way to test 
this

Agreed. But an MR job ensures that all code paths are exercised. I will be adding 
test cases to existing classes (e.g. TestMerger) to validate that merging 
works correctly with CryptoShuffle turned on. However, I don't see many existing 
test cases that validate that mapOutput spill files are correctly partitioned 
and sent to the correct reducers.

bq. Has this been tested on spills with intermediate merges? With more than a 
single reduce? Looking at the patch, it looks like it creates the stream with 
the IV, it doesn't reset the IV for each segment (apologies, I haven't tried 
applying it, so I might just be misreading the context).

Modifying the TestMerger class to use the CryptoShuffle will cover the former 
(spills with intermediate merges). The current test case included with the 
patch already tests with multiple reducers; I will refactor it a bit to test 
these scenarios explicitly.
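The concern about resetting the IV for each segment could be checked by a test along these lines. This is a hypothetical sketch, not the test in the patch: it uses plain JCE `AES/CTR/NoPadding` instead of Hadoop's crypto classes, and `PerSegmentIvSpill`, `Index`, `spill` and `fetch` are illustrative names. Each partition written into one spill buffer gets a fresh 16-byte IV in front of its ciphertext, so a reducer can decrypt its own partition from the recorded start offset without reading any other segment.

```java
import java.io.ByteArrayOutputStream;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.List;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical sketch: one spill buffer, one [IV][ciphertext] segment per reducer. */
public final class PerSegmentIvSpill {
    public static final int IV_LEN = 16;

    /** Hypothetical stand-in for an index entry: start offset and on-disk length. */
    public static final class Index {
        public final int start;
        public final int length;
        public Index(int start, int length) { this.start = start; this.length = length; }
    }

    private PerSegmentIvSpill() {}

    // Encrypts or decrypts a slice of data with AES/CTR under the given IV.
    private static byte[] crypt(int mode, byte[] key, byte[] iv, byte[] data, int off, int len) {
        try {
            Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
            c.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return c.doFinal(data, off, len);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Writes every partition into one buffer, adding one index entry per partition. */
    public static byte[] spill(byte[] key, List<byte[]> partitions, List<Index> index) {
        SecureRandom rng = new SecureRandom();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] p : partitions) {
            byte[] iv = new byte[IV_LEN];
            rng.nextBytes(iv); // fresh IV for every segment, not one per file
            byte[] ct = crypt(Cipher.ENCRYPT_MODE, key, iv, p, 0, p.length);
            index.add(new Index(out.size(), IV_LEN + ct.length));
            out.write(iv, 0, IV_LEN);     // IV stored in the clear
            out.write(ct, 0, ct.length);  // followed by this partition's ciphertext
        }
        return out.toByteArray();
    }

    /** Decrypts a single partition using only its own bytes from the spill. */
    public static byte[] fetch(byte[] key, byte[] spill, Index e) {
        byte[] iv = Arrays.copyOfRange(spill, e.start, e.start + IV_LEN);
        return crypt(Cipher.DECRYPT_MODE, key, iv, spill, e.start + IV_LEN, e.length - IV_LEN);
    }
}
```

If the IV were set only once per file, `fetch` on any partition other than the first would return garbage, which is the kind of failure such a multi-reducer test is meant to catch.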

bq. To make it backwards compatible, the IV can be part of each IFile segment 
(requiring no changes to ShuffleHandler or the SpillRecord/IndexRecord format), 
or the IVs can be added to the end of the SpillRecord. In the latter case, the 
Fetcher will need to request that the alternate interpretation by including a 
header; old versions will get the existing interpretation of the SpillRecord.

As per your suggestion, I was actually able to get the end-to-end flow working 
without having to touch {{ShuffleHandler}}, {{ShuffleHeader}} or 
{{IndexRecord}}. Instead, I add the IV as a prefix to an {{IFile}} before it is 
written, and read it back in {{Segment::init()}} when the segment is read from 
disk. The only nit is that I have to do some bookkeeping in {{MapTask}} and 
{{Fetcher}} to add/remove the 16 bytes.
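A minimal sketch of the IV-prefix layout described above, assuming plain JCE `AES/CTR/NoPadding` rather than Hadoop's `CryptoCodec`/`CryptoUtils`; the class `IvPrefixedSegment` and its methods are hypothetical. The IV is written in the clear ahead of the ciphertext and consumed first on read. `onDiskLength` illustrates the 16-byte bookkeeping (CTR mode adds no padding, so only the prefix changes the length), and the helper fails fast if the cipher's block size is not 16 bytes, along the lines of the check suggested in the review.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical sketch: a segment is [16-byte IV in the clear][AES/CTR ciphertext]. */
public final class IvPrefixedSegment {
    public static final int IV_LEN = 16; // fixed prefix length assumed by the layout
    private static final SecureRandom RNG = new SecureRandom();

    private IvPrefixedSegment() {}

    private static Cipher cipher(int mode, byte[] key, byte[] iv) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        // Fail fast if the block size ever disagrees with the fixed prefix length.
        if (c.getBlockSize() != IV_LEN) {
            throw new IllegalStateException("unexpected block size " + c.getBlockSize());
        }
        return c;
    }

    /** Bookkeeping: bytes on disk for a segment holding plainLength bytes of data. */
    public static long onDiskLength(long plainLength) {
        return plainLength + IV_LEN;
    }

    /** Writer side: emit a fresh IV, then the encrypted segment. */
    public static byte[] write(byte[] key, byte[] plain) {
        byte[] iv = new byte[IV_LEN];
        RNG.nextBytes(iv);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            out.write(iv, 0, IV_LEN); // prefix is not encrypted
            try (OutputStream enc = new CipherOutputStream(out, cipher(Cipher.ENCRYPT_MODE, key, iv))) {
                enc.write(plain);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
        return out.toByteArray();
    }

    /** Reader side: consume the 16-byte prefix, then decrypt the rest of the stream. */
    public static byte[] read(byte[] key, InputStream in) {
        try {
            byte[] iv = new byte[IV_LEN];
            new DataInputStream(in).readFully(iv);
            try (InputStream dec = new CipherInputStream(in, cipher(Cipher.DECRYPT_MODE, key, iv))) {
                return dec.readAllBytes();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the IV travels inside the segment bytes, the index and shuffle header formats stay as they are; only the lengths seen by the writer and reader shift by `IV_LEN`.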

bq. Since the IV size is hard-coded in CryptoUtils to 16 bytes (and part of the 
IndexRecord format), it should probably fail if the 
CryptoCodec::getAlgorithmBlockSize returns anything else.

Yup, this would have been an issue had I needed to modify 
{{IndexRecord}}/{{ShuffleHeader}}. Since we no longer do, it is not an issue 
anymore.




> Support for encrypting Intermediate data and spills in local filesystem
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5890
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: security
>    Affects Versions: 2.4.0
>            Reporter: Alejandro Abdelnur
>            Assignee: Arun Suresh
>              Labels: encryption
>         Attachments: MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, 
> MAPREDUCE-5890.5.patch, 
> org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, 
> syslog.tar.gz
>
>
> For some sensitive data, encryption while in flight (network) is not 
> sufficient; such data must also be encrypted while at rest. 
> HADOOP-10150 & HDFS-6134 bring encryption at rest for data in the filesystem 
> using the Hadoop FileSystem API. MapReduce intermediate data and spills should 
> also be encrypted while at rest.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
