On Thu, May 27, 2010 at 6:58 AM, Arv Mistry <[email protected]> wrote:
> Thanks for responding Ted. I did see that link before but there wasn't enough
> details there for me to make sense of it. I'm not sure who Owen is ;(
I'm Owen, although I think I've used at least 5 different email addresses on these lists at various times. *smile*

Since you specify 0.20, you'd probably want to put your keys into HDFS and read them from the tasks. Note that this is *not* secure: other users of your cluster can access your data in HDFS with only a tiny bit of misdirection. (This will be fixed in 0.22, where we are adding strong authentication based on Kerberos.)

The next step would be to define a compression codec that does the encryption. So let's say you define a XorEncryption that does a simple xor with a byte. (Obviously, you would use something better than xor; it is just an example!) XorEncryption would need to implement org.apache.hadoop.io.compress.CompressionCodec. You'd also need to add your new class to the list of codecs in the configuration variable io.compression.codecs. For details of how to configure your MapReduce job with compression (or in this case encryption), look at http://bit.ly/9PMHUA.

If XorEncryption returned ".xor" from getDefaultExtension(), then any file that ended in .xor would automatically be put through the decryption on read, so input is handled automatically. You still need to set some configuration variables to get it applied to the output of MapReduce.

-- Owen
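To make the codec idea concrete, here is a minimal, hypothetical sketch of the byte-level transform such a XorEncryption codec would wrap. It is not the real codec: a real implementation must implement org.apache.hadoop.io.compress.CompressionCodec and hand back streams like these from createOutputStream()/createInputStream(). The class and key here are made up for illustration, and xor is (again) not real encryption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch of the streams a XorEncryption codec would return.
public class XorStreams {
    static final byte KEY = 0x5A; // toy key; use a real cipher in practice

    // Encrypting side: xor each byte on the way out.
    static class XorOutputStream extends FilterOutputStream {
        XorOutputStream(OutputStream out) { super(out); }
        @Override public void write(int b) throws IOException {
            out.write((b ^ KEY) & 0xFF);
        }
    }

    // Decrypting side: xor each byte on the way in (xor is its own inverse).
    static class XorInputStream extends FilterInputStream {
        XorInputStream(InputStream in) { super(in); }
        @Override public int read() throws IOException {
            int b = in.read();
            return b < 0 ? b : (b ^ KEY) & 0xFF;
        }
        @Override public int read(byte[] buf, int off, int len) throws IOException {
            int n = in.read(buf, off, len);
            for (int i = 0; i < n; i++) buf[off + i] ^= KEY;
            return n;
        }
    }

    public static void main(String[] args) throws IOException {
        // Round-trip a small message through encrypt then decrypt.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        OutputStream enc = new XorOutputStream(sink);
        enc.write("hello".getBytes("UTF-8"));
        enc.close();

        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        InputStream dec = new XorInputStream(new ByteArrayInputStream(sink.toByteArray()));
        int b;
        while ((b = dec.read()) >= 0) plain.write(b);
        dec.close();
        System.out.println(plain.toString("UTF-8")); // prints "hello"
    }
}
```

The codec itself would mostly be plumbing around these two stream classes, plus getDefaultExtension() returning ".xor".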
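For the registration and output steps, the 0.20-era configuration would look roughly like this (the class name com.example.XorEncryption is a placeholder for wherever you put your codec):

```xml
<!-- Register the codec by appending it to the codec list. -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,com.example.XorEncryption</value>
</property>

<!-- "Compress" (encrypt) job output by selecting the codec for output. -->
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <value>com.example.XorEncryption</value>
</property>
```

With that in place, job output files get the ".xor" extension and are decrypted transparently when read back as input.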
