[ https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
liyunzhang_intel updated SPARK-5682:
------------------------------------
    Attachment: Design Document of Encrypted Spark Shuffle_20150318.docx

[~srowen], I have submitted a new design doc (Design Document of Encrypted Spark Shuffle_20150318) and pushed the newest code to the pull request. This submission contains the following major changes:
* Deleted the hadoop2.6 profile. We no longer depend on Hadoop 2.6 because I added crypto classes such as CryptoInputStream.scala and CryptoOutputStream.scala to the core module, in the org.apache.spark.crypto package.
* AES is a specification for the encryption of electronic data. It has five common modes of operation, and CTR is one of them. We use two codecs, JceAesCtrCryptoCodec and OpensslAesCtrCryptoCodec, to enable Spark encrypted shuffle; the same codecs are used in Hadoop encrypted shuffle. JceAesCtrCryptoCodec uses the encryption algorithms provided by the JDK, while OpensslAesCtrCryptoCodec uses the encryption algorithms provided by OpenSSL. The current code implements only JceAesCtrCryptoCodec; OpensslAesCtrCryptoCodec will be implemented later.

How to test?
* Download the code from https://github.com/kellyzly/spark
* Build: mvn package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -DskipTests
* To enable encrypted shuffle, add the following to conf/spark-defaults.conf:
spark.encrypted.shuffle true
spark.job.encrypted-intermediate-data true
spark.security.crypto.cipher.suite AES/CTR/NoPadding
spark.security.crypto.codec.classes.aes.ctr.nopadding org.apache.spark.crypto.JceAesCtrCryptoCodec
* Start the master and workers: sbin/start-all.sh
* Edit the SparkPi source code so that it runs a wordcount, then run the wordcount:
** ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 1g --executor-memory 1g --executor-cores 1 examples/target/my.spark-examples_2.10-1.3.0-SNAPSHOT.jar
** ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --num-executors 3 --driver-memory 1g --executor-memory 1g --executor-cores 1 examples/target/my.spark-examples_2.10-1.3.0-SNAPSHOT.jar

> Add encrypted shuffle in spark
> ------------------------------
>
>                 Key: SPARK-5682
>                 URL: https://issues.apache.org/jira/browse/SPARK-5682
>             Project: Spark
>          Issue Type: New Feature
>          Components: Shuffle
>            Reporter: liyunzhang_intel
>         Attachments: Design Document of Encrypted Spark
> Shuffle_20150209.docx, Design Document of Encrypted Spark
> Shuffle_20150318.docx
>
>
> Encrypted shuffle is enabled in hadoop 2.6 which make the process of shuffle
> data safer. This feature is necessary in spark. We reuse hadoop encrypted
> shuffle feature to spark and because ugi credential info is necessary in
> encrypted shuffle, we first enable encrypted shuffle on spark-on-yarn
> framework.
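
For readers unfamiliar with the AES/CTR/NoPadding cipher suite named in the configuration above, here is a minimal sketch of what a JCE-based AES-CTR round trip looks like using only the JDK's javax.crypto API. It is not the code from the pull request: the class name AesCtrSketch and its methods are hypothetical, and the key and IV handling is simplified for illustration (a real codec would obtain the key from job credentials and manage the counter per stream).

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical illustration class; not part of Spark or of the pull request.
public class AesCtrSketch {
    // Same transformation string as spark.security.crypto.cipher.suite above.
    static final String SUITE = "AES/CTR/NoPadding";

    // Runs the JDK-provided AES-CTR cipher in the given mode over the data.
    private static byte[] apply(int mode, byte[] key, byte[] iv, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance(SUITE);
            cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES-CTR operation failed", e);
        }
    }

    public static byte[] encrypt(byte[] key, byte[] iv, byte[] plain) {
        return apply(Cipher.ENCRYPT_MODE, key, iv, plain);
    }

    public static byte[] decrypt(byte[] key, byte[] iv, byte[] encrypted) {
        return apply(Cipher.DECRYPT_MODE, key, iv, encrypted);
    }

    public static void main(String[] args) {
        SecureRandom random = new SecureRandom();
        byte[] key = new byte[16]; // 128-bit AES key (assumed size for this sketch)
        byte[] iv = new byte[16];  // initial counter block, one AES block long
        random.nextBytes(key);
        random.nextBytes(iv);

        byte[] plain = "shuffle block bytes".getBytes(StandardCharsets.UTF_8);
        byte[] encrypted = encrypt(key, iv, plain);
        byte[] decrypted = decrypt(key, iv, encrypted);

        // CTR is a stream mode, so with NoPadding the output keeps the input length.
        System.out.println(encrypted.length == plain.length);
        System.out.println(Arrays.equals(plain, decrypted));
    }
}
```

Because CTR mode needs no padding, encrypted data has the same length as the plaintext, which is convenient for stream wrappers such as the CryptoInputStream and CryptoOutputStream classes mentioned above.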