[ 
https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611547#comment-14611547
 ] 

liyunzhang_intel commented on SPARK-5682:
-----------------------------------------

[~hujiayin]: thanks for your comment.

This feature is no longer based on Hadoop 2.6, although the original design 
was. The latest design doc (20150506) describes two ways to implement 
encrypted shuffle in Spark. The first is based on [Chimera (a project which 
strips the code related to CryptoInputStream/CryptoOutputStream from Hadoop to 
facilitate AES-NI based data encryption in other 
projects)|https://github.com/intel-hadoop/chimera] (see 
https://github.com/apache/spark/pull/5307). In the second, we implement all 
the crypto classes, such as CryptoInputStream/CryptoOutputStream, in Scala 
under the core/src/main/scala/org/apache/spark/crypto/ package (see 
https://github.com/apache/spark/pull/4491). Currently we only implement 
encrypted shuffle on the spark-on-yarn framework.
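
To illustrate what a CryptoInputStream/CryptoOutputStream pair does, here is a minimal sketch that wraps ordinary streams with AES in CTR mode using only the JDK's JCE classes. It is not the code from either pull request and does not use the Hadoop or Chimera API. The class name AesCtrStreamSketch and all of its methods are hypothetical, and the key and IV are generated locally instead of coming from the job credentials.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper for illustration; not part of Spark, Hadoop, or Chimera.
final class AesCtrStreamSketch {
    private static final String TRANSFORMATION = "AES/CTR/NoPadding";

    // Build a JCE cipher for AES-CTR. The key is 16, 24, or 32 bytes; the IV is 16 bytes.
    static Cipher newCipher(int mode, byte[] key, byte[] iv) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return cipher;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("cannot initialise " + TRANSFORMATION, e);
        }
    }

    // Wrap a stream so that every byte written to it is encrypted first.
    static OutputStream encrypting(OutputStream out, byte[] key, byte[] iv) {
        return new CipherOutputStream(out, newCipher(Cipher.ENCRYPT_MODE, key, iv));
    }

    // Wrap a stream so that every byte read from it is decrypted first.
    static InputStream decrypting(InputStream in, byte[] key, byte[] iv) {
        return new CipherInputStream(in, newCipher(Cipher.DECRYPT_MODE, key, iv));
    }

    static byte[] encrypt(byte[] plain, byte[] key, byte[] iv) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = encrypting(sink, key, iv)) {
            out.write(plain);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return sink.toByteArray();
    }

    static byte[] decrypt(byte[] encrypted, byte[] key, byte[] iv) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream in = decrypting(new ByteArrayInputStream(encrypted), key, iv)) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                sink.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // Assumption: a random key and IV stand in for values taken from job credentials.
        SecureRandom random = new SecureRandom();
        byte[] key = new byte[16];
        byte[] iv = new byte[16];
        random.nextBytes(key);
        random.nextBytes(iv);

        byte[] block = "shuffle block".getBytes(StandardCharsets.UTF_8);
        byte[] encrypted = encrypt(block, key, iv);
        byte[] decrypted = decrypt(encrypted, key, iv);
        System.out.println("round trip ok: " + Arrays.equals(block, decrypted));
    }
}
```

Because CTR is a stream mode used here without padding, the encrypted block has the same length as the plain block, so the size of shuffle data would not change under this scheme.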

As for importing Hadoop APIs into Spark: if the interface of a Hadoop 
class is public and stable, it can be used in Spark. 
https://hadoop.apache.org/docs/current/api/org/apache/hadoop/classification/InterfaceStability.html 
says:
{quote}
Incompatible changes must not be made to classes marked as stable.
{quote}
which means that once a class is marked stable, later releases will not make 
incompatible changes to it.





> Add encrypted shuffle in spark
> ------------------------------
>
>                 Key: SPARK-5682
>                 URL: https://issues.apache.org/jira/browse/SPARK-5682
>             Project: Spark
>          Issue Type: New Feature
>          Components: Shuffle
>            Reporter: liyunzhang_intel
>         Attachments: Design Document of Encrypted Spark 
> Shuffle_20150209.docx, Design Document of Encrypted Spark 
> Shuffle_20150318.docx, Design Document of Encrypted Spark 
> Shuffle_20150402.docx, Design Document of Encrypted Spark 
> Shuffle_20150506.docx
>
>
> Encrypted shuffle is enabled in Hadoop 2.6, which makes the process of 
> shuffling data safer. This feature is also necessary in Spark. AES is a 
> specification for the encryption of electronic data. There are 5 common 
> modes in AES; CTR is one of them. We use two codecs, JceAesCtrCryptoCodec and 
> OpensslAesCtrCryptoCodec, to enable Spark encrypted shuffle; both are also 
> used in Hadoop encrypted shuffle. JceAesCtrCryptoCodec uses the encryption 
> algorithms the JDK provides, while OpensslAesCtrCryptoCodec uses the 
> encryption algorithms OpenSSL provides. 
> Because UGI credential info is used in the process of encrypted shuffle, we 
> first enable encrypted shuffle on the spark-on-yarn framework.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

