[ 
https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370631#comment-14370631
 ] 

liyunzhang_intel edited comment on SPARK-5682 at 3/23/15 1:31 AM:
------------------------------------------------------------------

Hi all:
  There are two ways to use encrypted stream classes like 
CryptoInputStream.java (provided in hadoop 2.6) without depending on hadoop 
2.6 directly:
* Isolate code like CryptoInputStream/CryptoOutputStream from the hadoop 
source into a separate lib, publish it to the maven repository, and let other 
projects depend on it.
* Write CryptoInputStream/CryptoOutputStream and so on in the spark code base.
 
 Both methods have advantages and disadvantages:
*  Method 1: 
    Disadvantage: It needs the hadoop project or the spark community to review 
the code in the separate lib. Only after all the code has been reviewed and 
the separate lib has been published to the maven repository can we introduce 
it into the spark code, which may take a long time.
    Advantage: With the recognition of the hadoop or spark community, we can 
ensure the quality of the code. If fixes to the crypto classes are made, 
someone updates the separate lib and we simply bump the maven dependency in 
spark.
*  Method 2:
    Disadvantage: We need to keep an eye on fixes to the crypto classes made 
in later hadoop releases. If something changes, we need to update the code in 
scala.
  Advantage: No dependence on another lib. It's convenient for us to make 
changes if they are really needed in spark.

For method 1, my teammate is working on it. For method 2, the code in the pull 
request is finished and waiting for review. Can anyone give me some advice?
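To make the idea behind both methods concrete, here is a minimal sketch of what encrypted shuffle streams do: wrap the shuffle output/input streams in a cipher stream so bytes are encrypted on write and decrypted on read. This sketch uses only javax.crypto (CipherOutputStream/CipherInputStream with AES/CTR, the same cipher mode hadoop's encrypted shuffle uses), not hadoop's actual CryptoInputStream/CryptoOutputStream classes; the class name ShuffleCryptoSketch and the fixed key/IV are hypothetical, for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class ShuffleCryptoSketch {
    // Hypothetical fixed key/IV for illustration only; a real encrypted
    // shuffle would derive the key from the job's ugi credentials.
    private static final SecretKeySpec KEY;
    static {
        byte[] k = new byte[16];
        java.util.Arrays.fill(k, (byte) 0x2a);
        KEY = new SecretKeySpec(k, "AES");
    }
    private static final IvParameterSpec IV = new IvParameterSpec(new byte[16]);

    // Encrypt by writing plaintext through a CipherOutputStream,
    // as a shuffle writer would wrap its output stream.
    public static byte[] encrypt(byte[] plain) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, KEY, IV);
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            try (CipherOutputStream out = new CipherOutputStream(sink, cipher)) {
                out.write(plain);
            }
            return sink.toByteArray();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Decrypt by reading ciphertext through a CipherInputStream,
    // as a shuffle reader would wrap its input stream.
    public static byte[] decrypt(byte[] enc) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, KEY, IV);
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            try (CipherInputStream in =
                     new CipherInputStream(new ByteArrayInputStream(enc), cipher)) {
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    sink.write(buf, 0, n);
                }
            }
            return sink.toByteArray();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Either method above ends up providing stream classes that play this role; the difference is only where the code lives and who maintains it.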




was (Author: kellyzly):



> Add encrypted shuffle in spark
> ------------------------------
>
>                 Key: SPARK-5682
>                 URL: https://issues.apache.org/jira/browse/SPARK-5682
>             Project: Spark
>          Issue Type: New Feature
>          Components: Shuffle
>            Reporter: liyunzhang_intel
>         Attachments: Design Document of Encrypted Spark 
> Shuffle_20150209.docx, Design Document of Encrypted Spark 
> Shuffle_20150318.docx
>
>
> Encrypted shuffle is enabled in hadoop 2.6, which makes the shuffle data 
> transfer safer. This feature is necessary in spark. We reuse the hadoop 
> encrypted shuffle feature in spark, and because ugi credential info is 
> necessary for encrypted shuffle, we first enable encrypted shuffle on the 
> spark-on-yarn framework.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
