+1 to making the encryption codecs part of Apache Commons, so that other Apache projects like Spark can use them more conveniently.

On Thu, Jan 21, 2016 at 10:18 AM, Zheng, Kai <kai.zh...@intel.com> wrote:

> Just a question. Does becoming a separate jar/module in Apache Commons
> mean that Chimera (or the module) can be released separately and in a
> timely manner, without being coupled to the release of other modules in
> the project? Thanks.
>
> Regards,
> Kai
>
> -----Original Message-----
> From: Aaron T. Myers [mailto:a...@cloudera.com]
> Sent: Thursday, January 21, 2016 9:44 AM
> To: hdfs-dev@hadoop.apache.org
> Subject: Re: Hadoop encryption module as Apache Chimera incubator project
>
> +1 for Hadoop depending upon Chimera, assuming Chimera can get
> hosted/released under some Apache project umbrella. If that's Apache
> Commons (which makes a lot of sense to me) then I'm also a big +1 on
> Andrew's suggestion that we make it a separate module.
>
> Uma, would you be up for approaching the Apache Commons folks saying that
> you'd like to contribute Chimera? I'd recommend saying that Hadoop and
> Spark are both on board to depend on this.
>
> --
> Aaron T. Myers
> Software Engineer, Cloudera
>
> On Wed, Jan 20, 2016 at 4:31 PM, Andrew Wang <andrew.w...@cloudera.com>
> wrote:
>
> > Thanks Uma for putting together this proposal. Overall sounds good to
> > me, +1 for these improvements. A few comments/questions:
> >
> > * If it becomes part of Apache Commons, could we make Chimera a
> >   separate JAR? We have real difficulties bumping dependency versions
> >   right now, so ideally we don't need to bump our existing Commons
> >   dependencies to use Chimera.
> > * With this refactoring, do we have confidence that we can get our
> >   desired changes merged and released in a timely fashion? e.g. if we
> >   find another bug like HADOOP-11343, we'll first need to get the fix
> >   into Chimera, have a new Chimera release, then bump Hadoop's Chimera
> >   dependency. This also relates to the previous point: it's easier to
> >   do this dependency bump if Chimera is a separate JAR.
> >
> > Best,
> > Andrew
> >
> > On Mon, Jan 18, 2016 at 11:46 PM, Gangumalla, Uma <
> > uma.ganguma...@intel.com>
> > wrote:
> >
> > > Hi Devs,
> > >
> > > Some of our Hadoop developers are working with the Spark community
> > > to implement shuffle encryption. While implementing that, they
> > > realized that some/most of the Hadoop encryption code would have to
> > > be duplicated in their implementation. This led to the idea of
> > > creating a separate library, named Chimera
> > > (https://github.com/intel-hadoop/chimera). It is an optimized
> > > cryptographic library. It provides a Java API at both the cipher
> > > level and the Java stream level to help developers implement
> > > high-performance AES encryption/decryption with minimum code and
> > > effort. Chimera was originally based on the Hadoop crypto code but
> > > has been improved and generalized a lot to support a wider scope of
> > > data encryption needs for more components in the community.
> > >
> > > So, the team is now thinking of making this library an open source
> > > project via Apache Incubation. The proposal is for Chimera to join
> > > Apache as an incubating project or as part of Apache Commons, to
> > > facilitate its adoption.
> > >
> > > In general this will bring the following advantages:
> > > 1. As Chimera embeds the native library in the jar (similar to
> > > snappy-java), it solves the current issue in Hadoop that an HDFS
> > > client has to depend on libhadoop.so if the client needs to read an
> > > encryption zone in HDFS. This means an HDFS client may have to
> > > depend on a Hadoop installation on the local machine. For example,
> > > HBase depends on the HDFS client jar rather than on a Hadoop
> > > installation and so has no access to libhadoop.so. So HBase cannot
> > > use an encryption zone, or it causes an error.
> > > 2. Apache Spark shuffle and spill encryption could be another
> > > example where we can use Chimera.
> > > We see that the stream encryption for Spark shuffle and spill
> > > doesn’t require a stream cipher like AES/CTR, although the code
> > > shares the common characteristics of a stream-style API. We also see
> > > the need for an optimized cipher for non-stream-style use cases such
> > > as network encryption (e.g. RPC). These improvements can be shared
> > > by more projects that need them.
> > >
> > > 3. Simplified code in Hadoop by using a dedicated library, which
> > > also drives more improvements. For example, the Hadoop crypto API is
> > > currently based entirely on AES/CTR, although it has cipher suite
> > > configurations.
> > >
> > > AES/CTR is used for HDFS data encryption at rest, but it doesn’t
> > > necessarily have to be AES/CTR for all cases, such as data transfer
> > > encryption and intermediate file encryption.
> > >
> > > So, we wanted to check with the Hadoop community about this
> > > proposal. Please provide your feedback on it.
> > >
> > > Regards,
> > > Uma
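
For readers less familiar with the "Java stream level" AES/CTR encryption the proposal refers to, here is a minimal sketch of the general idea using only the standard JCE classes in javax.crypto. It is not Chimera's API and not the Hadoop crypto code; the class name AesCtrStreamSketch and its helper methods are hypothetical, chosen purely for illustration, and the in-memory streams stand in for whatever shuffle or file streams a real caller would wrap.

```java
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;

// Hypothetical illustration class; plain JCE, not Chimera or Hadoop code.
public class AesCtrStreamSketch {

    // Cipher level: build an AES/CTR cipher from a raw key and a 16-byte IV.
    static Cipher newCipher(int mode, byte[] key, byte[] iv)
            throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return cipher;
    }

    // Stream level: everything written to the wrapping stream is encrypted.
    static byte[] encrypt(byte[] key, byte[] iv, byte[] plain)
            throws GeneralSecurityException, IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out =
                 new CipherOutputStream(sink, newCipher(Cipher.ENCRYPT_MODE, key, iv))) {
            out.write(plain);
        }
        return sink.toByteArray();
    }

    // Stream level: everything read from the wrapping stream is decrypted.
    static byte[] decrypt(byte[] key, byte[] iv, byte[] encrypted)
            throws GeneralSecurityException, IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (InputStream in = new CipherInputStream(
                 new ByteArrayInputStream(encrypted),
                 newCipher(Cipher.DECRYPT_MODE, key, iv))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                result.write(buffer, 0, n);
            }
        }
        return result.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        SecureRandom random = new SecureRandom();
        byte[] key = new byte[16]; // 128-bit AES key
        byte[] iv = new byte[16];  // initial counter block; never reuse with the same key
        random.nextBytes(key);
        random.nextBytes(iv);

        byte[] plain = "example shuffle block".getBytes(StandardCharsets.UTF_8);
        byte[] encrypted = encrypt(key, iv, plain);
        byte[] decrypted = decrypt(key, iv, encrypted);
        System.out.println("round trip ok: " + Arrays.equals(plain, decrypted));
    }
}
```

Because the transformation string is the only thing that selects AES/CTR here, a caller that does not need a stream cipher (for example intermediate file encryption, as the proposal notes) could in principle pass a different transformation, with the padding and IV handling adjusted accordingly.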