Re: Hadoop encryption module as Apache Chimera incubator project

hitliuyi Wed, 20 Jan 2016 18:38:16 -0800

+1 to making the encryption codecs part of Apache Commons, so that other
Apache projects like Spark can use them more conveniently.


On Thu, Jan 21, 2016 at 10:18 AM, Zheng, Kai <kai.zh...@intel.com> wrote:

> Just a question: does becoming a separate jar/module in Apache Commons mean
> that Chimera (or the module) can be released separately and in a timely
> manner, without being coupled to the release of other modules in the
> project? Thanks.
>
> Regards,
> Kai
>
> -----Original Message-----
> From: Aaron T. Myers [mailto:a...@cloudera.com]
> Sent: Thursday, January 21, 2016 9:44 AM
> To: hdfs-dev@hadoop.apache.org
> Subject: Re: Hadoop encryption module as Apache Chimera incubator project
>
> +1 for Hadoop depending upon Chimera, assuming Chimera can get
> hosted/released under some Apache project umbrella. If that's Apache
> Commons (which makes a lot of sense to me) then I'm also a big +1 on
> Andrew's suggestion that we make it a separate module.
>
> Uma, would you be up for approaching the Apache Commons folks saying that
> you'd like to contribute Chimera? I'd recommend saying that Hadoop and
> Spark are both on board to depend on this.
>
> --
> Aaron T. Myers
> Software Engineer, Cloudera
>
> On Wed, Jan 20, 2016 at 4:31 PM, Andrew Wang <andrew.w...@cloudera.com>
> wrote:
>
> > Thanks Uma for putting together this proposal. Overall sounds good to me,
> > +1 for these improvements. A few comments/questions:
> >
> > * If it becomes part of Apache Commons, could we make Chimera a
> > separate JAR? We have real difficulties bumping dependency versions
> > right now, so ideally we don't need to bump our existing Commons
> > dependencies to use Chimera.
> > * With this refactoring, do we have confidence that we can get our
> > desired changes merged and released in a timely fashion? e.g. if we
> > find another bug like HADOOP-11343, we'll first need to get the fix
> > into Chimera, have a new Chimera release, then bump Hadoop's Chimera
> > dependency. This also relates to the previous point: it's easier to do
> > this dependency bump if Chimera is a separate JAR.
> >
> > Best,
> > Andrew
> >
> > On Mon, Jan 18, 2016 at 11:46 PM, Gangumalla, Uma
> > <uma.ganguma...@intel.com> wrote:
> >
> > > Hi Devs,
> > >
> > >   Some of our Hadoop developers are working with the Spark community to
> > > implement shuffle encryption. While implementing that, they realized
> > > that some/most of the Hadoop encryption code and their implementation
> > > code would have to be duplicated. This led to the idea of creating a
> > > separate library, named Chimera
> > > (https://github.com/intel-hadoop/chimera). It is an optimized
> > > cryptographic library. It provides Java APIs at both the cipher level
> > > and the Java stream level to help developers implement high-performance
> > > AES encryption/decryption with minimum code and effort. Chimera was
> > > originally based on the Hadoop crypto code, but it has been improved and
> > > generalized a lot to support a wider scope of data encryption needs for
> > > more components in the community.
> > >
> > > So, the team is now thinking of making this library an open source
> > > project via Apache Incubation. The proposal is for Chimera to join
> > > Apache as an incubating project, or to join Apache Commons, to
> > > facilitate its adoption.
> > >
> > > In general this will bring the following advantages:
> > > 1. As Chimera embeds the native library in the jar (similar to Snappy
> > > java), it solves the current issue in Hadoop that an HDFS client has
> > > to depend on libhadoop.so if the client needs to read an encryption
> > > zone in HDFS. This means an HDFS client may have to depend on a Hadoop
> > > installation on the local machine. For example, HBase depends on the
> > > HDFS client jar rather than a Hadoop installation, and so has no
> > > access to libhadoop.so. So HBase cannot use an encryption zone, or it
> > > causes an error.
> > > 2. Apache Spark shuffle and spill encryption could be another
> > > example where we can use Chimera. We see that stream encryption for
> > > Spark shuffle and spill doesn’t require a stream cipher like AES/CTR,
> > > although the code shares the common characteristics of a stream-style
> > > API. We also see the need for an optimized cipher for non-stream-style
> > > use cases such as network encryption (e.g. RPC). These improvements
> > > can be shared by more projects that need them.
> > >
> > > 3. Simplified code in Hadoop by using a dedicated library, which also
> > > drives more improvements. For example, currently the Hadoop crypto
> > > code API is based entirely on AES/CTR, although it has cipher suite
> > > configurations.
> > >
> > > AES/CTR is for HDFS data encryption at rest, but it doesn’t
> > > necessarily have to be AES/CTR for all cases, such as data transfer
> > > encryption and intermediate file encryption.
> > >
> > >
> > >
> > >  So, we wanted to check with the Hadoop community about this proposal.
> > > Please provide your feedback on it.
> > >
> > > Regards,
> > > Uma
> > >
> >
>
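
For reference, the kind of stream-level AES/CTR usage the proposal describes
can be sketched with the JDK's standard javax.crypto classes alone. This is
only an illustration under assumptions: it is not Chimera's API (which this
thread does not show), the class and method names (AesCtrStreamSketch,
encrypt, decrypt) are made up, and the key and IV are generated on the spot
rather than coming from a key provider.

```java
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;

// Hypothetical sketch: plain JCE, not the Chimera or Hadoop crypto API.
public class AesCtrStreamSketch {

    // Build an AES/CTR cipher from a raw 16-byte key and a 16-byte IV.
    static Cipher newCipher(int mode, byte[] key, byte[] iv)
            throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return cipher;
    }

    // Stream-style encryption: bytes written to the wrapper come out encrypted.
    static byte[] encrypt(byte[] plain, byte[] key, byte[] iv)
            throws GeneralSecurityException, IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out =
                 new CipherOutputStream(sink, newCipher(Cipher.ENCRYPT_MODE, key, iv))) {
            out.write(plain);
        }
        return sink.toByteArray();
    }

    // Stream-style decryption: bytes read from the wrapper come out decrypted.
    static byte[] decrypt(byte[] encrypted, byte[] key, byte[] iv)
            throws GeneralSecurityException, IOException {
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (InputStream in = new CipherInputStream(
                 new ByteArrayInputStream(encrypted),
                 newCipher(Cipher.DECRYPT_MODE, key, iv))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                plain.write(buf, 0, n);
            }
        }
        return plain.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        SecureRandom random = new SecureRandom();
        byte[] key = new byte[16]; // AES-128 key, random for the demo only
        byte[] iv = new byte[16];  // initial counter block, random for the demo only
        random.nextBytes(key);
        random.nextBytes(iv);

        byte[] message = "shuffle block".getBytes(StandardCharsets.UTF_8);
        byte[] encrypted = encrypt(message, key, iv);
        byte[] decrypted = decrypt(encrypted, key, iv);

        System.out.println("same length: " + (encrypted.length == message.length));
        System.out.println("round trip ok: " + Arrays.equals(message, decrypted));
    }
}
```

With CTR the ciphertext is the same length as the plaintext, and a given
key/IV pair must never be reused for different data.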
