On Wed, Jan 27, 2016 at 11:31 AM, Owen O'Malley <omal...@apache.org> wrote:
> I believe encryption is becoming a core part of Hadoop. I think that moving
> core components out of Hadoop is bad from a project management perspective.
>

Although it's certainly true that encryption capabilities (in HDFS, YARN,
etc.) are becoming core to Hadoop, I don't think that should really
influence whether or not the non-Hadoop-specific encryption routines should
be part of the Hadoop code base, or part of the code base of another project
that Hadoop depends on. If Chimera had existed as a library hosted at ASF
when HDFS encryption was first developed, HDFS probably would have just
added that as a dependency and been done with it. I don't think we would've
copy/pasted the code for Chimera into the Hadoop code base.

> To put it another way, a bug in the encryption routines will likely become
> a security problem that security@hadoop needs to hear about.
> I don't think
> adding a separate project in the middle of that communication chain is a
> good idea. The same applies to data corruption problems, and so on...
>

Isn't the same true of all the libraries that Hadoop currently depends upon?
If the commons-httpclient library (or commons-codec, or commons-io, or
guava, or...) has a security vulnerability, we need to know about it so that
we can update our dependency to a fixed version. This case doesn't seem
materially different than that.

>
> > It may be good to keep at generalized place(As in the
> > discussion, we thought that place could be Apache Commons).
>
>
> Apache Commons is a collection of *Java* projects, so Chimera as a
> JNI-based library isn't a natural fit.
>

Could very well be that Apache Commons's charter would preclude Chimera. You
probably know better than I do about that.

> Furthermore, Apache Commons doesn't
> have its own security list so problems will go to the generic
> secur...@apache.org.
>

That seems easy enough to remedy, if they wanted to, and besides I'm not
sure why that would influence this discussion. In my experience projects
that don't have a separate security@project.a.o mailing list tend to just
handle security issues on their private@project.a.o mailing list, which
seems fine to me.

>
> Why do you think that Apache Commons is a better home than Hadoop?
>

I'm certainly not at all wedded to Apache Commons, that just seemed like a
natural place to put it to me. Could be that a brand new TLP might make more
sense.

I *do* think that if other non-Hadoop projects want to make use of Chimera,
which as I understand it is the goal which started this thread, then Chimera
should exist outside of Hadoop so that:

a) Projects that have nothing to do with Hadoop can just depend directly on
Chimera, which has nothing Hadoop-specific in there.

b) The Hadoop project doesn't have to export/maintain/concern itself with
yet another publicly-consumed interface.

c) Chimera can have its own (presumably much faster) release cadence
completely separate from Hadoop.

--
Aaron T. Myers
Software Engineer, Cloudera