[Thanks, I appreciate your doing that; the announcement itself was cross-posted as outreach]
Thanks Cos. As I see the work currently, I believe most, if not all, of these
will be work against JIRAs in individual projects, similar to the JIRAs posted
here: https://github.com/intel-hadoop/project-rhino. If we get to a point
where some of the future work needs a home outside of the individual projects,
I'm happy to incubate that work in Apache.

~avik

On Mon, Feb 25, 2013 at 4:18 PM, Konstantin Boudnik <c...@apache.org> wrote:
> [yanking away most of the cross-posts...]
>
> An interesting cross-component project, Avik. Any plans to incubate it in
> Apache?
>
> Cos
>
> On Mon, Feb 25, 2013 at 11:46PM, Dey, Avik wrote:
> > Project Rhino
> >
> > As the Apache Hadoop ecosystem extends into new markets and sees new use
> > cases with security and compliance challenges, the benefits of processing
> > sensitive and legally protected data with Hadoop must be coupled with
> > protection for private information that limits performance impact.
> > Project Rhino <https://github.com/intel-hadoop/project-rhino/> is our
> > open source effort to enhance the existing data protection capabilities
> > of the Hadoop ecosystem to address these challenges, and to contribute
> > the code back to Apache.
> >
> > The core of the Apache Hadoop ecosystem, as it is commonly understood, is:
> >
> > - Core: A set of shared libraries
> > - HDFS: The Hadoop filesystem
> > - MapReduce: Parallel computation framework
> > - ZooKeeper: Configuration management and coordination
> > - HBase: Column-oriented database on HDFS
> > - Hive: Data warehouse on HDFS with SQL-like access
> > - Pig: Higher-level programming language for Hadoop computations
> > - Oozie: Orchestration and workflow management
> > - Mahout: A library of machine learning and data mining algorithms
> > - Flume: Collection and import of log and event data
> > - Sqoop: Imports data from relational databases
> >
> > These components are all separate projects, so cross-cutting concerns
> > like authN, authZ, a consistent security policy framework, a consistent
> > authorization model, and audit coverage are loosely coordinated. Some
> > security features expected by our customers, such as encryption, are
> > simply missing. Our aim is to take a full-stack view and work with the
> > individual projects toward consistent concepts and capabilities, filling
> > gaps as we go.
> >
> > Our initial goals are:
> >
> > 1) Framework support for encryption and key management
> >
> > There is currently no framework support for encryption or key
> > management. We will add this support into Hadoop Core and integrate it
> > across the ecosystem.
> >
> > 2) A common authorization framework for the Hadoop ecosystem
> >
> > Each component currently has its own authorization engine. We will
> > abstract the common functions into a reusable authorization framework
> > with a consistent interface. Where appropriate we will either modify an
> > existing engine to work within this framework, or we will plug in a
> > common default engine. We must therefore also normalize how security
> > policy is expressed and applied by each component. Core, HDFS, ZooKeeper,
> > and HBase currently support simple access control lists (ACLs) composed
> > of users and groups.
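[The simple user-and-group ACL model described in goal 2 can be sketched as below. This is a minimal illustration under assumed semantics (access granted on a direct user match or on membership in any listed group); the class and method names are hypothetical and not from any Hadoop component.]

```python
class SimpleAcl:
    """A users-and-groups access control list, as a hypothetical sketch."""

    def __init__(self, users=None, groups=None):
        self.users = set(users or [])
        self.groups = set(groups or [])

    def permits(self, user, user_groups):
        # Grant access if the user is listed directly, or if the user
        # belongs to any group listed in the ACL.
        return user in self.users or bool(self.groups & set(user_groups))


read_acl = SimpleAcl(users=["alice"], groups=["analysts"])
print(read_acl.permits("alice", []))             # True: direct user match
print(read_acl.permits("bob", ["analysts"]))     # True: group match
print(read_acl.permits("mallory", ["interns"]))  # False: no match
```

[A common framework would evaluate checks like this behind one interface, with each component either adapting its existing engine or delegating to a shared default engine.]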
> > We see this as a good starting point. Where necessary we will modify
> > components so they each offer equivalent functionality, and build support
> > into others.
> >
> > 3) Token-based authentication and single sign-on
> >
> > Core, HDFS, ZooKeeper, and HBase currently support Kerberos
> > authentication at the RPC layer, via SASL. However, this does not provide
> > valuable attributes such as group membership, classification level,
> > organizational identity, or support for user-defined attributes. Hadoop
> > components must interrogate external resources to discover these
> > attributes, and at scale this is problematic. There is also no consistent
> > delegation model. HDFS has a simple delegation capability, and only Oozie
> > can take limited advantage of it. We will implement a common token-based
> > authentication framework to decouple internal user and service
> > authentication from the external mechanisms used to support it (such as
> > Kerberos).
> >
> > 4) Extend HBase support for ACLs to the cell level
> >
> > HBase currently supports setting access controls at the table or column
> > family level. However, many use cases would benefit from the additional
> > capability to do this on a per-cell basis. In fact, for many users
> > dealing with sensitive information the ability to do this is crucial.
> >
> > 5) Improve audit logging
> >
> > Audit messages from the various Hadoop components do not use a unified,
> > or even consistent, format. This makes it difficult to analyze logs to
> > verify compliance or take corrective action. We will build a common audit
> > logging facility as part of the common authorization framework work. We
> > will also build a set of common audit log processing tools for
> > transforming logs to different industry-standard formats, supporting
> > compliance verification, and triggering responses to policy violations.
> >
> > Current JIRAs:
> >
> > As part of this ongoing effort we are contributing our work to date
> > against the JIRAs listed below.
> > As you may appreciate, the goals for Project Rhino cover a number of
> > different Apache projects; the scope of work is significant and likely
> > only to increase as we get additional community input. We also recognize
> > that there may be others in the Apache community who are working on some
> > of this or are interested in contributing to it. If so, we look forward
> > to partnering with you in Apache to accelerate this effort so the Apache
> > community can see the benefits of our collective efforts sooner. You can
> > also find a more detailed version of this announcement at Project Rhino
> > <https://github.com/intel-hadoop/project-rhino/>.
> >
> > Please feel free to reach out to us by commenting on the JIRAs below:
> >
> > HBASE-6222: Add per-KeyValue Security
> > <https://issues.apache.org/jira/browse/hbase-6222>
> >
> > HADOOP-9331: Hadoop crypto codec framework and crypto codec
> > implementations <https://issues.apache.org/jira/browse/hadoop-9331> and
> > related sub-tasks
> >
> > MAPREDUCE-5025: Key Distribution and Management for supporting crypto
> > codec in Map Reduce
> > <https://issues.apache.org/jira/browse/mapreduce-5025> and related JIRAs
> >
> > HBASE-7544: Transparent table/CF encryption
> > <https://issues.apache.org/jira/browse/hbase-7544>
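[The per-cell access control idea behind goal 4 and HBASE-6222 can be sketched as below: ACLs stored at column-family granularity, with an optional per-cell entry that, when present, takes precedence. The precedence rule, the store layout, and all names here are illustrative assumptions, not HBase's actual design or API.]

```python
class CellAclStore:
    """Hypothetical sketch of family-level ACLs with per-cell overrides."""

    def __init__(self):
        self.family_acls = {}  # (table, family) -> set of users
        self.cell_acls = {}    # (table, family, row, qualifier) -> set of users

    def grant_family(self, table, family, user):
        self.family_acls.setdefault((table, family), set()).add(user)

    def grant_cell(self, table, family, row, qualifier, user):
        self.cell_acls.setdefault((table, family, row, qualifier), set()).add(user)

    def can_read(self, user, table, family, row, qualifier):
        # If a per-cell ACL exists, it takes precedence over the
        # column-family ACL; otherwise fall back to the family ACL.
        cell = self.cell_acls.get((table, family, row, qualifier))
        if cell is not None:
            return user in cell
        return user in self.family_acls.get((table, family), set())


store = CellAclStore()
store.grant_family("patients", "records", "dr_adams")
store.grant_cell("patients", "records", "row1", "ssn", "auditor")
print(store.can_read("dr_adams", "patients", "records", "row1", "name"))  # True
print(store.can_read("dr_adams", "patients", "records", "row1", "ssn"))   # False
```

[The point of the sketch is the granularity gap: with only table- or column-family-level ACLs, the sensitive "ssn" cell could not be restricted more tightly than the rest of its family.]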