Hi folks,

With Yahoo's latest security release on github (http://github.com/yahoo/hadoop-common/tree/yahoo-hadoop-0.20.104), it looks like we now have a real-world usable version of secure Hadoop, based on 0.20. This is exciting stuff, because we now have something solid to work from as we start implementing similar security controls in HBase (HBASE-1697, HBASE-2014, HBASE-2016, HBASE-2420)!
However, this is going to be a large undertaking, with a strong dependency on the secure Hadoop branch (more on that in a bit; unfortunately, the fragmented hadoop-0.20 world is already leaking through). So I'd like to propose a feature branch in the HBase svn repo for security work, to:

1) ensure that changes towards implementing secure HBase have an ASF home
2) provide more visibility and granularity for review (esp. JIRA & reviewboard usage)
3) ease interaction/integration with other branched changes underway (master rewrite)

I've already started pushing some preliminary changes up to github (http://github.com/ghelmling/hbase/tree/security), and will continue to do so. But I'd like to avoid two things: a massive patch set accumulating too many changes, and interested committers & contributors having to go digging to find out what the current state is.

On the secure Hadoop branch dependency: I've integrated the org.apache.hadoop.ipc changes into o.a.h.hbase.ipc.* (HBASE-2742) and have run into a couple of complications:

* The Hadoop RPC version rolled from 3 to 4 (apparently 0.20-append also does this!)
* Various bits in the updated HBaseClient, HBaseServer, etc. now depend on the security implementation, so building and running on top of non-secure Hadoop will not be possible.

I'd like to post the diff on review.hbase.org for more review and feedback, but that raises the question of where the changes should go.

Longer term, I think we need to dump Hadoop RPC (AVRO-405 seems promising here) so that HBase internals aren't so intertwined with Hadoop implementation details. But that's its own large-scale project, which we shouldn't couple to security.

So, to sum up, thoughts on:

a) creating a "security" feature branch in svn?
b) the RPC-related changes, specifically the cross-Hadoop-branch incompatibility due to the version increment, and the dependencies on Hadoop security?

Thanks,
Gary