I'm +1 on moving to a minimum version of Hadoop 1.0.0 in 0.96. The support-all-flavors stance, especially on branches as opposed to releases, requires us to maintain shims for different versions, and thus to spend energy managing that complexity instead of improving HBase's core.
I'm not convinced by the new-user argument. If folks are completely new, I'd imagine they'd most likely go with the herd and pick a DFS that most folks use (such as Apache Hadoop 1.0.0, a CDH version, or possibly a MapR version). In the case of CDH, MapR, or an internal custom build, it would be the packager's responsibility to maintain and support their own idiosyncrasies or limitations.

I feel some sympathy towards the existing-user argument (we have plenty to deal with). A compromise may be to keep HBase core tested against, and focused on, a small number of HDFS versions (Apache Hadoop 1.0.0 and Apache Hadoop 0.23.x are my first suggestions), and to move all the reflection checks currently sprinkled throughout the code base behind a single interface that can be targeted to support other specific HDFS/DFS flavors. This would be saner and could be tested explicitly. My guess is that this problem isn't limited to the user/security API: I believe newer HDFS releases may offer performance and API improvements that we will want to take advantage of, and those would also need to be discovered through reflection.

Jon.

On Wed, Mar 7, 2012 at 11:42 AM, Mikhail Bautin wrote:
> The current support for multiple versions of HDFS is in my opinion actually
> one of the strengths of HBase, and the project will lose that advantage if
> we cut support for earlier versions of Hadoop. I think HBase should only
> require the simplest possible universally available subset of HDFS API, and
> security should be an optional feature, discovered through reflection or
> enabled in some other ways.
>
> We have a custom version of Hadoop at Facebook that is not planning to
> implement security any time soon. This version of Hadoop runs underneath
> what we believe to be some of the largest existing production HBase
> deployments.
> We are currently running the 0.89-fb version of HBase in
> production, but are considering moving to a more recent version of HBase at
> some point, and it would be great to be able to do that independently of
> changing the underlying Hadoop distribution for migration complexity
> reasons. Currently we are able to run public HBase trunk on our version of
> Hadoop, but once in a while we have to satisfy new dependencies on Hadoop
> features that are added to HBase. If the changes proposed in this thread
> happen, we would have to pull in a lot more security-related dependencies
> into our version of Hadoop and, most likely, implement a lot of no-op
> stubs. However, that may not be a trivial project, and it certainly would
> not add any clarity or value to our Hadoop codebase or HBase / HDFS
> interaction.
>
> I imagine there are other custom flavors of Hadoop out there where HBase
> support would be desirable. For example, does MapR implement the same
> security API as Hadoop 1.0.0 does? Restricting HBase to a smaller subset of
> Hadoop versions complicates life for existing users, and makes HBase a less
> likely choice for new users, who could go with something like Hypertable
> where they have an extra abstraction layer between the database and the
> underlying distributed file system implementation.
>
> Thanks,
> --Mikhail
>
> On Wed, Mar 7, 2012 at 10:20 AM, Devaraj Das wrote:
>
> > Given that the token/ugi APIs are being used in other ecosystem components
> > too (like Hive, HCatalog & Oozie), and in general, that security model will
> > probably hold for other projects too, I think that it's not an unfair
> > expectation from Hadoop that it should maintain compatibility on UGI/Token*
> > interfaces (*smile*).
> >
> > On Mar 6, 2012, at 11:57 AM, Arun C Murthy wrote:
> >
> > > Andy - could you please start a discussion?
> > >
> > > We could, at the very least, mark UGI as LimitedPrivate for HBase and
> > > work with you guys to maintain compatibility for the future. Makes sense?
> > >
> > > thanks,
> > > Arun
> > >
> > > On Mar 6, 2012, at 10:21 AM, Andrew Purtell wrote:
> > >
> > >> After that, I believe we can merge the security sources in. However we
> > >> may have an issue going forward because UGI is an unstable/private API.
> > >> Needs sorting out with core at some point.
> > >>
> > >> Best regards,
> > >>
> > >> - Andy
> > >>
> > >> On Mar 6, 2012, at 9:55 AM, Stack wrote:
> > >>
> > >>> On Tue, Mar 6, 2012 at 9:10 AM, Andrew Purtell wrote:
> > >>>> ...however we can't easily build a single artifact because the secure
> > >>>> RPC engine, as it interacts with the Hadoop auth framework, must use
> > >>>> UserGroupInformation.
> > >>>>
> > >>>
> > >>> OK. So security story needs a bit of work. Sounds like we have
> > >>> enough votes though to require hadoop 1.0.0 at least in 0.96.
> > >>>
> > >>> St.Ack
> > >
> > > --
> > > Arun C. Murthy
> > > Hortonworks Inc.
> > > http://hortonworks.com/

--
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
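A rough Java sketch of the compromise Jon describes (one interface that owns the reflection checks, with security discovered by reflection as Mikhail suggests) might look like the following. `DfsCompat`, `ReflectiveDfsCompat`, and their methods are hypothetical names, not existing HBase code; the only real Hadoop names assumed are the class `org.apache.hadoop.security.UserGroupInformation` and its static `isSecurityEnabled()` method. On a Hadoop build that lacks that class, the probe reports security as disabled instead of failing. Requires Java 8+ for `Optional`.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Optional;

/**
 * Hypothetical interface (not an existing HBase class) that owns every
 * version-dependent question HBase asks of the underlying DFS.
 */
interface DfsCompat {
    /** True only if the running Hadoop has the security API and reports it enabled. */
    boolean isSecurityEnabled();
}

/** Hypothetical default implementation: resolves optional APIs once, by reflection. */
final class ReflectiveDfsCompat implements DfsCompat {
    // Real Hadoop 1.0.0 class and static method; builds without security lack them.
    static final String UGI_CLASS = "org.apache.hadoop.security.UserGroupInformation";
    static final String UGI_SECURITY_METHOD = "isSecurityEnabled";

    private final boolean securityEnabled;

    ReflectiveDfsCompat() {
        this(UGI_CLASS, UGI_SECURITY_METHOD);
    }

    // Package-private so tests can aim the probe at other classes.
    ReflectiveDfsCompat(String className, String methodName) {
        this.securityEnabled = invokeOrFalse(findStaticBooleanMethod(className, methodName));
    }

    @Override
    public boolean isSecurityEnabled() {
        return securityEnabled;
    }

    /** Looks up a public static no-arg method returning boolean; empty if absent. */
    static Optional<Method> findStaticBooleanMethod(String className, String methodName) {
        try {
            Method m = Class.forName(className).getMethod(methodName);
            boolean usable = Modifier.isStatic(m.getModifiers())
                    && m.getReturnType() == boolean.class;
            return usable ? Optional.of(m) : Optional.empty();
        } catch (ReflectiveOperationException | LinkageError e) {
            // Class or method missing on this Hadoop flavor.
            return Optional.empty();
        }
    }

    /** Invokes the method if present; any failure counts as "feature unavailable". */
    static boolean invokeOrFalse(Optional<Method> method) {
        if (!method.isPresent()) {
            return false;
        }
        try {
            return (Boolean) method.get().invoke(null);
        } catch (ReflectiveOperationException e) {
            return false;
        }
    }
}
```

Under this sketch, HBase code would ask one shared `DfsCompat` instance `isSecurityEnabled()` instead of repeating `Class.forName` checks, and a packager supporting another DFS flavor could supply a different implementation that is tested on its own.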
