On Mon, Sep 13, 2010 at 9:31 AM, Owen O'Malley <omal...@apache.org> wrote:
> Moving the discussion over to the more appropriate mapreduce-dev.
>

This is not MR-specific, since the strangely named hadoop.job.ugi
determines HDFS permissions as well. +CC hdfs-dev... though I actually
think this is an issue that users will have interest in, which is why I
posted to general initially rather than a dev list.

> On Mon, Sep 13, 2010 at 9:08 AM, Todd Lipcon <t...@cloudera.com> wrote:
>
> > 1) Groups resolution happens on the server side, where it used to happen on
> > the client. Thus, all Hadoop users must exist on the NN/JT machines in order
> > for group mapping to succeed (or the user must write a custom group mapper).
>
> There is a plugin that performs the group lookup. See HADOOP-4656.
> There is no requirement for having the user accounts on the NN/JT
> although that is the easiest approach. It is not recommended that the
> users be allowed to login.
>

"or the user must write a custom group mapper" above refers to this
plugin capability. But I think most users do not want to spend the time
to write (or even set up) such a plugin beyond the default shell-based
mapping service.

> I think it is important that turning security on and off doesn't
> drastically change the semantics or protocols. That will become much
> much harder to support downstream.
>

As someone who spends an awful lot of time doing downstream support of
lots of different clusters, I actually disagree. I believe the majority
of users do *not* plan on turning on security, so keeping things simpler
for them is worth a lot. In many of these clusters the users and the ops
team and the developers are all one and the same - it's not the
multitenant "internal service" model that we see at the larger
installations like Yahoo or Facebook.

> > 2) The hadoop.job.ugi parameter is ignored - instead the user has to use the
> > new UGI.createRemoteUser("foo").doAs() API, even in simple security.
>
> User code that counts on hadoop.job.ugi working will be horribly
> broken once you turn on security. Turning on and off security should
> not involve testing all of your applications. It is unfortunate that
> we ever used the configuration value as the user, but continuing to
> support it will make our user's code much much more brittle.
>

The assumption above is "once you turn on security" - but many users
will not and probably never will turn on security. Providing a
transition plan for one version is our usual policy here - I agree that
long term we would like to do away with this hack of a configuration
parameter.

Since it's not hard to provide a backwards compatibility path with a
deprecation warning for one version, are you against it? Or just saying
that on your particular clusters you will choose not to take advantage
of it?

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
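
P.S. To make the compatibility path concrete, here is a rough sketch of the behaviour being proposed. It is not Hadoop source: the class UgiCompat and the method resolveUser are hypothetical names, a plain Map stands in for the job Configuration, and I am assuming hadoop.job.ugi holds a comma-separated list with the user name first. With security off, the legacy value is honoured with a deprecation warning; with security on, it is ignored and the authenticated login user wins.

```java
import java.util.Map;

// Hypothetical sketch of a one-release compatibility path; not Hadoop code.
final class UgiCompat {
    // Legacy key discussed in this thread.
    static final String LEGACY_KEY = "hadoop.job.ugi";

    /**
     * Picks the effective user name.
     * Assumption: the legacy value looks like "user,group1,group2".
     */
    static String resolveUser(Map<String, String> conf,
                              boolean securityEnabled,
                              String loginUser) {
        String legacy = conf.get(LEGACY_KEY);
        if (legacy == null || legacy.trim().isEmpty()) {
            return loginUser; // nothing configured, nothing to warn about
        }
        if (securityEnabled) {
            // Never trust a client-supplied identity on a secure cluster.
            System.err.println("WARN: " + LEGACY_KEY
                    + " is ignored when security is enabled");
            return loginUser;
        }
        System.err.println("WARN: " + LEGACY_KEY + " is deprecated; use "
                + "UserGroupInformation.createRemoteUser(...).doAs(...) instead");
        // A limit of 2 keeps the leading field even for odd input such as ",".
        String user = legacy.split(",", 2)[0].trim();
        return user.isEmpty() ? loginUser : user;
    }
}
```

With hadoop.job.ugi set to "foo,supergroup", a login user "alice" resolves to "foo" when security is off and stays "alice" when it is on, so existing simple-security jobs keep working for one release while the warning points people at the new API.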