+1 for Pramod's proposal.

On 19-May-2017 4:51 AM, "Sanjay Pujare" <san...@datatorrent.com> wrote:
> +1 for Pramod's proposal for impersonation.
>
> I have an issue with Sandesh's suggestion about making the new behavior as
> the default (or only) behavior. This will introduce incompatibility with
> other legacy tools (e.g. Datatorrent's dtGateway) that assume user A's HDFS
> path as the application path. Because the legacy tools will continue to
> assume the old path (user A's path) they will not work with the Apex core
> that has this change.
>
> The current behavior might also be preferable to certain users or their
> administrators because of not having to deal with multiple HDFS user
> directories (for administration, logging, backup etc).
>
> On Thu, May 18, 2017 at 4:01 PM, Sandesh Hegde <sand...@datatorrent.com>
> wrote:
>
> > My vote is to make the new proposal as the default behavior. Is there a
> > use case for the current behavior? If not then no need to add the
> > configuration setting.
> >
> > On Thu, May 18, 2017 at 3:47 PM Pramod Immaneni <pra...@datatorrent.com>
> > wrote:
> >
> > > Sorry typo in sentence "as we are not asking for permissions for a
> > > lower privilege", please read as "as we are now asking for
> > > permissions for a lower privilege".
> > >
> > > On Thu, May 18, 2017 at 3:44 PM, Pramod Immaneni
> > > <pra...@datatorrent.com> wrote:
> > >
> > > > Apex cli supports impersonation in secure mode. With impersonation,
> > > > the user running the cli or the user authenticating with hadoop
> > > > (henceforth referred to as login user) can be different from the
> > > > effective user with which the actions are performed under hadoop.
> > > > An example for this is an application can be launched by user A to
> > > > run in hadoop as user B. This is kind of like the sudo
> > > > functionality in unix. You can find more details about the
> > > > functionality here https://apex.apache.org/docs/apex/security/ in
> > > > the Impersonation section.
> > > >
> > > > What happens today with launching an application with
> > > > impersonation, using the above launch example, is that even though
> > > > the application runs as user B it still uses user A's hdfs path for
> > > > the application path. The application path is where the artifacts
> > > > necessary to run the application are stored and where the runtime
> > > > files like checkpoints are stored. This means that user B needs to
> > > > have read and write access to user A's application path folders.
> > > >
> > > > This may not be allowed in certain environments as it may be a
> > > > policy violation for the following reason. Because user A is able
> > > > to impersonate as user B to launch the application, A is considered
> > > > to be a higher privileged user than B and is given necessary
> > > > privileges in hadoop to do so. But after launch B needs to access
> > > > folders belonging to A which could constitute a violation as we are
> > > > not asking for permissions for a lower privilege user to access
> > > > resources of a higher privilege user.
> > > >
> > > > I would like to propose adding a configuration setting, which when
> > > > set will use the application path in the impersonated user's home
> > > > directory (user B) as opposed to impersonating user's home
> > > > directory (user A). If this setting is not specified then the
> > > > behavior can default to what it is today for backwards
> > > > compatibility.
> > > >
> > > > Comments, suggestions, concerns?
> > > >
> > > > Thanks
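
For anyone skimming the thread, here is a rough sketch of the path selection the proposal quoted above describes. It is only an illustration, not Apex code: the class name, the method, and the useImpersonatedUserPath flag are all hypothetical, and the /user/<name>/datatorrent/apps/<appId> layout is an assumption about how home and application directories are arranged on a given cluster.

```java
import java.util.Objects;

// Hypothetical sketch; none of these names come from Apex itself.
final class AppPathSketch {

    // Assumed layout: HDFS home directories live under /user/<name>, and
    // application directories live under <home>/datatorrent/apps/<appId>.
    static final String HOME_PREFIX = "/user/";
    static final String APPS_SUBDIR = "/datatorrent/apps/";

    /**
     * Picks the application path for a launch with impersonation.
     *
     * @param loginUser               the user running the CLI (user A)
     * @param effectiveUser           the impersonated user (user B)
     * @param useImpersonatedUserPath hypothetical stand-in for the proposed
     *                                setting; false keeps today's behavior
     * @param appId                   the application id
     */
    static String resolveAppPath(String loginUser, String effectiveUser,
                                 boolean useImpersonatedUserPath, String appId) {
        Objects.requireNonNull(loginUser, "loginUser");
        Objects.requireNonNull(effectiveUser, "effectiveUser");
        Objects.requireNonNull(appId, "appId");
        // When the setting is off, fall back to the login user's home
        // directory for backwards compatibility.
        String owner = useImpersonatedUserPath ? effectiveUser : loginUser;
        return HOME_PREFIX + owner + APPS_SUBDIR + appId;
    }

    public static void main(String[] args) {
        // Setting not specified: path stays under user A's home.
        System.out.println(resolveAppPath("userA", "userB", false, "app_0001"));
        // prints /user/userA/datatorrent/apps/app_0001

        // Setting specified: path moves under user B's home.
        System.out.println(resolveAppPath("userA", "userB", true, "app_0001"));
        // prints /user/userB/datatorrent/apps/app_0001
    }
}
```

Keeping the flag off by default mirrors the backwards-compatibility point raised in the thread: tools that assume user A's path keep working unless an administrator opts in.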