On Wed, Jul 18, 2018 at 10:21 AM, Todd Lipcon <[email protected]>
wrote:

> On Tue, Jul 17, 2018 at 5:27 PM, Sailesh Mukil <[email protected]>
> wrote:
>
> > On Tue, Jul 17, 2018 at 2:47 PM, Todd Lipcon <[email protected]>
> > wrote:
> >
> > > Hey folks,
> > >
> > > I'm working on a regression test for IMPALA-7311 and found something
> > > interesting. It appears that in our normal minicluster setup, impalad runs
> > > as the same username as the namenode (namely, the username of the
> > > developer, in my case 'todd').
> > >
> > > This means that the NN treats impala as a superuser, and therefore doesn't
> > > actually enforce permissions. So, tests about the behavior of Impala on
> > > files that it doesn't have access to are somewhat tricky to write.
> > >
> > >
> > What kind of files do you specifically mean? Something that the daemon
> > tries to access directly (e.g., keytab file, log files, etc.)? I'm guessing
> > it's not this since you mentioned the NN.
> >
> > Or files that belong to a table/partition in HDFS? If it's this case, we
> > would go through Sentry before accessing files that belong to a table, and
> > access would be determined by Sentry on the "session user" (not the impalad
> > user) before Impala even tries to access HDFS. (e.g.,
> > tests/authorization/test_authorization.py)
> >
>
> Right, files on HDFS. I mean that, in cases where Sentry is not enabled or
> set up, and even in some cases where it is set up but not synchronized with
> HDFS, it's possible that the user can point table metadata at files or
> directories that aren't writable to the 'impala' user on HDFS. For example,
> I can do:
>
> CREATE EXTERNAL TABLE foo (...) LOCATION '/user/todd/my-dir';
>
> and it's likely that 'my-dir' is not writable by 'impala' on a real
> cluster. Thus, if I try to insert into it, I get an error because "impala"
> does not have HDFS permissions to access this directory.
>
> Currently, the frontend does some checks here to try to produce a nice
> error. But, those checks are based on cached metadata which could be
> inaccurate. In the case that it's inaccurate, the error will be thrown from
> the backend when it tries to create a file in a non-writable location.
>
> In the minicluster environment, it's impossible to test this case (actual
> permissions enforced by the NN causing an error) because the backend is
> running as an HDFS superuser. That is to say, it has full permissions
> everywhere. That's due to the special case behavior that HDFS has: it
> determines the name of the superuser to be the username that is running the
> NN. Since in the minicluster, both impala and the NN run as 'todd' in my
> case, impala acts as superuser. In a real cluster (even with security
> disabled) impala typically runs as 'impala' whereas the NN runs as 'hdfs'
> and thus impala does not have superuser privileges.
>

This makes sense, thanks for the explanation. The 'HADOOP_USER_NAME'
approach seems like a good way to go but, as Phil said, it may or may not
cause issues with other components.
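
To make the superuser rule and the 'HADOOP_USER_NAME' idea concrete, here is
a minimal sketch. It is a toy model, not HDFS or Impala code: the function
names (is_hdfs_superuser, can_write, impalad_env) are hypothetical, the
permission check ignores groups, the supergroup and mode bits, and it assumes
simple (non-Kerberos) authentication, where Hadoop clients take their
identity from the HADOOP_USER_NAME environment variable.

```python
def is_hdfs_superuser(client_user, namenode_user):
    # HDFS treats the user running the NameNode process as the superuser.
    # Simplification: the supergroup is ignored here.
    return client_user == namenode_user


def can_write(client_user, namenode_user, dir_owner):
    # Toy permission check: the superuser bypasses all checks; otherwise
    # only the directory owner may write (group/other bits are ignored).
    if is_hdfs_superuser(client_user, namenode_user):
        return True
    return client_user == dir_owner


def impalad_env(base_env, hdfs_user="impala"):
    # Hypothetical helper: copy the environment and override the identity
    # that the HDFS client inside impalad would present to the NameNode.
    env = dict(base_env)
    env["HADOOP_USER_NAME"] = hdfs_user
    return env


# Minicluster today: impalad and the NN both run as the developer, so
# impalad is a superuser and can write anywhere.
minicluster = can_write("todd", "todd", "hdfs")

# Real cluster: impalad runs as 'impala', the NN as 'hdfs'.
real_cluster = can_write("impala", "hdfs", "todd")

# Minicluster with the override: impalad presents itself as 'impala'.
env = impalad_env({"USER": "todd"})
overridden = can_write(env["HADOOP_USER_NAME"], "todd", "todd")
```

With the override in place the minicluster case behaves like the real
cluster case, which is what a regression test for a non-writable table
location would need.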


> -Todd
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
