On Tue, Jul 17, 2018 at 2:47 PM, Todd Lipcon <[email protected]>
wrote:

> Hey folks,
>
> I'm working on a regression test for IMPALA-7311 and found something
> interesting. It appears that in our normal minicluster setup, impalad runs
> as the same username as the namenode (namely, the username of the
> developer, in my case 'todd').
>
> This means that the NN treats impala as a superuser, and therefore doesn't
> actually enforce permissions. So, tests about the behavior of Impala on
> files that it doesn't have access to are somewhat tricky to write.
>
>
What kind of files do you specifically mean? Something that the daemon
tries to access directly (e.g., a keytab file, log files, etc.)? I'm guessing
it's not this since you mentioned the NN.

Or files that belong to a table/partition in HDFS? If it's this case, we
would go through Sentry before accessing files that belong to a table, and
access would be determined by Sentry on the "session user" (not the impalad
user) before Impala even tries to access HDFS. (See, e.g.,
tests/authorization/test_authorization.py.)

If you're describing a scenario different from the two above, then I'd be
interested to hear it, so it'll be easier to understand why this change is
necessary.

> Has anyone run into this before? Should we consider running either the
> impalad or the namenode as a different spoofed username so that the
> minicluster environment is more authentic to true cluster environments? We
> can do this easily by setting the HADOOP_USER_NAME environment variable or
> system property.
>
> -Todd
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
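
For reference, the HADOOP_USER_NAME approach suggested in the quoted message
could be sketched in Python roughly as follows. This is only an illustration,
not how the minicluster scripts currently work: the helper names and the
'impala_test' username are hypothetical, and it assumes simple (non-Kerberos)
authentication, where Hadoop clients honor this variable.

```python
import os
import subprocess


def env_with_hadoop_user(user, base_env=None):
    """Return a copy of the environment with HADOOP_USER_NAME set to `user`.

    Hypothetical helper: the caller's environment is copied, never mutated.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HADOOP_USER_NAME"] = user
    return env


def run_as_hadoop_user(cmd, user):
    """Run `cmd` as a child process that presents `user` to Hadoop.

    Hypothetical helper: with simple authentication, Hadoop clients started
    by this process identify themselves to the NN as `user`.
    """
    return subprocess.run(cmd, env=env_with_hadoop_user(user), check=True)


# Example (not executed here; 'impala_test' is a made-up username):
# run_as_hadoop_user(["hdfs", "dfs", "-ls", "/"], "impala_test")
```

Starting the impalad (or the NN) through a wrapper like this would give the
two daemons different usernames, so the NN would no longer treat Impala as a
superuser.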
