On Tue, Jul 17, 2018 at 2:47 PM, Todd Lipcon <[email protected]> wrote:
> Hey folks,
>
> I'm working on a regression test for IMPALA-7311 and found something
> interesting. It appears that in our normal minicluster setup, impalad runs
> as the same username as the namenode (namely, the username of the
> developer, in my case 'todd').
>
> This means that the NN treats impala as a superuser, and therefore doesn't
> actually enforce permissions. So, tests about the behavior of Impala on
> files that it doesn't have access to are somewhat tricky to write.
>

What kind of files do you specifically mean? Something that the daemon
tries to access directly (e.g. keytab file, log files, etc.)? I'm guessing
it's not this, since you mentioned the NN.

Or files that belong to a table/partition in HDFS? If it's this case, we
would go through Sentry before accessing files that belong to a table, and
access would be determined by Sentry on the "session user" (not the impalad
user) before Impala even tries to access HDFS (e.g.
tests/authorization/test_authorization.py).

If you're describing a scenario different from the two above, I'd be
interested to hear it, so it's easier to understand why this change is
necessary.

> Has anyone run into this before? Should we consider running either the
> impalad or the namenode as a different spoofed username so that the
> minicluster environment is more authentic to true cluster environments? We
> can do this easily by setting the HADOOP_USER_NAME environment variable or
> system property.
>
> -Todd
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
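
For reference, a rough sketch of what the HADOOP_USER_NAME approach quoted above might look like when launching a process from a Python test harness. The helper names (env_with_hadoop_user, run_with_hadoop_user) are hypothetical and not part of the Impala test framework, and this assumes the cluster uses simple (non-Kerberos) authentication, which is the only mode where Hadoop honors that variable.

```python
import os
import subprocess
import sys


def env_with_hadoop_user(user, base_env=None):
    """Return a copy of the environment with HADOOP_USER_NAME set.

    Hypothetical helper: the base environment is copied, never mutated.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HADOOP_USER_NAME"] = user
    return env


def run_with_hadoop_user(cmd, user):
    """Run cmd as a child process that sees the spoofed Hadoop username."""
    return subprocess.run(
        cmd,
        env=env_with_hadoop_user(user),
        check=True,
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # Stand-in for a real daemon start command: the child just echoes
    # the username it would present to the namenode.
    child = [sys.executable, "-c",
             "import os; print(os.environ['HADOOP_USER_NAME'])"]
    print(run_with_hadoop_user(child, "impala").stdout.strip())
```

The real minicluster start scripts would need the same variable set for whichever daemon (impalad or the namenode) is meant to run under the spoofed name; where exactly to hook that in is left open here.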
