Hi Todd,

You're right: the current minicluster setup makes it impossible to test some cases.
I think your approach of telling Hadoop a different user is sensible, though it
might be a rabbit hole in terms of getting Hive, HBase, and everything else
happy. I know we are able to run some tests against clusters deployed on VMs;
there is some support for this already. I'm not sure we have an extant
annotation for it, but you could write tests targeting that setup.

Thanks,

-- Philip

On Wed, Jul 18, 2018 at 10:22 AM Todd Lipcon <[email protected]> wrote:

> On Tue, Jul 17, 2018 at 5:27 PM, Sailesh Mukil <[email protected]> wrote:
>
> > On Tue, Jul 17, 2018 at 2:47 PM, Todd Lipcon <[email protected]> wrote:
> >
> > > Hey folks,
> > >
> > > I'm working on a regression test for IMPALA-7311 and found something
> > > interesting. It appears that in our normal minicluster setup, impalad
> > > runs as the same username as the namenode (namely, the username of the
> > > developer, in my case 'todd').
> > >
> > > This means that the NN treats impala as a superuser, and therefore
> > > doesn't actually enforce permissions. So, tests about the behavior of
> > > Impala on files that it doesn't have access to are somewhat tricky to
> > > write.
> >
> > What kind of files do you specifically mean? Something that the daemon
> > tries to access directly (e.g. keytab file, log files, etc.)? I'm
> > guessing it's not this, since you mentioned the NN.
> >
> > Or files that belong to a table/partition in HDFS? If it's this case, we
> > would go through Sentry before accessing files that belong to a table,
> > and access would be determined by Sentry based on the "session user"
> > (not the impalad user) before Impala even tries to access HDFS.
> > (e.g. tests/authorization/test_authorization.py)
>
> Right, files on HDFS.
> I mean that, in cases where Sentry is not enabled or set up, and even in
> some cases where it is set up but not synchronized with HDFS, it's
> possible that the user can point table metadata at files or directories
> that aren't writable by the 'impala' user on HDFS. For example, I can do:
>
>   CREATE EXTERNAL TABLE foo (...) LOCATION '/user/todd/my-dir';
>
> and it's likely that 'my-dir' is not writable by 'impala' on a real
> cluster. Thus, if I try to insert into it, I get an error because
> 'impala' does not have HDFS permissions to access this directory.
>
> Currently, the frontend does some checks here to try to produce a nice
> error. But those checks are based on cached metadata, which could be
> inaccurate. In the case that it's inaccurate, the error will be thrown
> from the backend when it tries to create a file in a non-writable
> location.
>
> In the minicluster environment, it's impossible to test this case (actual
> permissions enforced by the NN causing an error) because the backend is
> running as an HDFS superuser. That is to say, it has full permissions
> everywhere. That's due to a special-case behavior in HDFS: it determines
> the name of the superuser to be the username that is running the NN.
> Since in the minicluster both impala and the NN run as 'todd' in my case,
> impala acts as superuser. In a real cluster (even with security disabled)
> impala typically runs as 'impala' whereas the NN runs as 'hdfs', and thus
> impala does not have superuser privileges.
>
> -Todd
> --
> Todd Lipcon
> Software Engineer, Cloudera
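For anyone following along, the superuser short-circuit Todd describes can be
sketched as below. This is not Impala or HDFS source code; the function and
parameter names are hypothetical, and it just emulates a POSIX-style write
check with HDFS's rule that the username running the NN is the superuser.

```python
# Minimal sketch (not Impala or HDFS code) of why the minicluster hides
# permission errors: the user that started the NameNode is the superuser,
# and superuser access short-circuits the POSIX-style mode-bit check.
# All names here are hypothetical, for illustration only.

from typing import Set


def can_write(nn_user: str, acting_user: str, acting_groups: Set[str],
              dir_owner: str, dir_group: str, mode: int) -> bool:
    """Emulate the NN's write check on a directory with POSIX mode bits."""
    if acting_user == nn_user:
        # HDFS special case: the username running the NameNode is the
        # superuser, so every permission check passes.
        return True
    if acting_user == dir_owner:
        return bool(mode & 0o200)   # owner write bit
    if dir_group in acting_groups:
        return bool(mode & 0o020)   # group write bit
    return bool(mode & 0o002)       # other write bit


# /user/todd/my-dir owned by todd:todd with mode 0755 (rwxr-xr-x).
# Minicluster: impalad runs as 'todd', same user as the NN -> superuser.
print(can_write("todd", "todd", {"todd"}, "todd", "todd", 0o755))      # True
# Real cluster: impalad runs as 'impala', the NN as 'hdfs' -> denied.
print(can_write("hdfs", "impala", {"impala"}, "todd", "todd", 0o755))  # False
```

This also shows why the minicluster case is untestable as-is: the first branch
fires before any mode bits are consulted, so no directory layout can make the
write fail.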
