Sergio, Regarding input, what I found is below
context of input at semantic hook postAnalyze() for command ("SELECT COUNT(A) FROM VIEW_1"). inputs[0] db_1@view_1 accessedColumns size = 1 value is "a" inputs[1] db_1@tab_1 accessedColumns size = 0 input of checkPrivileges inputHObjs size = 1 input of checkPrivileges() inputHObjs size = 1 inputHobjs[0].dbname = "db_1" inputHobjs[0].objectName = "view_1" inputHobjs[0].colums size = 1 inputHobjs[0].colums[0] = "a" So the input at DefaultSentryValidator.checkPrivileges() is the most desirable. Hive passes in the exact info to sentry for authorization. If user has privilege for accessing the view, but does not have privilege for accessing the table, the authorization will succeed. In semantic hook, Hive passes to Sentry correct info for view, but it also passes extra info for table, which is not used in the command. If user has privilege for accessing the view, but does not have privilege for accessing the table, the authorization will fail. Thanks, Lina On Tue, Oct 10, 2017 at 6:44 PM, Sergio Pena <sergio.p...@cloudera.com> wrote: > I was reviewing the hive-authz2 support, and so far the concerns are only > on the checkPrivileges() method that is called by Hive to request access to > a Hive object. This is similar to what HiveAuthzBindingHook does on the > hive-authz1 profile. > > So, I think I can do a mix of work. Use the authz2 work for grant/revoke > tasks and filtering HMS metadata + continuing using the > HiveAuthzBindingHook (or Semantic hooks) class to request access to an > object. This way we can use what is already tested by Sentry for years. > Also, these semantic hooks are not part of authz1 in Hive so we are sure > they won't be deleted in the future. > > As an extra info on the difference in Hive about the semantic hooks and the > authorization V2 is just the parameters that are passed. Hive calls both, > Semantic hooks and doAuthorizationV2, in the same scope (Driver.compile), > so there is no benefit for us to continue investing effort on fixing the > checkPrivileges and Hive in order to make it work (at least no in the > short-term). > > Here's the patch 1 of 2 I'm working on: > https://issues.apache.org/jira/browse/SENTRY-1978 > > A 2nd patch will include another part of the hive-auth2 for filtering HMS > metadata. > Let me know if you guys have a concern about it > > - Sergio > > > > On Tue, Oct 3, 2017 at 4:40 PM, Sergio Pena <sergio.p...@cloudera.com> > wrote: > > > Aaah, I missed the $hive-v2.version property. I see we've been using Hive > > 2.0 for authz2 for a while. > > > > The concern for authz2 is about parsing Hive SQL strings that could > change > > in future Hive versions, and when attempting to remove that, then other > > issues (that Na Li explained) started happening. The current authz2 works > > fine with the SQL parsing workaround that looks weird to do in Sentry. > > > > I thought a little more about the profiles, and it's good to idea keep > > both profiles for a while to avoid more complications later. Still, we > need > > to really test this authz2 very well before switching to it as the > default. > > > > Regarding Hive fixes, they're currently targeted to Hive 3.0. We can ask > a > > backport to Hive 2.x but I don't know when they're going to release a new > > Hive 2.x version. This could delay our Sentry 2.0 + HA release if we want > > to wait for those releases. > > > > What should we do? Should we wait for a Hive release and verify authz2 > > works correctly? or should we keep > > authz1 as the default and release something soon? > > > > - Sergio > > > > On Tue, Oct 3, 2017 at 11:33 AM, Na Li <lina...@cloudera.com> wrote: > > > >> Colm and Sergio, > >> > >> https://issues.apache.org/jira/browse/SENTRY-1957 tracks issues we > found > >> when integrating hive with sentry for auth-2. > >> > >> The issues we found for auth-2 when integrating with Hive are: > >> > >> 1. When dropping non-exist database, hive does not pass name of the > >> non-exist database to DefaultSentryValidator.checkPrivileges(). It > seems > >> it is hard for hive to fix it. > >> 2. When dropping non-exist table, hive does not pass name of the > >> non-exist table to DefaultSentryValidator.checkPrivileges(). It seems > it > >> is hard for hive to fix it. > >> 3. When creating or dropping function, hive does not pass class name of > >> the function. HIVE-17544 > >> <https://issues.apache.org/jira/browse/HIVE-17544> is fixing this > issue. > >> > >> Sentry should remove parsing hive command, and it requires hive update. > >> We are still working on how to fix those issues. > >> > >> Thanks, > >> > >> Lina > >> > >> On Tue, Oct 3, 2017 at 5:34 AM, Colm O hEigeartaigh < > cohei...@apache.org> > >> wrote: > >> > >>> Thanks for that detailed answer, Sergio. Just to correct you on > something > >>> though: > >>> > >>> > Btw, Sentry 1.8 did not provide authz2 with Hive 2.0 support, seems > it > >>> was Hive 1.1 as well (I don't see the 2.0 version on the pom.xml). > >>> > >>> The sentry-binding-hive-v2 pom references only Hive 2.0.0 dependencies, > >>> e.g.: > >>> > >>> https://github.com/apache/sentry/blob/32d85bf2cc265dc33596a5 > >>> 3d49b558bf20131480/sentry-binding/sentry-binding-hive-v2/pom.xml#L79 > >>> > >>> I know it works because I tested it with Hive 2.0.0 ( > >>> http://coheigea.blogspot.ie/2017/09/securing-apache-hive-part-v.html) > >>> > >>> > Now that we bumped the Hive version to 2.0, I was wondering if we > >>> should > >>> continue with authz1 and keep authz2 as an > >>> > experimental support until Hive releases something we can consume to > >>> fix > >>> > our issues. Then we can deprecate authz1 in a future 2.x release and > >>> remove > >>> > it in a major version. > >>> > >>> Yes I think this makes sense, given the concerns you have raised. Do > you > >>> have a timeline on when the Hive issues are likely to be fixed? Maybe > it > >>> could be done before Sentry 2.0.0 in which case we could drop authz1 > >>> anyway > >>> for 2.0.0? > >>> > >>> > I was thinking if we remove any hive-authz2 profile and just add the > >>> > hive-authz2 classes to the current sentry-binding-hive module so that > >>> users > >>> > are allowed to switch either to v1 or v2 (for testing). Also for the > >>> tests, > >>> > find a way to run all sentry-tests-hive with v1 and v2 to validate > the > >>> > quality of it. > >>> > >>> Removing the hive-authz2 profile though could make it more difficult to > >>> remove the authz1 functionality in the future, as it will be less clear > >>> where the demarcation is between the two. Definitely it makes sense to > >>> run > >>> the tests with both versions if possible. > >>> > >>> Colm. > >>> > >>> On Mon, Oct 2, 2017 at 11:00 PM, Sergio Pena <sergio.p...@cloudera.com > > > >>> wrote: > >>> > >>> > Sure. > >>> > > >>> > First, here's what Hive Wiki says about authz1 limitations: > >>> > > >>> > The default authorization in Hive > >>> > <https://cwiki.apache.org/confluence/display/Hive/LanguageMa > >>> nual+Authorization#LanguageManualAuthorization-3DefaultHiveA > >>> uthorization(LegacyMode)> > >>> > is *not designed* with the intent to protect against malicious users > >>> > accessing data they should not be accessing. It only helps in > >>> preventing > >>> > users from accidentally doing operations they are not supposed to do. > >>> It is > >>> > also incomplete because it does not have authorization checks for > many > >>> > operations including the grant statement. The authorization checks > >>> happen > >>> > during Hive query compilation. But as the user is allowed to > >>> > execute dfs commands, user-defined functions and shell commands, it > is > >>> > possible to bypass the client security checks. > >>> > > >>> > See https://cwiki.apache.org/confluence/display/Hive/SQL+Sta > >>> > ndard+Based+Hive+Authorization > >>> > > >>> > The above problem is the reason Hive introduced a new authorization > API > >>> > called authz2. However, I saw that some of those limitations are > >>> handled by > >>> > Sentry already, such as GRANT privilege checks (on the Sentry server > >>> side). > >>> > Also, Sentry provides the SentryGrantRevokeTask to handle the > >>> GRANT/REVOKE > >>> > execution instead of using the authz1 API that Hive provides. > >>> > > >>> > Authz1 uses the following configurations: > >>> > > >>> > - *hive.security.authorization.ma > >>> > <http://hive.security.authorization.ma>nager*=(implementation of > >>> > HiveAuthorizerFactory) > >>> > > >>> > > >>> > - *hive.security.authorization.enabled*=true > >>> > > >>> > > >>> > There is, though, a HiveAuthorizerFactory implementation on Sentry > when > >>> > bumping the version to Hive 0.13, but it does not provide the > >>> Controller > >>> > nor Validator classes to handle authorization for v1. These classes > >>> were > >>> > introduced in Sentry later to support authz2. > >>> > > >>> > Based on the above, I think that Sentry does not use authz1 > completely > >>> as > >>> > it provides the hooks necessary to the semantic analyzer and task > >>> execution > >>> > to provide that support (correct me if I'm wrong). > >>> > > >>> > Nevertheless, the authz2 provides other functionalities that it would > >>> be > >>> > good to support, such as DFS commands authorization and keep HMS > client > >>> > filtering, GRANT/REVOKE executions and privileges checks in one class > >>> > (HiveAuthorizerFactory) instead of 3 that Sentry provides. > >>> > > >>> > Btw, Sentry 1.8 did not provide authz2 with Hive 2.0 support, seems > it > >>> was > >>> > Hive 1.1 as well (I don't see the 2.0 version on the pom.xml). > >>> > > >>> > Another proposal is to keep authz1 as default for Sentry 2.0 like > >>> Sentry > >>> > 1.8 provides, and deprecate it later in the Sentry 2.x line once > >>> authz2 is > >>> > stable and we bump to newer versions of Hive 2 with fixes on this. > >>> > > >>> > The configuration difference is: > >>> > > >>> > *Sentry authz1:* > >>> > HMS > >>> > MetastoreAuthzBinding for HMS server authorization. > >>> > > >>> > HS2 > >>> > HiveAuthzBindingSessionHook for configuring semantic/filter hooks. > >>> > SentryMetaStoreFilterHook for hms client filtering. > >>> > SentryHiveAuthorizationTaskFactoryImpl that creates the > >>> > SentryGrantRevokeTask. > >>> > HiveAuthzBindingHook that checks privileges during the semantic > >>> analyzer. > >>> > SentryGrantRevokeTask that executes adds/removes privileges on the > >>> > Sentry server. > >>> > > >>> > *Sentry authz2:* > >>> > HMS > >>> > MetastoreAuthzBinding for HMS server authorization. > >>> > > >>> > HS2 > >>> > HiveAuthzBindingSessionHookV2 for configuring semantic/filter > hooks. > >>> > SentryHiveAuthorizer that calls a Controller or Validator depending > >>> on > >>> > the authorization request. > >>> > SentryHiveAccessController to grant/revoke commands. > >>> > SentryAuthorizationValidator for HMS client filtering and check > >>> > privileges. > >>> > > >>> > > >>> > - Sergio > >>> > > >>> > On Mon, Oct 2, 2017 at 12:34 PM, Sergio Pena < > sergio.p...@cloudera.com > >>> > > >>> > wrote: > >>> > > >>> >> Sure. > >>> >> > >>> >> First, here's what Hive Wiki says about authz1 limitations: > >>> >> > >>> >> The default authorization in Hive > >>> >> <https://cwiki.apache.org/confluence/display/Hive/LanguageMa > >>> nual+Authorization#LanguageManualAuthorization-3DefaultHiveA > >>> uthorization(LegacyMode)> > >>> >> is *not designed* with the intent to protect against malicious > users > >>> >> accessing data they should not be accessing. It only helps in > >>> preventing > >>> >> users from accidentally doing operations they are not supposed to > do. > >>> It is > >>> >> also incomplete because it does not have authorization checks for > many > >>> >> operations including the grant statement. The authorization checks > >>> happen > >>> >> during Hive query compilation. But as the user is allowed to > >>> >> execute dfs commands, user-defined functions and shell commands, it > is > >>> >> possible to bypass the client security checks. > >>> >> > >>> >> See https://cwiki.apache.org/confluence/display/Hive/SQL+Sta > >>> >> ndard+Based+Hive+Authorization > >>> >> > >>> >> The above problem is the reason Hive introduced a new authorization > >>> API > >>> >> called authz2. However, I saw that some of those limitations are > >>> handled by > >>> >> Sentry already, such as GRANT privilege checks (on the Sentry server > >>> side). > >>> >> Also, Sentry provides the SentryGrantRevokeTask to handle the > >>> GRANT/REVOKE > >>> >> execution instead of using the authz1 API that Hive provides. > >>> >> > >>> >> Authz1 uses the following configurations: > >>> >> > >>> >> > >>> >> > >>> >> > >>> >> > >>> >> On Mon, Oct 2, 2017 at 9:56 AM, Colm O hEigeartaigh < > >>> cohei...@apache.org> > >>> >> wrote: > >>> >> > >>> >>> Hi Sergio, > >>> >>> > >>> >>> Could you give some background as to what the differences are > between > >>> >>> "authz1" and "authz2"? Sorry if this is an obvious question :-) > >>> >>> > >>> >>> For the 1.8.0 release, authz1 was supported with Hive 1 and authz2 > >>> with > >>> >>> Hive 2, so I assumed the separate bindings were related to the Hive > >>> >>> versions being supported. Obviously this is not the case if we are > >>> still > >>> >>> talking about supporting authz1 with Hive 2.0. > >>> >>> > >>> >>> Colm. > >>> >>> > >>> >>> On Fri, Sep 29, 2017 at 8:59 PM, Sergio Pena < > >>> sergio.p...@cloudera.com> > >>> >>> wrote: > >>> >>> > >>> >>> > Hi All, > >>> >>> > > >>> >>> > We are running into some problems with the support of Hive Authz > V2 > >>> >>> > especially related to the workaround that parses Hive command > >>> strings > >>> >>> in > >>> >>> > Sentry using regular expressions to get some info that Hive is > not > >>> >>> sending > >>> >>> > through the authz2 api. Hive 2.0 made some changes on commands > that > >>> >>> caused > >>> >>> > issues with Sentry. These are fixed but the concern of doing this > >>> SQL > >>> >>> > parsing exists. We asked the Hive community to give us extra SQL > >>> >>> > information, but we cannot implement them in Sentry until a Hive > >>> >>> release is > >>> >>> > done. There are some concerns about the quality of authz2 too, > >>> such as > >>> >>> > create/drop table and functions calling Sentry twice for > >>> authorization > >>> >>> and > >>> >>> > the lack of testing being done on it. > >>> >>> > > >>> >>> > The original idea for Sentry 2.0 (future release) was to drop > >>> authz1 > >>> >>> > support and use authz2 as default but the work is getting delayed > >>> until > >>> >>> > Hive releases something. Now that we bumped the Hive version to > >>> 2.0, I > >>> >>> was > >>> >>> > wondering if we should continue with authz1 and keep authz2 as an > >>> >>> > experimental support until Hive releases something we can consume > >>> to > >>> >>> fix > >>> >>> > our issues. Then we can deprecate authz1 in a future 2.x release > >>> and > >>> >>> remove > >>> >>> > it in a major version. > >>> >>> > > >>> >>> > I was thinking if we remove any hive-authz2 profile and just add > >>> the > >>> >>> > hive-authz2 classes to the current sentry-binding-hive module so > >>> that > >>> >>> users > >>> >>> > are allowed to switch either to v1 or v2 (for testing). Also for > >>> the > >>> >>> tests, > >>> >>> > find a way to run all sentry-tests-hive with v1 and v2 to > validate > >>> the > >>> >>> > quality of it. > >>> >>> > > >>> >>> > What does the PMC community think? Is it a good or bad idea? > >>> >>> > > >>> >>> > - Sergio > >>> >>> > > >>> >>> > >>> >>> > >>> >>> > >>> >>> -- > >>> >>> Colm O hEigeartaigh > >>> >>> > >>> >>> Talend Community Coder > >>> >>> http://coders.talend.com > >>> >>> > >>> >> > >>> >> > >>> > > >>> > >>> > >>> -- > >>> Colm O hEigeartaigh > >>> > >>> Talend Community Coder > >>> http://coders.talend.com > >>> > >> > >> > > >