I was reviewing the hive-authz2 support, and so far the concerns are only
on the checkPrivileges() method that is called by Hive to request access to
a Hive object. This is similar to what HiveAuthzBindingHook does on the
hive-authz1 profile.

So, I think I can do a mix of work. Use the authz2 work for grant/revoke
tasks and filtering HMS metadata + continuing using the
HiveAuthzBindingHook (or Semantic hooks) class to request access to an
object. This way we can use what is already tested by Sentry for years.
Also, these semantic hooks are not part of authz1 in Hive so we are sure
they won't be deleted in the future.

As an extra info on the difference in Hive about the semantic hooks and the
authorization V2 is just the parameters that are passed. Hive calls both,
Semantic hooks and doAuthorizationV2, in the same scope (Driver.compile),
so there is no benefit for us to continue investing effort on fixing the
checkPrivileges and Hive in order to make it work (at least no in the
short-term).

Here's the patch 1 of 2 I'm working on:
https://issues.apache.org/jira/browse/SENTRY-1978

A 2nd patch will include another part of the hive-auth2 for filtering HMS
metadata.
Let me know if you guys have a concern about it

- Sergio



On Tue, Oct 3, 2017 at 4:40 PM, Sergio Pena <sergio.p...@cloudera.com>
wrote:

> Aaah, I missed the $hive-v2.version property. I see we've been using Hive
> 2.0 for authz2 for a while.
>
> The concern for authz2 is about parsing Hive SQL strings that could change
> in future Hive versions, and when attempting to remove that, then other
> issues (that Na Li explained) started happening. The current authz2 works
> fine with the SQL parsing workaround that looks weird to do in Sentry.
>
> I thought a little more about the profiles, and it's good to idea keep
> both profiles for a while to avoid more complications later. Still, we need
> to really test this authz2 very well before switching to it as the default.
>
> Regarding Hive fixes, they're currently targeted to Hive 3.0. We can ask a
> backport to Hive 2.x but I don't know when they're going to release a new
> Hive 2.x version. This could delay our Sentry 2.0 + HA release if we want
> to wait for those releases.
>
> What should we do? Should we wait for a Hive release and verify authz2
> works correctly? or should we keep
> authz1 as the default and release something soon?
>
> - Sergio
>
> On Tue, Oct 3, 2017 at 11:33 AM, Na Li <lina...@cloudera.com> wrote:
>
>> Colm and Sergio,
>>
>> https://issues.apache.org/jira/browse/SENTRY-1957 tracks issues we found
>> when integrating hive with sentry for auth-2.
>>
>> The issues we found for auth-2 when integrating with Hive are:
>>
>> 1. When dropping non-exist database, hive does not pass name of the
>> non-exist database to DefaultSentryValidator.checkPrivileges(). It seems
>> it is hard for hive to fix it.
>> 2. When dropping non-exist table, hive does not pass name of the
>> non-exist table to DefaultSentryValidator.checkPrivileges(). It seems it
>> is hard for hive to fix it.
>> 3. When creating or dropping function, hive does not pass class name of
>> the function. HIVE-17544
>> <https://issues.apache.org/jira/browse/HIVE-17544> is fixing this issue.
>>
>> Sentry should remove parsing hive command, and it requires hive update.
>> We are still working on how to fix those issues.
>>
>> Thanks,
>>
>> Lina
>>
>> On Tue, Oct 3, 2017 at 5:34 AM, Colm O hEigeartaigh <cohei...@apache.org>
>> wrote:
>>
>>> Thanks for that detailed answer, Sergio. Just to correct you on something
>>> though:
>>>
>>> > Btw, Sentry 1.8 did not provide authz2 with Hive 2.0 support, seems it
>>> was Hive 1.1 as well (I don't see the 2.0 version on the pom.xml).
>>>
>>> The sentry-binding-hive-v2 pom references only Hive 2.0.0 dependencies,
>>> e.g.:
>>>
>>> https://github.com/apache/sentry/blob/32d85bf2cc265dc33596a5
>>> 3d49b558bf20131480/sentry-binding/sentry-binding-hive-v2/pom.xml#L79
>>>
>>> I know it works because I tested it with Hive 2.0.0 (
>>> http://coheigea.blogspot.ie/2017/09/securing-apache-hive-part-v.html)
>>>
>>> > Now that we bumped the Hive version to 2.0, I was wondering if we
>>> should
>>> continue with authz1 and keep authz2 as an
>>> > experimental support until Hive releases something we can consume to
>>> fix
>>> > our issues. Then we can deprecate authz1 in a future 2.x release and
>>> remove
>>> > it in a major version.
>>>
>>> Yes I think this makes sense, given the concerns you have raised. Do you
>>> have a timeline on when the Hive issues are likely to be fixed? Maybe it
>>> could be done before Sentry 2.0.0 in which case we could drop authz1
>>> anyway
>>> for 2.0.0?
>>>
>>> > I was thinking if we remove any hive-authz2 profile and just add the
>>> > hive-authz2 classes to the current sentry-binding-hive module so that
>>> users
>>> > are allowed to switch either to v1 or v2 (for testing). Also for the
>>> tests,
>>> > find a way to run all sentry-tests-hive with v1 and v2 to validate the
>>> > quality of it.
>>>
>>> Removing the hive-authz2 profile though could make it more difficult to
>>> remove the authz1 functionality in the future, as it will be less clear
>>> where the demarcation is between the two. Definitely it makes sense to
>>> run
>>> the tests with both versions if possible.
>>>
>>> Colm.
>>>
>>> On Mon, Oct 2, 2017 at 11:00 PM, Sergio Pena <sergio.p...@cloudera.com>
>>> wrote:
>>>
>>> > Sure.
>>> >
>>> > First, here's what Hive Wiki says about authz1 limitations:
>>> >
>>> > The default authorization in Hive
>>> > <https://cwiki.apache.org/confluence/display/Hive/LanguageMa
>>> nual+Authorization#LanguageManualAuthorization-3DefaultHiveA
>>> uthorization(LegacyMode)>
>>> >  is *not designed* with the intent to protect against malicious users
>>> > accessing data they should not be accessing. It only helps in
>>> preventing
>>> > users from accidentally doing operations they are not supposed to do.
>>> It is
>>> > also incomplete because it does not have authorization checks for many
>>> > operations including the grant statement. The authorization checks
>>> happen
>>> > during Hive query compilation. But as the user is allowed to
>>> > execute dfs commands, user-defined functions and shell commands, it is
>>> > possible to bypass the client security checks.
>>> >
>>> > See https://cwiki.apache.org/confluence/display/Hive/SQL+Sta
>>> > ndard+Based+Hive+Authorization
>>> >
>>> > The above problem is the reason Hive introduced a new authorization API
>>> > called authz2. However, I saw that some of those limitations are
>>> handled by
>>> > Sentry already, such as GRANT privilege checks (on the Sentry server
>>> side).
>>> > Also, Sentry provides the SentryGrantRevokeTask to handle the
>>> GRANT/REVOKE
>>> > execution instead of using the authz1 API that Hive provides.
>>> >
>>> > Authz1 uses the following configurations:
>>> >
>>> >    - *hive.security.authorization.ma
>>> >    <http://hive.security.authorization.ma>nager*=(implementation of
>>> >    HiveAuthorizerFactory)
>>> >
>>> >
>>> >    - *hive.security.authorization.enabled*=true
>>> >
>>> >
>>> > There is, though, a HiveAuthorizerFactory implementation on Sentry when
>>> > bumping the version to Hive 0.13, but it does not provide the
>>> Controller
>>> > nor Validator classes to handle authorization for v1. These classes
>>> were
>>> > introduced in Sentry later to support authz2.
>>> >
>>> > Based on the above, I think that Sentry does not use authz1 completely
>>> as
>>> > it provides the hooks necessary to the semantic analyzer and task
>>> execution
>>> > to provide that support (correct me if I'm wrong).
>>> >
>>> > Nevertheless, the authz2 provides other functionalities that it would
>>> be
>>> > good to support, such as DFS commands authorization and keep HMS client
>>> > filtering, GRANT/REVOKE executions and privileges checks in one class
>>> > (HiveAuthorizerFactory) instead of 3 that Sentry provides.
>>> >
>>> > Btw, Sentry 1.8 did not provide authz2 with Hive 2.0 support, seems it
>>> was
>>> > Hive 1.1 as well (I don't see the 2.0 version on the pom.xml).
>>> >
>>> > Another proposal is to keep authz1 as default for Sentry 2.0 like
>>> Sentry
>>> > 1.8 provides, and deprecate it later in the Sentry 2.x line once
>>> authz2 is
>>> > stable and we bump to newer versions of Hive 2 with fixes on this.
>>> >
>>> > The configuration difference is:
>>> >
>>> > *Sentry authz1:*
>>> > HMS
>>> >   MetastoreAuthzBinding for HMS server authorization.
>>> >
>>> > HS2
>>> >   HiveAuthzBindingSessionHook for configuring semantic/filter hooks.
>>> >   SentryMetaStoreFilterHook for hms client filtering.
>>> >   SentryHiveAuthorizationTaskFactoryImpl that creates the
>>> > SentryGrantRevokeTask.
>>> >   HiveAuthzBindingHook that checks privileges during the semantic
>>> analyzer.
>>> >   SentryGrantRevokeTask that executes adds/removes privileges on the
>>> > Sentry server.
>>> >
>>> > *Sentry authz2:*
>>> > HMS
>>> >   MetastoreAuthzBinding for HMS server authorization.
>>> >
>>> > HS2
>>> >   HiveAuthzBindingSessionHookV2 for configuring semantic/filter hooks.
>>> >   SentryHiveAuthorizer that calls a Controller or Validator depending
>>> on
>>> > the authorization request.
>>> >   SentryHiveAccessController to grant/revoke commands.
>>> >   SentryAuthorizationValidator for HMS client filtering and check
>>> > privileges.
>>> >
>>> >
>>> > - Sergio
>>> >
>>> > On Mon, Oct 2, 2017 at 12:34 PM, Sergio Pena <sergio.p...@cloudera.com
>>> >
>>> > wrote:
>>> >
>>> >> Sure.
>>> >>
>>> >> First, here's what Hive Wiki says about authz1 limitations:
>>> >>
>>> >> The default authorization in Hive
>>> >> <https://cwiki.apache.org/confluence/display/Hive/LanguageMa
>>> nual+Authorization#LanguageManualAuthorization-3DefaultHiveA
>>> uthorization(LegacyMode)>
>>> >>  is *not designed* with the intent to protect against malicious users
>>> >> accessing data they should not be accessing. It only helps in
>>> preventing
>>> >> users from accidentally doing operations they are not supposed to do.
>>> It is
>>> >> also incomplete because it does not have authorization checks for many
>>> >> operations including the grant statement. The authorization checks
>>> happen
>>> >> during Hive query compilation. But as the user is allowed to
>>> >> execute dfs commands, user-defined functions and shell commands, it is
>>> >> possible to bypass the client security checks.
>>> >>
>>> >> See https://cwiki.apache.org/confluence/display/Hive/SQL+Sta
>>> >> ndard+Based+Hive+Authorization
>>> >>
>>> >> The above problem is the reason Hive introduced a new authorization
>>> API
>>> >> called authz2. However, I saw that some of those limitations are
>>> handled by
>>> >> Sentry already, such as GRANT privilege checks (on the Sentry server
>>> side).
>>> >> Also, Sentry provides the SentryGrantRevokeTask to handle the
>>> GRANT/REVOKE
>>> >> execution instead of using the authz1 API that Hive provides.
>>> >>
>>> >> Authz1 uses the following configurations:
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> On Mon, Oct 2, 2017 at 9:56 AM, Colm O hEigeartaigh <
>>> cohei...@apache.org>
>>> >> wrote:
>>> >>
>>> >>> Hi Sergio,
>>> >>>
>>> >>> Could you give some background as to what the differences are between
>>> >>> "authz1" and "authz2"? Sorry if this is an obvious question :-)
>>> >>>
>>> >>> For the 1.8.0 release, authz1 was supported with Hive 1 and authz2
>>> with
>>> >>> Hive 2, so I assumed the separate bindings were related to the Hive
>>> >>> versions being supported. Obviously this is not the case if we are
>>> still
>>> >>> talking about supporting authz1 with Hive 2.0.
>>> >>>
>>> >>> Colm.
>>> >>>
>>> >>> On Fri, Sep 29, 2017 at 8:59 PM, Sergio Pena <
>>> sergio.p...@cloudera.com>
>>> >>> wrote:
>>> >>>
>>> >>> > Hi All,
>>> >>> >
>>> >>> > We are running into some problems with the support of Hive Authz V2
>>> >>> > especially related to the workaround that parses Hive command
>>> strings
>>> >>> in
>>> >>> > Sentry using regular expressions to get some info that Hive is not
>>> >>> sending
>>> >>> > through the authz2 api. Hive 2.0 made some changes on commands that
>>> >>> caused
>>> >>> > issues with Sentry. These are fixed but the concern of doing this
>>> SQL
>>> >>> > parsing exists. We asked the Hive community to give us extra SQL
>>> >>> > information, but we cannot implement them in Sentry until a Hive
>>> >>> release is
>>> >>> > done. There are some concerns about the quality of authz2 too,
>>> such as
>>> >>> > create/drop table and functions calling Sentry twice for
>>> authorization
>>> >>> and
>>> >>> > the lack of testing being done on it.
>>> >>> >
>>> >>> > The original idea for Sentry 2.0 (future release) was to drop
>>> authz1
>>> >>> > support and use authz2 as default but the work is getting delayed
>>> until
>>> >>> > Hive releases something. Now that we bumped the Hive version to
>>> 2.0, I
>>> >>> was
>>> >>> > wondering if we should continue with authz1 and keep authz2 as an
>>> >>> > experimental support until Hive releases something we can consume
>>> to
>>> >>> fix
>>> >>> > our issues. Then we can deprecate authz1 in a future 2.x release
>>> and
>>> >>> remove
>>> >>> > it in a major version.
>>> >>> >
>>> >>> > I was thinking if we remove any hive-authz2 profile and just add
>>> the
>>> >>> > hive-authz2 classes to the current sentry-binding-hive module so
>>> that
>>> >>> users
>>> >>> > are allowed to switch either to v1 or v2 (for testing). Also for
>>> the
>>> >>> tests,
>>> >>> > find a way to run all sentry-tests-hive with v1 and v2 to validate
>>> the
>>> >>> > quality of it.
>>> >>> >
>>> >>> > What does the PMC community think? Is it a good or bad idea?
>>> >>> >
>>> >>> > - Sergio
>>> >>> >
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> Colm O hEigeartaigh
>>> >>>
>>> >>> Talend Community Coder
>>> >>> http://coders.talend.com
>>> >>>
>>> >>
>>> >>
>>> >
>>>
>>>
>>> --
>>> Colm O hEigeartaigh
>>>
>>> Talend Community Coder
>>> http://coders.talend.com
>>>
>>
>>
>

Reply via email to