Re: Concerns about Hive authz2 support

Colm O hEigeartaigh Tue, 03 Oct 2017 03:34:32 -0700

Thanks for that detailed answer, Sergio. Just to correct you on something
though:


> Btw, Sentry 1.8 did not provide authz2 with Hive 2.0 support, seems it
> was Hive 1.1 as well (I don't see the 2.0 version on the pom.xml).

The sentry-binding-hive-v2 pom references only Hive 2.0.0 dependencies,
e.g.:

https://github.com/apache/sentry/blob/32d85bf2cc265dc33596a53d49b558bf20131480/sentry-binding/sentry-binding-hive-v2/pom.xml#L79

I know it works because I tested it with Hive 2.0.0 (
http://coheigea.blogspot.ie/2017/09/securing-apache-hive-part-v.html)

> Now that we bumped the Hive version to 2.0, I was wondering if we should
> continue with authz1 and keep authz2 as an experimental support until
> Hive releases something we can consume to fix our issues. Then we can
> deprecate authz1 in a future 2.x release and remove it in a major
> version.

Yes I think this makes sense, given the concerns you have raised. Do you
have a timeline on when the Hive issues are likely to be fixed? Maybe it
could be done before Sentry 2.0.0 in which case we could drop authz1 anyway
for 2.0.0?

> I was thinking if we remove any hive-authz2 profile and just add the
> hive-authz2 classes to the current sentry-binding-hive module so that
> users are allowed to switch either to v1 or v2 (for testing). Also for
> the tests, find a way to run all sentry-tests-hive with v1 and v2 to
> validate the quality of it.

Removing the hive-authz2 profile though could make it more difficult to
remove the authz1 functionality in the future, as it will be less clear
where the demarcation is between the two. Definitely it makes sense to run
the tests with both versions if possible.
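
As a rough illustration of running the same checks against both bindings,
here is a minimal plain-Java sketch with no test framework. The class, enum
and method names are hypothetical, the check is a placeholder, and the
"hive.server2.session.hook" key is an assumption to verify against your Hive
version. Only the two session hook class names come from Sergio's list
below.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: run one check once per Sentry Hive binding.
final class BindingMatrixSketch {

    // The two HS2 session hooks named in this thread (simple class names only).
    enum Binding {
        V1("HiveAuthzBindingSessionHook"),
        V2("HiveAuthzBindingSessionHookV2");

        final String sessionHook;

        Binding(String sessionHook) {
            this.sessionHook = sessionHook;
        }
    }

    // Assumed property key for the HS2 session hook; verify for your Hive version.
    static final String SESSION_HOOK_KEY = "hive.server2.session.hook";

    // Builds the per-binding configuration a test would start HiveServer2 with.
    static Map<String, String> configFor(Binding binding) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put(SESSION_HOOK_KEY, binding.sessionHook);
        return conf;
    }

    // Runs the same check against every binding and returns the bindings that failed.
    static List<Binding> runAgainstAll(Predicate<Map<String, String>> check) {
        List<Binding> failed = new ArrayList<>();
        for (Binding binding : Binding.values()) {
            if (!check.test(configFor(binding))) {
                failed.add(binding);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Placeholder check: a real test would start HS2 and run authorization queries.
        List<Binding> failed = runAgainstAll(conf -> conf.containsKey(SESSION_HOOK_KEY));
        System.out.println("Bindings that failed: " + failed);
    }
}
```

In the real sentry-tests-hive suite this would more likely be a parameterized
test or a Maven profile switch, but the idea is the same: one set of
checks, two configurations.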

Colm.

On Mon, Oct 2, 2017 at 11:00 PM, Sergio Pena <sergio.p...@cloudera.com>
wrote:

> Sure.
>
> First, here's what Hive Wiki says about authz1 limitations:
>
> The default authorization in Hive
> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Authorization#LanguageManualAuthorization-3DefaultHiveAuthorization(LegacyMode)>
>  is *not designed* with the intent to protect against malicious users
> accessing data they should not be accessing. It only helps in preventing
> users from accidentally doing operations they are not supposed to do. It is
> also incomplete because it does not have authorization checks for many
> operations including the grant statement. The authorization checks happen
> during Hive query compilation. But as the user is allowed to
> execute dfs commands, user-defined functions and shell commands, it is
> possible to bypass the client security checks.
>
> See https://cwiki.apache.org/confluence/display/Hive/SQL+Standard+Based+Hive+Authorization
>
> The above problem is the reason Hive introduced a new authorization API
> called authz2. However, I saw that some of those limitations are handled by
> Sentry already, such as GRANT privilege checks (on the Sentry server side).
> Also, Sentry provides the SentryGrantRevokeTask to handle the GRANT/REVOKE
> execution instead of using the authz1 API that Hive provides.
>
> Authz1 uses the following configurations:
>
>    - *hive.security.authorization.manager*=(implementation of
>    HiveAuthorizerFactory)
>
>
>    - *hive.security.authorization.enabled*=true
>
>
> There is, though, a HiveAuthorizerFactory implementation in Sentry from
> when the version was bumped to Hive 0.13, but it provides neither the
> Controller nor the Validator classes to handle authorization for v1. These
> classes were introduced in Sentry later to support authz2.
>
> Based on the above, I think that Sentry does not fully use authz1, as it
> provides the hooks that the semantic analyzer and task execution need for
> that support (correct me if I'm wrong).
>
> Nevertheless, authz2 provides other functionality that would be good to
> support, such as authorization of DFS commands, and it keeps HMS client
> filtering, GRANT/REVOKE execution and privilege checks in one class
> (HiveAuthorizerFactory) instead of the 3 that Sentry provides.
>
> Btw, Sentry 1.8 did not provide authz2 with Hive 2.0 support, seems it was
> Hive 1.1 as well (I don't see the 2.0 version on the pom.xml).
>
> Another proposal is to keep authz1 as the default for Sentry 2.0, as Sentry
> 1.8 does, and deprecate it later in the Sentry 2.x line once authz2 is
> stable and we bump to newer versions of Hive 2 with fixes for this.
>
> The configuration difference is:
>
> *Sentry authz1:*
> HMS
>   MetastoreAuthzBinding for HMS server authorization.
>
> HS2
>   HiveAuthzBindingSessionHook for configuring semantic/filter hooks.
>   SentryMetaStoreFilterHook for HMS client filtering.
>   SentryHiveAuthorizationTaskFactoryImpl that creates the
>   SentryGrantRevokeTask.
>   HiveAuthzBindingHook that checks privileges during semantic analysis.
>   SentryGrantRevokeTask that adds/removes privileges on the Sentry
>   server.
>
> *Sentry authz2:*
> HMS
>   MetastoreAuthzBinding for HMS server authorization.
>
> HS2
>   HiveAuthzBindingSessionHookV2 for configuring semantic/filter hooks.
>   SentryHiveAuthorizer that calls a Controller or Validator depending on
>   the authorization request.
>   SentryHiveAccessController for grant/revoke commands.
>   SentryAuthorizationValidator for HMS client filtering and privilege
>   checks.
>
>
> - Sergio
>
>> On Mon, Oct 2, 2017 at 9:56 AM, Colm O hEigeartaigh <cohei...@apache.org>
>> wrote:
>>
>>> Hi Sergio,
>>>
>>> Could you give some background as to what the differences are between
>>> "authz1" and "authz2"? Sorry if this is an obvious question :-)
>>>
>>> For the 1.8.0 release, authz1 was supported with Hive 1 and authz2 with
>>> Hive 2, so I assumed the separate bindings were related to the Hive
>>> versions being supported. Obviously this is not the case if we are still
>>> talking about supporting authz1 with Hive 2.0.
>>>
>>> Colm.
>>>
>>> On Fri, Sep 29, 2017 at 8:59 PM, Sergio Pena <sergio.p...@cloudera.com>
>>> wrote:
>>>
>>> > Hi All,
>>> >
>>> > We are running into some problems with the support of Hive Authz V2
>>> > especially related to the workaround that parses Hive command strings
>>> > in Sentry using regular expressions to get some info that Hive is not
>>> > sending through the authz2 api. Hive 2.0 made some changes on commands
>>> > that caused issues with Sentry. These are fixed but the concern of
>>> > doing this SQL parsing exists. We asked the Hive community to give us
>>> > extra SQL information, but we cannot implement them in Sentry until a
>>> > Hive release is done. There are some concerns about the quality of
>>> > authz2 too, such as create/drop table and functions calling Sentry
>>> > twice for authorization and the lack of testing being done on it.
>>> >
>>> > The original idea for Sentry 2.0 (future release) was to drop authz1
>>> > support and use authz2 as default but the work is getting delayed until
>>> > Hive releases something. Now that we bumped the Hive version to 2.0, I
>>> > was wondering if we should continue with authz1 and keep authz2 as an
>>> > experimental support until Hive releases something we can consume to
>>> > fix our issues. Then we can deprecate authz1 in a future 2.x release
>>> > and remove it in a major version.
>>> >
>>> > I was thinking if we remove any hive-authz2 profile and just add the
>>> > hive-authz2 classes to the current sentry-binding-hive module so that
>>> > users are allowed to switch either to v1 or v2 (for testing). Also for
>>> > the tests, find a way to run all sentry-tests-hive with v1 and v2 to
>>> > validate the quality of it.
>>> >
>>> > What does the PMC community think? Is it a good or bad idea?
>>> >
>>> > - Sergio
>>> >
>>>
>>>
>>>
>>> --
>>> Colm O hEigeartaigh
>>>
>>> Talend Community Coder
>>> http://coders.talend.com
>>>
>>
>>
>


-- 
Colm O hEigeartaigh

Talend Community Coder
http://coders.talend.com
