Re: Concerns about Hive authz2 support

Sergio Pena Mon, 02 Oct 2017 15:01:20 -0700

Sure.

First, here's what Hive Wiki says about authz1 limitations:

The default authorization in Hive
<https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Authorization#LanguageManualAuthorization-3DefaultHiveAuthorization(LegacyMode)>
 is *not designed* with the intent to protect against malicious users
accessing data they should not be accessing. It only helps in preventing
users from accidentally doing operations they are not supposed to do. It is
also incomplete because it does not have authorization checks for many
operations including the grant statement. The authorization checks happen
during Hive query compilation. But as the user is allowed to
execute dfs commands, user-defined functions and shell commands, it is
possible to bypass the client security checks.

See https://cwiki.apache.org/confluence/display/Hive/SQL+Standard+Based+Hive+Authorization
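
To make the bypass described in the wiki text concrete, here is a toy model of
a check that only runs when a SQL statement is compiled. It is only a sketch
under my own assumptions: `is_authorized`, the `grants` table, and the file
path are all hypothetical, and none of this is Hive or Sentry code.

```python
def is_authorized(user, statement, table_grants):
    """Toy compile-time check: only SQL queries are inspected."""
    tokens = statement.strip().rstrip(";").lower().split()
    command = tokens[0]
    if command == "dfs" or command.startswith("!"):
        # In this toy model, dfs and shell commands are never compiled as
        # queries, so the authorization check is simply not reached.
        return True
    if command == "select" and "from" in tokens:
        table = tokens[tokens.index("from") + 1]
        return table in table_grants.get(user, set())
    return False


# Hypothetical grants: only alice may read the "sales" table.
grants = {"alice": {"sales"}}

denied_query = is_authorized("bob", "SELECT * FROM sales", grants)
bypass = is_authorized("bob", "dfs -cat /warehouse/sales/part-00000", grants)
print(denied_query, bypass)
```

In the model, bob is refused the query but can still read the underlying file
through a dfs command, which is the kind of gap the wiki text warns about.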

The above problem is the reason Hive introduced a new authorization API
called authz2. However, I saw that some of those limitations are handled by
Sentry already, such as GRANT privilege checks (on the Sentry server side).
Also, Sentry provides the SentryGrantRevokeTask to handle the GRANT/REVOKE
execution instead of using the authz1 API that Hive provides.

Authz1 uses the following configurations:

   - *hive.security.authorization.manager*=(implementation of
   HiveAuthorizerFactory)

   - *hive.security.authorization.enabled*=true
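
For illustration, the two settings could be rendered as hive-site.xml
properties like this. It is a minimal sketch: the property names come from the
list above, but the factory class name is a made-up placeholder (I am assuming
nothing about the real Sentry class), and `render_hive_site` is a hypothetical
helper.

```python
import xml.etree.ElementTree as ET

# Property names are taken from the message; the factory class name is a
# hypothetical placeholder, not a real Sentry or Hive class.
AUTHZ_PROPERTIES = {
    "hive.security.authorization.manager": "org.example.PlaceholderAuthorizerFactory",
    "hive.security.authorization.enabled": "true",
}


def render_hive_site(properties):
    """Render a dict of settings as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


hive_site_xml = render_hive_site(AUTHZ_PROPERTIES)
print(hive_site_xml)
```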

Sentry did add a HiveAuthorizerFactory implementation when it bumped the
version to Hive 0.13, but that implementation does not provide the Controller
or Validator classes to handle authorization for v1. These classes were
introduced in Sentry later to support authz2.

Based on the above, I think that Sentry does not rely on authz1 completely,
because it provides its own hooks into the semantic analyzer and task
execution to provide that support (correct me if I'm wrong).

Nevertheless, authz2 provides other functionality that would be good to
support, such as authorization of DFS commands, and it keeps HMS client
filtering, GRANT/REVOKE execution and privilege checks in one class
(HiveAuthorizerFactory) instead of the 3 that Sentry provides.

Btw, Sentry 1.8 did not provide authz2 with Hive 2.0 support; it seems it
used Hive 1.1 as well (I don't see the 2.0 version in the pom.xml).

Another proposal is to keep authz1 as the default for Sentry 2.0, as Sentry
1.8 does, and deprecate it later in the Sentry 2.x line once authz2 is
stable and we bump to newer versions of Hive 2 with fixes for this.

The configuration difference is:

*Sentry authz1:*
HMS
  MetastoreAuthzBinding for HMS server authorization.

HS2
  HiveAuthzBindingSessionHook for configuring semantic/filter hooks.
  SentryMetaStoreFilterHook for HMS client filtering.
  SentryHiveAuthorizationTaskFactoryImpl that creates the
  SentryGrantRevokeTask.
  HiveAuthzBindingHook that checks privileges during semantic analysis.
  SentryGrantRevokeTask that adds/removes privileges on the Sentry server.

*Sentry authz2:*
HMS
  MetastoreAuthzBinding for HMS server authorization.

HS2
  HiveAuthzBindingSessionHookV2 for configuring semantic/filter hooks.
  SentryHiveAuthorizer that calls a Controller or Validator depending on
  the authorization request.
  SentryHiveAccessController for GRANT/REVOKE commands.
  SentryAuthorizationValidator for HMS client filtering and privilege
  checks.
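
The authz2 shape listed above, one authorizer that routes each request to a
controller or a validator, can be sketched roughly as follows. This is only an
illustration: the class and method names are simplified stand-ins that I made
up, not the Sentry or Hive interfaces, and grants live in memory here rather
than on the Sentry server.

```python
class AccessController:
    """Stand-in for the component that handles GRANT/REVOKE."""

    def __init__(self):
        self.grants = set()  # (role, privilege, object) tuples

    def grant(self, role, privilege, obj):
        self.grants.add((role, privilege, obj))

    def revoke(self, role, privilege, obj):
        self.grants.discard((role, privilege, obj))


class AuthorizationValidator:
    """Stand-in for the component that checks privileges and filters objects."""

    def __init__(self, controller):
        self._controller = controller

    def check_privilege(self, role, privilege, obj):
        return (role, privilege, obj) in self._controller.grants

    def filter_objects(self, role, privilege, objects):
        return [o for o in objects if self.check_privilege(role, privilege, o)]


class Authorizer:
    """Single entry point that delegates each request to the right component."""

    def __init__(self):
        self._controller = AccessController()
        self._validator = AuthorizationValidator(self._controller)

    def grant(self, role, privilege, obj):
        self._controller.grant(role, privilege, obj)

    def revoke(self, role, privilege, obj):
        self._controller.revoke(role, privilege, obj)

    def check_privilege(self, role, privilege, obj):
        return self._validator.check_privilege(role, privilege, obj)

    def filter_objects(self, role, privilege, objects):
        return self._validator.filter_objects(role, privilege, objects)


authorizer = Authorizer()
authorizer.grant("analyst", "SELECT", "db.t1")
visible = authorizer.filter_objects("analyst", "SELECT", ["db.t1", "db.t2"])
print(visible)
```

The caller only ever talks to the one authorizer object, which is the
consolidation that authz1 spreads over several hooks and tasks.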

- Sergio

On Mon, Oct 2, 2017 at 12:34 PM, Sergio Pena <sergio.p...@cloudera.com>
wrote:

> Sure.
>
> First, here's what Hive Wiki says about authz1 limitations:
>
> The default authorization in Hive
> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Authorization#LanguageManualAuthorization-3DefaultHiveAuthorization(LegacyMode)>
>  is *not designed* with the intent to protect against malicious users
> accessing data they should not be accessing. It only helps in preventing
> users from accidentally doing operations they are not supposed to do. It is
> also incomplete because it does not have authorization checks for many
> operations including the grant statement. The authorization checks happen
> during Hive query compilation. But as the user is allowed to
> execute dfs commands, user-defined functions and shell commands, it is
> possible to bypass the client security checks.
>
> See https://cwiki.apache.org/confluence/display/Hive/SQL+Standard+Based+Hive+Authorization
>
> The above problem is the reason Hive introduced a new authorization API
> called authz2. However, I saw that some of those limitations are handled by
> Sentry already, such as GRANT privilege checks (on the Sentry server side).
> Also, Sentry provides the SentryGrantRevokeTask to handle the GRANT/REVOKE
> execution instead of using the authz1 API that Hive provides.
>
> Authz1 uses the following configurations:
>
>
>
>
>
> On Mon, Oct 2, 2017 at 9:56 AM, Colm O hEigeartaigh <cohei...@apache.org>
> wrote:
>
>> Hi Sergio,
>>
>> Could you give some background as to what the differences are between
>> "authz1" and "authz2"? Sorry if this is an obvious question :-)
>>
>> For the 1.8.0 release, authz1 was supported with Hive 1 and authz2 with
>> Hive 2, so I assumed the separate bindings were related to the Hive
>> versions being supported. Obviously this is not the case if we are still
>> talking about supporting authz1 with Hive 2.0.
>>
>> Colm.
>>
>> On Fri, Sep 29, 2017 at 8:59 PM, Sergio Pena <sergio.p...@cloudera.com>
>> wrote:
>>
>> > Hi All,
>> >
>> > We are running into some problems with the support of Hive Authz V2
>> > especially related to the workaround that parses Hive command strings in
>> > Sentry using regular expressions to get some info that Hive is not sending
>> > through the authz2 api. Hive 2.0 made some changes on commands that caused
>> > issues with Sentry. These are fixed but the concern of doing this SQL
>> > parsing exists. We asked the Hive community to give us extra SQL
>> > information, but we cannot implement them in Sentry until a Hive release is
>> > done. There are some concerns about the quality of authz2 too, such as
>> > create/drop table and functions calling Sentry twice for authorization and
>> > the lack of testing being done on it.
>> >
>> > The original idea for Sentry 2.0 (future release) was to drop authz1
>> > support and use authz2 as default but the work is getting delayed until
>> > Hive releases something. Now that we bumped the Hive version to 2.0, I was
>> > wondering if we should continue with authz1 and keep authz2 as an
>> > experimental support until Hive releases something we can consume to fix
>> > our issues. Then we can deprecate authz1 in a future 2.x release and remove
>> > it in a major version.
>> >
>> > I was thinking if we remove any hive-authz2 profile and just add the
>> > hive-authz2 classes to the current sentry-binding-hive module so that users
>> > are allowed to switch either to v1 or v2 (for testing). Also for the tests,
>> > find a way to run all sentry-tests-hive with v1 and v2 to validate the
>> > quality of it.
>> >
>> > What does the PMC community think? Is it a good or bad idea?
>> >
>> > - Sergio
>> >
>>
>>
>>
>> --
>> Colm O hEigeartaigh
>>
>> Talend Community Coder
>> http://coders.talend.com
>>
>
>
