[ https://issues.apache.org/jira/browse/IMPALA-10192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-10192:
------------------------------------
    Description: 
Users reported an IllegalStateException about column masking. I can reproduce 
it in the master branch:
{code:java}
I0925 21:42:09.684499 20809 jni-util.cc:288] ed44b3c5ca4a0e7d:8c4e884400000000] java.lang.IllegalStateException
        at com.google.common.base.Preconditions.checkState(Preconditions.java:492)
        at org.apache.impala.authorization.ranger.RangerAuthorizationContext.stashAuditEvents(RangerAuthorizationContext.java:71)
        at org.apache.impala.authorization.ranger.RangerAuthorizationChecker.postAnalyze(RangerAuthorizationChecker.java:373)
        at org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:440)
        at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1562)
        at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1529)
        at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1499)
        at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:162)
{code}
*Reproducing*
 Start an Impala cluster with Ranger authorization enabled:
{code:java}
bin/start-impala-cluster.py \
  --impalad_args="--server-name=server1 --ranger_service_type=hive --ranger_app_id=impala --authorization_provider=ranger" \
  --catalogd_args="--server-name=server1 --ranger_service_type=hive --ranger_app_id=impala --authorization_provider=ranger"
{code}
Create a test table, tmp_tbl, as your own user:
{code:java}
$ bin/impala-shell.sh
[localhost:21050] default> create table tmp_tbl (id int, name string) stored as parquet;
{code}
Open the Ranger web UI at [http://localhost:6080/] and add two column masking 
policies:
 * Mask default.tmp_tbl.id using HASH for user "non_owner"
 * Mask default.tmp_tbl.name using REDACT for your username (quanlong in my case)

Refresh the policies in Impala and query the table as your own user:
{code:java}
bin/impala-shell.sh -u admin -q "refresh authorization"
bin/impala-shell.sh -q "select * from tmp_tbl"
{code}
The last query will fail with "ERROR: IllegalStateException: null".

The policy file is attached.

*Clues*

In RangerAuthorizationContext.stashAuditEvents(), we deduplicate the column 
masking audit events. A Preconditions check there asserts that every event 
generated so far is a column masking event:
 
[https://github.com/apache/impala/blob/5c69e7ba583297dc886652ac5952816882b928af/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationContext.java#L71]
 Code:
{code:java}
  public void stashAuditEvents(RangerImpalaPlugin plugin) {
    Set<String> unfilteredMaskNames = plugin.getUnfilteredMaskNames(
        Arrays.asList("MASK_NONE"));
    for (AuthzAuditEvent event : auditHandler_.getAuthzEvents()) {
      // We assume that all the logged events until now are column masking-related. Since
      // we remove those AuthzAuditEvent's corresponding to the "Unmasked" policy of type
      // "MASK_NONE", we exclude this type of mask.
      Preconditions.checkState(unfilteredMaskNames
          .contains(event.getAccessType().toUpperCase()));

      // event.getEventKey() is the concatenation of the following fields in an
      // AuthzAuditEvent: 'user', 'accessType', 'resourcePath', 'resourceType', 'action',
      // 'accessResult', 'sessionId', and 'clientIP'. Recall that 'resourcePath' is the
      // concatenation of 'dbName', 'tableName', and 'columnName' that were used to
      // instantiate a RangerAccessResourceImpl in order to create a RangerAccessRequest
      // to call RangerImpalaPlugin#evalDataMaskPolicies(). Refer to
      // RangerAuthorizationChecker#evalColumnMask() for further details.
      deduplicatedAuditEvents_.put(event.getEventKey(), event);
    }
    auditHandler_.getAuthzEvents().clear();
  }
{code}
However, some SELECT events can also be generated during the analysis phase, 
here:
 
[https://github.com/apache/impala/blob/5c69e7ba583297dc886652ac5952816882b928af/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java#L308]
 It looks like when a column has a masking policy that does not apply to the 
current user, the Ranger plugin generates a SELECT audit event for that 
column. In this case, the first masking policy is on the "id" column for user 
"non_owner", so we get a SELECT event on that column. The second masking 
policy is on the "name" column for the current user, so we get a mask event as 
expected.
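
The failing condition can be illustrated in isolation. The following is only a 
minimal sketch, not Impala code: the class name, the helper methods, and the 
contents of MASK_NAMES are hypothetical placeholders (the real set comes from 
plugin.getUnfilteredMaskNames()), and the access type strings are assumed for 
illustration. It shows that a "select" access type falls outside the set of 
mask names, which is the condition that trips Preconditions.checkState().
{code:java}
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class MaskCheckDemo {
  // Hypothetical placeholder for the names returned by
  // plugin.getUnfilteredMaskNames(); the real values come from Ranger.
  static final Set<String> MASK_NAMES =
      new HashSet<>(Arrays.asList("MASK", "MASK_HASH"));

  // Mirrors the condition passed to Preconditions.checkState().
  static boolean isMaskEvent(String accessType) {
    return MASK_NAMES.contains(accessType.toUpperCase(Locale.ROOT));
  }

  // Returns the first access type that would fail the check, or null if
  // every access type is a mask name.
  static String firstOffender(List<String> accessTypes) {
    for (String accessType : accessTypes) {
      if (!isMaskEvent(accessType)) return accessType;
    }
    return null;
  }

  public static void main(String[] args) {
    // Assumed access types: "select" for column "id" (policy targets
    // non_owner) and "mask" for column "name" (policy targets the current user).
    List<String> accessTypes = Arrays.asList("select", "mask");
    System.out.println("offending access type: " + firstOffender(accessTypes));
  }
}
{code}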

We should handle these non-mask events correctly. In addition, we should 
replace all Preconditions checks on the audit code paths with error logging, 
since a problem in auditing should not fail a query.
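
One possible shape for such a change is sketched below. This is an assumption 
for discussion, not the actual fix: the Event class is a simplified, 
hypothetical stand-in for AuthzAuditEvent, stash() is a hypothetical static 
variant of stashAuditEvents(), and the mask names and event keys are made up. 
The sketch logs and skips non-mask events rather than throwing; whether such 
events should be dropped or kept for auditing is a separate decision.
{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.logging.Logger;

class LenientAuditStash {
  private static final Logger LOG =
      Logger.getLogger(LenientAuditStash.class.getName());

  // Hypothetical, simplified stand-in for Ranger's AuthzAuditEvent.
  static final class Event {
    final String accessType;
    final String eventKey;

    Event(String accessType, String eventKey) {
      this.accessType = accessType;
      this.eventKey = eventKey;
    }
  }

  // Deduplicates mask events by key. A non-mask event is logged and skipped
  // instead of failing the query. Clears 'pending' afterwards, like the
  // original method clears the handler's event list.
  static Map<String, Event> stash(List<Event> pending, Set<String> maskNames) {
    Map<String, Event> deduplicated = new LinkedHashMap<>();
    for (Event event : pending) {
      if (!maskNames.contains(event.accessType.toUpperCase(Locale.ROOT))) {
        LOG.warning("Skipping non-mask audit event: " + event.eventKey);
        continue;
      }
      deduplicated.put(event.eventKey, event);
    }
    pending.clear();
    return deduplicated;
  }

  public static void main(String[] args) {
    // Placeholder mask names and event keys, for illustration only.
    Set<String> maskNames = new HashSet<>(Arrays.asList("MASK", "MASK_HASH"));
    List<Event> pending = new ArrayList<>(Arrays.asList(
        new Event("select", "quanlong|select|default/tmp_tbl/id"),
        new Event("mask", "quanlong|mask|default/tmp_tbl/name"),
        new Event("mask", "quanlong|mask|default/tmp_tbl/name")));
    System.out.println(stash(pending, maskNames).keySet());
  }
}
{code}
With this input, the SELECT event is skipped with a warning and the two 
identical mask events collapse into one entry, so the query would proceed.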

cc [~fangyurao]


> IllegalStateException in processing column masking audit events
> ---------------------------------------------------------------
>
>                 Key: IMPALA-10192
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10192
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Quanlong Huang
>            Priority: Blocker
>         Attachments: Ranger_Policies_IMPALA-10192.json
>
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
