Hi Lina,

> Glad I could help. Do you know what configuration caused the columns not
> to be parsed by Hive? Was it because
> SessionState.get().isAuthorizationModeV2() == false?
>

Yes exactly - I'm using the V1 binding.
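
For anyone else who runs into this, here is a minimal sketch of the decision as I read it from the SemanticAnalyzer snippet quoted further down. It is not Hive code: the class and method names (ColumnAccessDecision, analyzesColumnAccess, putsColumnsOnReadEntity) are hypothetical, and the booleans stand in for SessionState.get().isAuthorizationModeV2(), HIVE_AUTHORIZATION_ENABLED and HIVE_STATS_COLLECT_SCANCOLS.

```java
/** Hypothetical stand-in for the quoted SemanticAnalyzer checks; not Hive code. */
public class ColumnAccessDecision {

    /**
     * Mirrors the first quoted condition: column access is analyzed when
     * (V2 authorization mode AND authorization enabled) OR scan-column
     * collection is enabled.
     */
    static boolean analyzesColumnAccess(boolean authV2, boolean authEnabled,
                                        boolean collectScanCols) {
        boolean isColumnInfoNeededForAuth = authV2 && authEnabled;
        return isColumnInfoNeededForAuth || collectScanCols;
    }

    /**
     * Mirrors the second quoted check: accessed columns are put on the
     * read entities only when scan-column collection is enabled.
     */
    static boolean putsColumnsOnReadEntity(boolean collectScanCols) {
        return collectScanCols;
    }

    public static void main(String[] args) {
        // V1 binding, authorization enabled, flag off: columns are not analyzed.
        System.out.println(analyzesColumnAccess(false, true, false)); // false
        // V1 binding with the flag on: columns are analyzed.
        System.out.println(analyzesColumnAccess(false, true, true));  // true
        // V2 binding with authorization enabled: analyzed even with the flag off.
        System.out.println(analyzesColumnAccess(true, true, false));  // true
        // Without the flag, nothing is put on the read entities.
        System.out.println(putsColumnsOnReadEntity(false));           // false
    }
}
```

With the V1 binding the first term is always false, so in this sketch the scan-column flag is the only thing that turns column collection on.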

Colm.


>
> Thanks,
>
> Lina
>
> On Fri, Jan 5, 2018 at 6:12 AM, Colm O hEigeartaigh <[email protected]>
> wrote:
>
>> Hi Lina,
>>
>> Thanks a lot for your help on this! I was able to get the test to work by
>> adding the following config option:
>>
>> conf.set(HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS.varname, "true");
>>
>> Colm.
>>
>> On Thu, Jan 4, 2018 at 10:06 PM, Na Li <[email protected]> wrote:
>>
>> > Colm,
>> >
>> > The following code shows where Hive sets the column info. You can debug
>> > into hive code and see why AccessedColumns is not set.
>> >
>> > The related code is in org.apache.hadoop.hive.ql.parse.SemanticAnalyzer
>> >
>> >         boolean isColumnInfoNeedForAuth =
>> >             SessionState.get().isAuthorizationModeV2()
>> >             && HiveConf.getBoolVar(this.conf, ConfVars.HIVE_AUTHORIZATION_ENABLED);
>> >         if (isColumnInfoNeedForAuth
>> >             || HiveConf.getBoolVar(this.conf, ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
>> >           ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
>> >           this.setColumnAccessInfo(
>> >               columnAccessAnalyzer.analyzeColumnAccess(this.getColumnAccessInfo()));
>> >         }
>> >
>> >         this.LOG.info("Completed plan generation");
>> >         if (HiveConf.getBoolVar(this.conf, ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
>> >           this.putAccessedColumnsToReadEntity(this.inputs, this.columnAccessInfo);
>> >         }
>> >
>> >
>> > On Wed, Jan 3, 2018 at 11:28 PM, Na Li <[email protected]> wrote:
>> >
>> >> Colm,
>> >>
>> >> I tried to reproduce your issue using sentry 2.0 (master branch) with
>> >> Hive 2.3.2.
>> >>
>> >> The test code is
>> >>
>> >>   @Test
>> >>   public void testPositiveOnAll() throws Exception {
>> >>     Connection connection = context.createConnection(ADMIN1);
>> >>     Statement statement = context.createStatement(connection);
>> >>     statement.execute("CREATE database " + DB1);
>> >>     statement.execute("use " + DB1);
>> >>     statement.execute("CREATE TABLE t1 (c1 string, c2 string)");
>> >>     statement.execute("CREATE ROLE user_role1");
>> >>     statement.execute("GRANT SELECT ON TABLE t1 TO ROLE user_role1");
>> >>     statement.execute("GRANT ROLE user_role1 TO GROUP " + USERGROUP1);
>> >>     statement.close();
>> >>     connection.close();
>> >>
>> >>     connection = context.createConnection(USER1_1);
>> >>     statement = context.createStatement(connection);
>> >>     statement.execute("use " + DB1);
>> >>     statement.execute("SELECT * FROM t1");
>> >>
>> >>     statement.close();
>> >>     connection.close();
>> >>   }
>> >>
>> >>
>> >> required privileges:
>> >>
>> >>    - Server=server1->Db=db_1->Table=t1->*Column=c1*->action=select
>> >>    - Server=server1->Db=db_1->Table=t1->*Column=c2*->action=select
>> >>
>> >>
>> >> cached privilege:
>> >>
>> >>    - server=server1->db=db_1->table=t1->action=select
>> >>
>> >> So the authorization works.
>> >>
>> >> Note
>> >>
>> >>    - For me, the "SELECT * FROM t1" causes the required privileges to
>> >>    contain each column explicitly. However, for you, the "privilege"
>> >>    to check looks like:
>> >>    Server=server1->Db=authz->Table=words->action=select; the columns
>> >>    are not explicitly listed. Hive controls whether the column is
>> >>    included in the required privilege. At
>> >>    org.apache.sentry.binding.hive.authz.HiveAuthzBindingHookBase.authorizeWithHiveBindings ->
>> >>    getInputHierarchyFromInputs -> addColumnHierarchy, Sentry uses
>> >>    accessedColumns from the Hive input to add colHierarchy for each column.
>> >>    You can check if accessedColumns is empty or null for the Hive
>> >>    version you are using.
>> >>    - For me, the cached privilege does not include the column part. For
>> >>    you, the cached privilege is
>> >>    "Server=server1->Db=authz->Table=words->Column=*->action=select".
>> >>    *Can you share your test code*, so I can see how you grant the
>> >>    privilege and therefore why the cached privilege contains the column?
>> >>       - I tried to use "GRANT SELECT(*) ON TABLE t1 TO ROLE
>> >>       user_role1", and got the following error:
>> >>       - 2018-01-03 23:23:50,459 (HiveServer2-Handler-Pool: Thread-212)
>> >>       [WARN - org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:539)]
>> >>       Error executing statement:
>> >>       - org.apache.hive.service.cli.HiveSQLException: Error while
>> >>       compiling statement: FAILED: ParseException line 1:6 cannot recognize input
>> >>       near 'GRANT' 'SELECT' '(' in ddl statement
>> >>       - at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
>> >>       - at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
>> >>       - at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
>> >>       - at org.apache.hive.service.cli.operation.Operation.run(Operation.java:320)
>> >>       - at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
>> >>
>> >> Thanks,
>> >>
>> >> Lina
>> >>
>> >> On Mon, Dec 18, 2017 at 10:14 AM, Colm O hEigeartaigh <
>> >> [email protected]> wrote:
>> >>
>> >>> Thanks Kalyan! I was thinking that if the cached privilege part does not
>> >>> appear in the requested "part", and if it is "all", then we should skip
>> >>> that part and continue on to the next one. But maybe there is a better
>> >>> solution.
>> >>>
>> >>> Colm.
>> >>>
>> >>> On Mon, Dec 18, 2017 at 4:06 PM, Kalyan Kumar Kalvagadda <
>> >>> [email protected]> wrote:
>> >>>
>> >>> > Colm,
>> >>> >
>> >>> > I will look closer into this today and see if I can help you out.
>> >>> >
>> >>> > -Kalyan
>> >>> >
>> >>> > On Mon, Dec 18, 2017 at 4:52 AM, Colm O hEigeartaigh <
>> >>> [email protected]>
>> >>> > wrote:
>> >>> >
>> >>> >> Hi,
>> >>> >>
>> >>> >> I've done some further analysis of the problem, and I think it is not
>> >>> >> directly related to SENTRY-1291. The problem manifests in
>> >>> >> CommonPrivilege.implies(privilege, model). My (cached) privilege
>> >>> >> looks like:
>> >>> >>
>> >>> >> Server=server1->Db=authz->Table=words->Column=*->action=select
>> >>> >>
>> >>> >> The "privilege" I want to check looks like:
>> >>> >>
>> >>> >> Server=server1->Db=authz->Table=words->action=select;
>> >>> >>
>> >>> >> The problem is in the "for" loop in CommonPrivilege.implies. It loops
>> >>> >> on the parts of the second privilege, and matches up to "action=select".
>> >>> >> Here it tries to compare to "Column=*" of the cached privilege and
>> >>> >> fails on this line:
>> >>> >>
>> >>> >> https://github.com/apache/sentry/blob/a4924edc79b26f937e3e5ea3584f0b4307dd4135/sentry-policy/sentry-policy-common/src/main/java/org/apache/sentry/policy/common/CommonPrivilege.java#L86
>> >>> >>
>> >>> >> It's clear there's a bug here somewhere, but I'm not sure where - can
>> >>> >> someone please advise?
>> >>> >>
>> >>> >> Thanks,
>> >>> >>
>> >>> >> Colm.
>> >>> >>
>> >>> >> On Wed, Dec 13, 2017 at 8:28 PM, Na Li <[email protected]>
>> wrote:
>> >>> >>
>> >>> >> > Sasha,
>> >>> >> >
>> >>> >> > SENTRY-1291 helps with the problem that Sentry privilege checks
>> >>> >> > take too long with many explicit grants, which is useful for big
>> >>> >> > customers. Another approach that can improve performance is to
>> >>> >> > organize the privileges according to the authorization hierarchy in
>> >>> >> > a tree structure, so that finding a match in
>> >>> >> > ResourceAuthorizationProvider.doHasAccess() is on the order of
>> >>> >> > log(N), not linear in N, where N is the number of privileges.
>> >>> >> >
>> >>> >> > We can wait for Colm to confirm whether his issue is caused by
>> >>> >> > SENTRY-1291. If so, it may be fixed by selecting privileges by
>> >>> >> > checking whether the requested authorization object is a prefix of
>> >>> >> > the cached privileges instead of requiring an exact match.
>> >>> >> >
>> >>> >> > In SimplePrivilegeCache:
>> >>> >> >
>> >>> >> > public Set<String> listPrivileges(Set<String> groups, Set<String> users,
>> >>> >> >       ActiveRoleSet roleSet, Authorizable... authorizationHierarchy) {
>> >>> >> >     Set<String> privileges = new HashSet<>();
>> >>> >> >     Set<StringBuilder> authzKeys = getAuthzKeys(authorizationHierarchy);
>> >>> >> >     for (StringBuilder authzKey : authzKeys) {
>> >>> >> >       // Instead of exact matching, add an extension function to check if
>> >>> >> >       // authzKey.toString() is the prefix of the key of the entries
>> >>> >> >       // in cachedAuthzPrivileges.
>> >>> >> >       if (cachedAuthzPrivileges.get(authzKey.toString()) != null) {
>> >>> >> >         privileges.addAll(cachedAuthzPrivileges.get(authzKey.toString()));
>> >>> >> >       }
>> >>> >> >     }
>> >>> >> >
>> >>> >> >     return privileges;
>> >>> >> >   }
>> >>> >> >
>> >>> >> > Thanks,
>> >>> >> >
>> >>> >> > Lina
>> >>> >> >
>> >>> >> > On Wed, Dec 13, 2017 at 1:08 PM, Alexander Kolbasov <
>> >>> [email protected]
>> >>> >> >
>> >>> >> > wrote:
>> >>> >> >
>> >>> >> > > I think that SENTRY-1291 should be just reverted - there are multiple
>> >>> >> > > issues with it and no one is actually using the fix. Does anyone
>> >>> >> > > want to do it?
>> >>> >> > >
>> >>> >> > > - Alex
>> >>> >> > >
>> >>> >> > > On Wed, Dec 13, 2017 at 4:44 AM, Na Li <[email protected]>
>> >>> wrote:
>> >>> >> > >
>> >>> >> > > > Colm,
>> >>> >> > > >
>> >>> >> > > > Glad you found the cause!
>> >>> >> > > >
>> >>> >> > > > You can revert SENTRY-1291 and see if it works. If so, the issue
>> >>> >> > > > is in finding cached privileges.
>> >>> >> > > >
>> >>> >> > > > Cheers,
>> >>> >> > > >
>> >>> >> > > > Lina
>> >>> >> > > >
>> >>> >> > > > Sent from my iPhone
>> >>> >> > > >
>> >>> >> > > > > On Dec 13, 2017, at 4:58 AM, Colm O hEigeartaigh <
>> >>> >> > [email protected]>
>> >>> >> > > > wrote:
>> >>> >> > > > >
>> >>> >> > > > > Hi,
>> >>> >> > > > >
>> >>> >> > > > > I can see what the problem is (that the authorization hierarchy
>> >>> >> > > > > does not contain the column, and hence doesn't match against the
>> >>> >> > > > > cached privilege), but I'm not sure about the best way to solve
>> >>> >> > > > > it. Either the way we are creating the authorization hierarchy is
>> >>> >> > > > > incorrect (e.g. in HiveAuthzBindingHookBase) or else the way we
>> >>> >> > > > > are parsing the cached privilege is incorrect (e.g. in
>> >>> >> > > > > SimplePrivilegeCache/CommonPrivilege).
>> >>> >> > > > >
>> >>> >> > > > > Colm.
>> >>> >> > > > >
>> >>> >> > > > >> On Wed, Dec 13, 2017 at 5:57 AM, Na Li <
>> [email protected]
>> >>> >
>> >>> >> > wrote:
>> >>> >> > > > >>
>> >>> >> > > > >> Colm,
>> >>> >> > > > >>
>> >>> >> > > > >> I did not get a chance to look into this issue today. Sorry
>> >>> >> > > > >> about that.
>> >>> >> > > > >>
>> >>> >> > > > >> You can add an e2e test case and set a breakpoint where the
>> >>> >> > > > >> authorization object hierarchy is converted to a list of
>> >>> >> > > > >> authorization objects, which is used to do an exact match
>> >>> >> > > > >> against the cache.
>> >>> >> > > > >>
>> >>> >> > > > >> Sent from my iPhone
>> >>> >> > > > >>
>> >>> >> > > > >>> On Dec 12, 2017, at 11:27 AM, Colm O hEigeartaigh <
>> >>> >> > > [email protected]
>> >>> >> > > > >
>> >>> >> > > > >> wrote:
>> >>> >> > > > >>>
>> >>> >> > > > >>> That would be great, thanks!
>> >>> >> > > > >>>
>> >>> >> > > > >>> Colm.
>> >>> >> > > > >>>
>> >>> >> > > > >>>> On Tue, Dec 12, 2017 at 4:36 PM, Na Li <
>> >>> [email protected]>
>> >>> >> > > wrote:
>> >>> >> > > > >>>>
>> >>> >> > > > >>>> Colm,
>> >>> >> > > > >>>>
>> >>> >> > > > >>>> I suspect it is a bug in SENTRY-1291. I can take a look later
>> >>> >> > > > >>>> today.
>> >>> >> > > > >>>>
>> >>> >> > > > >>>> Thanks,
>> >>> >> > > > >>>>
>> >>> >> > > > >>>> Lina
>> >>> >> > > > >>>>
>> >>> >> > > > >>>> On Tue, Dec 12, 2017 at 4:32 AM, Colm O hEigeartaigh <
>> >>> >> > > > >> [email protected]>
>> >>> >> > > > >>>> wrote:
>> >>> >> > > > >>>>
>> >>> >> > > > >>>>> Hi all,
>> >>> >> > > > >>>>>
>> >>> >> > > > >>>>> I've updated some local testcases to work with Sentry 2.0.0
>> >>> >> > > > >>>>> and the "v1" Hive binding (previously working fine using
>> >>> >> > > > >>>>> 1.8.0 and the "v2" binding).
>> >>> >> > > > >>>>>
>> >>> >> > > > >>>>> I have a simple table called "words" (word STRING, count INT).
>> >>> >> > > > >>>>> I am making an SQL call as the user "bob", e.g. "SELECT * FROM
>> >>> >> > > > >>>>> words where count == '100'".
>> >>> >> > > > >>>>>
>> >>> >> > > > >>>>> "bob" is in the "manager" group, which has the following role:
>> >>> >> > > > >>>>>
>> >>> >> > > > >>>>> select_all_role =
>> >>> >> > > > >>>>> Server=server1->Db=authz->Table=words->Column=*->action=select
>> >>> >> > > > >>>>>
>> >>> >> > > > >>>>> Essentially, authorization is denied even though the policy is
>> >>> >> > > > >>>>> correct. If I look at the SimplePrivilegeCache, the cached
>> >>> >> > > > >>>>> privilege is:
>> >>> >> > > > >>>>>
>> >>> >> > > > >>>>> server=server1->db=authz->table=words->column=*=[Server=server1->Db=authz->Table=words->Column=*->action=select]
>> >>> >> > > > >>>>>
>> >>> >> > > > >>>>> However, when "listPrivileges" is called, the authorizable
>> >>> >> > > > >>>>> hierarchy looks like:
>> >>> >> > > > >>>>>
>> >>> >> > > > >>>>> Server [name=server1]
>> >>> >> > > > >>>>> Database [name=authz]
>> >>> >> > > > >>>>> Table [name=words]
>> >>> >> > > > >>>>>
>> >>> >> > > > >>>>> There is no "column" here, and a match is not made against the
>> >>> >> > > > >>>>> cached privilege as a result. Is this a bug or am I missing
>> >>> >> > > > >>>>> some configuration switch?
>> >>> >> > > > >>>>>
>> >>> >> > > > >>>>> Colm.
>> >>> >> > > > >>>>>
>> >>> >> > > > >>>>>
>> >>> >> > > > >>>>> --
>> >>> >> > > > >>>>> Colm O hEigeartaigh
>> >>> >> > > > >>>>>
>> >>> >> > > > >>>>> Talend Community Coder
>> >>> >> > > > >>>>> http://coders.talend.com
>> >>> >> > > > >>>>>
>> >>> >> > > > >>>>
>> >>> >> > > > >>>
>> >>> >> > > > >>>
>> >>> >> > > > >>>
>> >>> >> > > > >>> --
>> >>> >> > > > >>> Colm O hEigeartaigh
>> >>> >> > > > >>>
>> >>> >> > > > >>> Talend Community Coder
>> >>> >> > > > >>> http://coders.talend.com
>> >>> >> > > > >>
>> >>> >> > > > >
>> >>> >> > > > >
>> >>> >> > > > >
>> >>> >> > > > > --
>> >>> >> > > > > Colm O hEigeartaigh
>> >>> >> > > > >
>> >>> >> > > > > Talend Community Coder
>> >>> >> > > > > http://coders.talend.com
>> >>> >> > > >
>> >>> >> > >
>> >>> >> >
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> --
>> >>> >> Colm O hEigeartaigh
>> >>> >>
>> >>> >> Talend Community Coder
>> >>> >> http://coders.talend.com
>> >>> >>
>> >>> >
>> >>> >
>> >>>
>> >>>
>> >>> --
>> >>> Colm O hEigeartaigh
>> >>>
>> >>> Talend Community Coder
>> >>> http://coders.talend.com
>> >>>
>> >>
>> >>
>> >
>>
>>
>> --
>> Colm O hEigeartaigh
>>
>> Talend Community Coder
>> http://coders.talend.com
>>
>
>


-- 
Colm O hEigeartaigh

Talend Community Coder
http://coders.talend.com
