[
https://issues.apache.org/jira/browse/SENTRY-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256443#comment-16256443
]
Na Li commented on SENTRY-1964:
-------------------------------
The problem is finding the authObj from a path.
For example, with the code change that stops sending partitions to HDFS, the
INode hierarchy of the table path "tmp/external/tables/ext2_before" initially
looks like the listing below. The partition path
"tmp/external/tables/ext2_before/i=1" or
"tmp/external/tables/ext2_before/i=1/stuff.txt" can use its parent's authObj,
which is default.ext2, so the Sentry ACL can be found using that authObj.
1) path name=tmp, type=DIR, authObj=null
2) path name=external, type=PREFIX, authObj=null
3) path name=tables, type=DIR, authObj=null
4) path name=ext2_before, type=AUTHZ_OBJECT, authObj=default.ext2
After "alter table ext2 set location
'hdfs:///tmp/external/tables/ext2_after'", the INode hierarchy becomes:
1) path name=tmp, type=DIR, authObj=null
2) path name=external, type=PREFIX, authObj=null
3) path name=tables, type=DIR, authObj=null
4) path name=ext2_after, type=AUTHZ_OBJECT, authObj=default.ext2
When finding the authObj for the partition path
"tmp/external/tables/ext2_before/i=1" or
"tmp/external/tables/ext2_before/i=1/stuff.txt", the closest ancestor still
present in the hierarchy is "tables", and its authObj is null. So the
SentryINodeAttributes won't be used, and no Sentry ACL is applied.
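The lookup behaviour described above can be sketched with a small path tree.
This is only an illustration under simplifying assumptions, not the actual
Sentry code: the class PathTreeSketch and its methods addTable, removeTable and
findAuthObj are hypothetical names, and the lookup simply returns the authObj
of the deepest node that matches the path.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the Sentry path tree (not the real code).
class PathTreeSketch {
    static final class Node {
        final Map<String, Node> children = new HashMap<>();
        String authObj; // stays null unless the node represents a table location
    }

    private final Node root = new Node();

    // Registers a table location and tags its last path element with the authObj.
    void addTable(String path, String authObj) {
        Node cur = root;
        for (String elem : path.split("/")) {
            cur = cur.children.computeIfAbsent(elem, k -> new Node());
        }
        cur.authObj = authObj;
    }

    // Removes the last path element of a table location (simplified: no pruning of parents).
    void removeTable(String path) {
        String[] elems = path.split("/");
        Node cur = root;
        for (int i = 0; i < elems.length - 1 && cur != null; i++) {
            cur = cur.children.get(elems[i]);
        }
        if (cur != null) {
            cur.children.remove(elems[elems.length - 1]);
        }
    }

    // Walks as deep as the tree allows and returns the authObj of the deepest node reached.
    String findAuthObj(String path) {
        Node cur = root;
        for (String elem : path.split("/")) {
            Node next = cur.children.get(elem);
            if (next == null) {
                break; // e.g. a partition directory that is not tracked in the tree
            }
            cur = next;
        }
        return cur.authObj;
    }

    public static void main(String[] args) {
        PathTreeSketch tree = new PathTreeSketch();
        tree.addTable("tmp/external/tables/ext2_before", "default.ext2");
        System.out.println(tree.findAuthObj("tmp/external/tables/ext2_before/i=1"));

        // Simulates "alter table ext2 set location ..." as remove + add.
        tree.removeTable("tmp/external/tables/ext2_before");
        tree.addTable("tmp/external/tables/ext2_after", "default.ext2");
        System.out.println(tree.findAuthObj("tmp/external/tables/ext2_before/i=1"));
    }
}
```

In this sketch the first lookup resolves to default.ext2 through the table
node, while the second stops at "tables" and yields null, which matches the
situation described in this comment.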
In public List<AclEntry> getAclEntries(String[] pathElements) in
SentryAuthorizationInfo, the authzObjs are first looked up from the path at
"{color:red}Set<String> authzObjs =
authzPaths.findAuthzObject(pathElements);{color}", and the authzObjs are then
used to find the ACLs. If authzObjs is null, no Sentry ACL is added.
{code}
public List<AclEntry> getAclEntries(String[] pathElements) {
  lock.readLock().lock();
  try {
    Set<String> authzObjs = authzPaths.findAuthzObject(pathElements);
    // Apparently setFAcl throws error if 'group::---' is not present
    AclEntry noGroup = AclEntry.parseAclEntry("group::---", true);
    Set<AclEntry> retSet = new HashSet<>();
    retSet.add(noGroup);
    if (authzObjs == null) {
      retSet.addAll(Collections.<AclEntry>emptyList());
      return new ArrayList<>(retSet);
    }
    // No duplicate acls should be added.
    for (String authzObj: authzObjs) {
      retSet.addAll(authzPermissions.getAcls(authzObj));
    }
    return new ArrayList<>(retSet);
  } finally {
    lock.readLock().unlock();
  }
}
{code}
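To make the effect of a null lookup result concrete, here is a minimal sketch
of the same control flow without the Hadoop and Sentry types. It is an
assumption-laden simplification: ACL entries are plain strings, the class
AclLookupSketch and the map aclsByAuthzObj are hypothetical, and the group name
"analysts" is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical simplification of getAclEntries: strings stand in for AclEntry.
class AclLookupSketch {
    static List<String> getAclEntries(Set<String> authzObjs,
                                      Map<String, List<String>> aclsByAuthzObj) {
        Set<String> retSet = new HashSet<>();
        retSet.add("group::---"); // always present, as in the original method
        if (authzObjs == null) {
            // No authzObj was found for the path, so no Sentry ACL is added.
            return new ArrayList<>(retSet);
        }
        for (String authzObj : authzObjs) {
            retSet.addAll(aclsByAuthzObj.getOrDefault(authzObj, Collections.<String>emptyList()));
        }
        return new ArrayList<>(retSet);
    }

    public static void main(String[] args) {
        Map<String, List<String>> acls = Collections.singletonMap(
            "default.ext2", Collections.singletonList("group:analysts:r-x"));
        // Path resolved to the table: the table's ACL is included.
        System.out.println(getAclEntries(Collections.singleton("default.ext2"), acls));
        // Path resolved to nothing: only the placeholder entry is returned.
        System.out.println(getAclEntries(null, acls));
    }
}
```

With a null set of authzObjs, the only entry returned in this sketch is the
"group::---" placeholder, so a partition path that no longer resolves to
default.ext2 gets none of the table's ACLs.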
> HDFS sync does not need partition locations (usually)
> -----------------------------------------------------
>
> Key: SENTRY-1964
> URL: https://issues.apache.org/jira/browse/SENTRY-1964
> Project: Sentry
> Issue Type: Improvement
> Components: Sentry
> Affects Versions: 2.0.0
> Reporter: Na Li
> Assignee: Na Li
> Priority: Critical
> Attachments: SENTRY-1964.001.patch, SENTRY-1964.001.patch,
> SENTRY-1964.002.patch
>
>
> Right now, Sentry saves partition info from HMS and sends it to HDFS. HDFS
> only needs database and table info, and does not need partition info for ACLs
> unless the partition location does not share the path prefix of its table.
> The amount of partition data is huge and causes performance issues. We can
> optimize this by not saving and not sending partition info when the partition
> is located under its table's path.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)