rtrivedi12 commented on code in PR #6508:
URL: https://github.com/apache/hive/pull/6508#discussion_r3321352863
##########
ql/src/java/org/apache/hadoop/hive/ql/security/authorization/command/CommandAuthorizerV2.java:
##########
@@ -180,6 +187,79 @@ private static boolean isDeferredAuthView(Table t){
return false;
}
+ /**
+ * Returns true when a PARTITION entity should not produce its own privilege object
+ * because access is already covered by a view's TABLE_OR_VIEW object.
+ */
+ private static boolean isPartitionAccessedViaRegularView(ReadEntity partitionEntity,
+ List<? extends Entity> allEntities) {
+ if (hasDeferredViewParent(partitionEntity)) {
+ return false;
+ }
+ if (hasRegularViewParent(partitionEntity)) {
+ return true;
+ }
+ Table partTable = partitionEntity.getTable();
+ if (partTable == null) {
+ return false;
+ }
+ for (Entity entity : allEntities) {
+ if (!(entity instanceof ReadEntity) || entity.getType() != Type.TABLE) {
+ continue;
+ }
+ ReadEntity tableEntity = (ReadEntity) entity;
+ if (tableEntity.isDirect() || tableEntity.getTable() == null) {
+ continue;
+ }
+ Table table = tableEntity.getTable();
+ if (!partTable.getDbName().equals(table.getDbName())
+ || !partTable.getTableName().equals(table.getTableName())) {
+ continue;
+ }
+ if (hasDeferredViewParent(tableEntity)) {
+ return false;
+ }
+ if (hasRegularViewParent(tableEntity)) {
+ return true;
+ }
+ }
Review Comment:
Thanks @saihemanth-cloudera for the review! I considered using the view name
alias instead of the table name for the partition ReadEntity to fix this issue,
but that does not seem semantically correct, because a PARTITION
HivePrivilegeObject is built from a physical Partition on the base table. So I
think the proper fix is to skip the partition entity that is sent for
authorization when the access goes through a regular view.
`'type':PARTITION, 'dbName':datadb, 'objectType':PARTITION, 'objectName':t1,
'columns':[], 'partKeys':[], 'commandParams':[], 'actionType':OTHER,
'owner':hive}]`
In this case, the partition entity often has isDirect=true and empty parents,
while the sibling indirect TABLE entity correctly has {viewdb.v1} as its parent,
hence the sibling scan was needed. But I agree on the O(N^2) concern. I have
updated the code to a one-pass pre-scan that builds a Set of base tables
accessed via a regular view, which removes the per-partition scan over
allEntities.
```
PARTITION datadb.t1@dept=a isDirect=true parents=[] ← empty, not t1
TABLE datadb.t1 isDirect=false parents=[viewdb.v1]
TABLE viewdb.v1 isDirect=true parents=[]
```
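To illustrate the one-pass pre-scan, here is a minimal, self-contained sketch. It is not the code in the PR: `Ent` is a hypothetical stand-in for `ReadEntity`, the two boolean fields stand in for `hasRegularViewParent` and `hasDeferredViewParent`, and the `db.table` string key is an assumption made for brevity. The sketch also assumes that a deferred-view parent always wins over a regular-view parent for the same base table, regardless of entity order.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ViewPartitionPrescan {
    enum Type { TABLE, PARTITION }

    /** Hypothetical stand-in for ReadEntity; not the Hive class. */
    static final class Ent {
        final Type type;
        final String db;
        final String table;
        final boolean direct;
        final boolean regularViewParent;   // stands in for hasRegularViewParent(entity)
        final boolean deferredViewParent;  // stands in for hasDeferredViewParent(entity)

        Ent(Type type, String db, String table, boolean direct,
            boolean regularViewParent, boolean deferredViewParent) {
            this.type = type;
            this.db = db;
            this.table = table;
            this.direct = direct;
            this.regularViewParent = regularViewParent;
            this.deferredViewParent = deferredViewParent;
        }

        /** Assumed key format: "db.table". */
        String key() {
            return db + "." + table;
        }
    }

    /** Single pass over all entities: base tables reached only through regular views. */
    static Set<String> baseTablesViaRegularView(List<Ent> all) {
        Set<String> regular = new HashSet<>();
        Set<String> deferred = new HashSet<>();
        for (Ent e : all) {
            if (e.type != Type.TABLE || e.direct) {
                continue; // only indirect TABLE entities carry the view parent
            }
            if (e.deferredViewParent) {
                deferred.add(e.key());
            } else if (e.regularViewParent) {
                regular.add(e.key());
            }
        }
        regular.removeAll(deferred); // assumption: a deferred view parent always wins
        return regular;
    }

    /** Constant-time check per partition, replacing the scan over all entities. */
    static boolean skipPartition(Ent partition, Set<String> viaRegularView) {
        if (partition.deferredViewParent) {
            return false;
        }
        if (partition.regularViewParent) {
            return true;
        }
        return viaRegularView.contains(partition.key());
    }

    public static void main(String[] args) {
        List<Ent> all = List.of(
            new Ent(Type.PARTITION, "datadb", "t1", true, false, false),
            new Ent(Type.TABLE, "datadb", "t1", false, true, false),
            new Ent(Type.TABLE, "viewdb", "v1", true, false, false));
        Set<String> viaRegularView = baseTablesViaRegularView(all);
        System.out.println(skipPartition(all.get(0), viaRegularView));
    }
}
```

With N entities the pre-scan costs O(N) once, and each partition check becomes a hash lookup instead of another loop over all entities.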
> I didn't quite understand this logic. Why do we need to check all the
> entities for a given partition object? This could potentially lead to O(N^2)
> for a huge partitioned table, creating a bottleneck during the compile phase
> (because authorization happens here).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]