rtrivedi12 commented on code in PR #6508:
URL: https://github.com/apache/hive/pull/6508#discussion_r3321352863


##########
ql/src/java/org/apache/hadoop/hive/ql/security/authorization/command/CommandAuthorizerV2.java:
##########
@@ -180,6 +187,79 @@ private static boolean isDeferredAuthView(Table t){
     return false;
   }
 
+  /**
+   * Returns true when a PARTITION entity should not produce its own privilege object
+   * because access is already covered by a view's TABLE_OR_VIEW object.
+   */
+  private static boolean isPartitionAccessedViaRegularView(ReadEntity partitionEntity,
+      List<? extends Entity> allEntities) {
+    if (hasDeferredViewParent(partitionEntity)) {
+      return false;
+    }
+    if (hasRegularViewParent(partitionEntity)) {
+      return true;
+    }
+    Table partTable = partitionEntity.getTable();
+    if (partTable == null) {
+      return false;
+    }
+    for (Entity entity : allEntities) {
+      if (!(entity instanceof ReadEntity) || entity.getTyp() != Type.TABLE) {
+        continue;
+      }
+      ReadEntity tableEntity = (ReadEntity) entity;
+      if (tableEntity.isDirect() || tableEntity.getTable() == null) {
+        continue;
+      }
+      Table table = tableEntity.getTable();
+      if (!partTable.getDbName().equals(table.getDbName())
+          || !partTable.getTableName().equals(table.getTableName())) {
+        continue;
+      }
+      if (hasDeferredViewParent(tableEntity)) {
+        return false;
+      }
+      if (hasRegularViewParent(tableEntity)) {
+        return true;
+      }
+    }

Review Comment:
   Thanks @saihemanth-cloudera for the review! I considered using the view name
   alias instead of the table name for the partition ReadEntity to fix this issue,
   but that does not seem semantically correct, because a PARTITION
   HivePrivilegeObject is built from a physical Partition on the base table. So I
   think the proper fix is to skip the partition entity that would otherwise be
   sent for authorization when the partition is reached through a regular view.
   
   `'type':PARTITION, 'dbName':datadb, 'objectType':PARTITION, 'objectName':t1, 'columns':[], 'partKeys':[], 'commandParams':[], 'actionType':OTHER, 'owner':hive}]`
   
   In this case, the partition entity often has isDirect=true and empty parents,
   while the sibling indirect TABLE entity correctly has {viewdb.v1} as its parent,
   which is why the sibling scan was needed. I agree on the O(N^2) concern, though.
   I have updated the code to a one-pass pre-scan: it builds a Set of base tables
   accessed via a regular view, which removes the per-partition scan over
   allEntities.
   ```
   
   PARTITION  datadb.t1@dept=a   isDirect=true   parents=[]          ← empty, not t1
   TABLE      datadb.t1          isDirect=false  parents=[viewdb.v1]
   TABLE      viewdb.v1          isDirect=true   parents=[]
   
   ```
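
   A minimal sketch of the one-pass pre-scan described above, not the actual
   patch. `Ent`, `Kind`, `baseTablesViaRegularView` and `skipPartition` are
   hypothetical names introduced for illustration; the two boolean flags stand in
   for `hasRegularViewParent` and `hasDeferredViewParent`. It also assumes that a
   deferred-auth view takes precedence when the same base table is reached through
   both kinds of view, which may differ from the order-dependent behavior of the
   original loop.

   ```java
   import java.util.HashSet;
   import java.util.List;
   import java.util.Set;

   /** Illustrative only: Ent is a hypothetical stand-in for Hive's ReadEntity. */
   final class RegularViewPreScan {

     enum Kind { TABLE, PARTITION }

     static final class Ent {
       final Kind kind;
       final String db;
       final String table;
       final boolean direct;
       final boolean regularViewParent;   // stands in for hasRegularViewParent(entity)
       final boolean deferredViewParent;  // stands in for hasDeferredViewParent(entity)

       Ent(Kind kind, String db, String table, boolean direct,
           boolean regularViewParent, boolean deferredViewParent) {
         this.kind = kind;
         this.db = db;
         this.table = table;
         this.direct = direct;
         this.regularViewParent = regularViewParent;
         this.deferredViewParent = deferredViewParent;
       }

       /** Lookup key for the base table this entity belongs to. */
       String key() {
         return db + "." + table;
       }
     }

     /** Single pass over all entities: collect base tables reached through a regular view. */
     static Set<String> baseTablesViaRegularView(List<Ent> allEntities) {
       Set<String> regular = new HashSet<>();
       Set<String> deferred = new HashSet<>();
       for (Ent e : allEntities) {
         if (e.kind != Kind.TABLE || e.direct) {
           continue; // only indirect TABLE entities carry the view parent
         }
         if (e.deferredViewParent) {
           deferred.add(e.key());
         } else if (e.regularViewParent) {
           regular.add(e.key());
         }
       }
       // Assumption: a deferred-auth view wins over a regular view on the same table.
       regular.removeAll(deferred);
       return regular;
     }

     /** Constant-time check per partition, replacing the scan over all entities. */
     static boolean skipPartition(Ent partition, Set<String> viaRegularView) {
       if (partition.deferredViewParent) {
         return false;
       }
       if (partition.regularViewParent) {
         return true;
       }
       return viaRegularView.contains(partition.key());
     }
   }
   ```

   With the three entities listed above, the pre-scan yields a set containing only
   `datadb.t1`, so the `datadb.t1@dept=a` partition is skipped with a single set
   lookup instead of a loop over every entity.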
   
   
   > I didn't quite understand this logic. Why do we need to check all the
   > entities for a given partition object? This could lead to O(N^2) behavior for
   > a huge partitioned table, creating a bottleneck during the compile phase
   > (because authorization happens here).
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

