jkovacs-hwx commented on code in PR #4910:
URL: https://github.com/apache/hive/pull/4910#discussion_r1411083165


##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergInputFormat.java:
##########
@@ -176,6 +179,20 @@ public RecordReader<Void, Container<Record>> 
getRecordReader(InputSplit split, J
     }
   }
 
+  private static void validateFilesWithinTableDirectory(InputSplit split, 
JobConf job) throws IOException {
+    boolean dataFilesWithingTableLocationOnly =

Review Comment:
   Is this 'g' a typo in dataFilesWithingTableLocationOnly ?



##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergInputFormat.java:
##########
@@ -176,6 +179,20 @@ public RecordReader<Void, Container<Record>> 
getRecordReader(InputSplit split, J
     }
   }
 
+  private static void validateFilesWithinTableDirectory(InputSplit split, 
JobConf job) throws IOException {
+    boolean dataFilesWithingTableLocationOnly =
+        
job.getBoolean(HiveConf.ConfVars.HIVE_ICEBERG_ALLOW_DATA_IN_TABLE_LOCATION_ONLY.varname,
+            
HiveConf.ConfVars.HIVE_ICEBERG_ALLOW_DATA_IN_TABLE_LOCATION_ONLY.defaultBoolVal);
+    if (dataFilesWithingTableLocationOnly) {
+      Path tableLocation = new Path(job.get(InputFormatConfig.TABLE_LOCATION));

Review Comment:
   > place a custom jar into the classpath
   
   That would be adding a malicious jar to AUX path like faking it as a UDF, 
right?  
   Would a custom/per-user jar could lead to the same class override?
   
   - If only the AUX path placed jar is the problem, then I would not bring it 
into this issue's scope as adding jar to AUX path is something the user can't 
do, only admins. 
   - If a user can add a jar at runtime to override these Iceberg classes, then 
this should be definitely a follow up fix on this issue, but only if this is 
not a global problem, like overriding other classes, especially e.g. Ranger 
authorization classes, or masking functions policies could rely on.
   
   > Why don't we consider storage-based authorization
   
   This is a quick fix until the storage-based authorization solution - or 
other more robust one - is worked out.
   
   > not expose the metadata
   
   Unfortunately that might break the Iceberg functionality e.g. via limiting 
the output of the metadata table queries like catalog.table.file or .snapshots, 
etc. and if the fix would depend on assuming the data locations are not leaked 
is also not a robust one.
   Plus the issue this fix is trying to works around for now also affects 
targeting non-Iceberg tables, where getting the data locations is easy via 
INPUT__FILE__NAME. 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to