Re: [PR] HIVE-27323: Iceberg: malformed manifest file or list can cause data breach. [hive]

via GitHub Thu, 30 Nov 2023 10:04:25 -0800


jkovacs-hwx commented on code in PR #4910:
URL: https://github.com/apache/hive/pull/4910#discussion_r1411081302



##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergInputFormat.java:
##########
@@ -176,6 +179,20 @@ public RecordReader<Void, Container<Record>> 
getRecordReader(InputSplit split, J
     }
   }
 
+  private static void validateFilesWithinTableDirectory(InputSplit split, 
JobConf job) throws IOException {
+    boolean dataFilesWithingTableLocationOnly =
+        
job.getBoolean(HiveConf.ConfVars.HIVE_ICEBERG_ALLOW_DATA_IN_TABLE_LOCATION_ONLY.varname,
+            
HiveConf.ConfVars.HIVE_ICEBERG_ALLOW_DATA_IN_TABLE_LOCATION_ONLY.defaultBoolVal);
+    if (dataFilesWithingTableLocationOnly) {
+      Path tableLocation = new Path(job.get(InputFormatConfig.TABLE_LOCATION));

Review Comment:
   the read path constraint is to avoid to read other location's - aka other 
table's - data from such a malicious table. Such table can be constructed 
manually, not necessarily written by spark (actually most probably constructed 
with other methods). 
   The problem here is that without this read limitation, the user can use 
hive's elevated privileges (doAs=false) to access secured data even if data 
doesn't belong to the user's own table.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] HIVE-27323: Iceberg: malformed manifest file or list can cause data breach. [hive]

Reply via email to