jkovacs-hwx commented on code in PR #4910:
URL: https://github.com/apache/hive/pull/4910#discussion_r1411081302
##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergInputFormat.java:
##########
@@ -176,6 +179,20 @@ public RecordReader<Void, Container<Record>> getRecordReader(InputSplit split, J
     }
   }
+  private static void validateFilesWithinTableDirectory(InputSplit split, JobConf job) throws IOException {
+    boolean dataFilesWithingTableLocationOnly =
+        job.getBoolean(HiveConf.ConfVars.HIVE_ICEBERG_ALLOW_DATA_IN_TABLE_LOCATION_ONLY.varname,
+            HiveConf.ConfVars.HIVE_ICEBERG_ALLOW_DATA_IN_TABLE_LOCATION_ONLY.defaultBoolVal);
+    if (dataFilesWithingTableLocationOnly) {
+      Path tableLocation = new Path(job.get(InputFormatConfig.TABLE_LOCATION));
Review Comment:
The read-path constraint exists to prevent a malicious table from being used
to read data in other locations, i.e. data that belongs to other tables. Such
a table can be constructed manually; it does not have to be written by Spark
(in fact, it is most likely constructed by other means).
The problem is that, without this read restriction, a user can exploit Hive's
elevated privileges (doAs=false) to access secured data even though that data
does not belong to the user's own table.
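
To make the intent of the check concrete, here is a minimal sketch of a
containment test, written with plain java.net.URI so that it is
self-contained. It is not the code from this PR: the class name
TableLocationCheck and the method isWithinTableLocation are hypothetical, and
the real implementation works with Hadoop Path objects and the split's files.

```java
import java.net.URI;
import java.util.Objects;

// Hypothetical helper, for illustration only; not part of Hive or Iceberg.
final class TableLocationCheck {

    private TableLocationCheck() {
    }

    // Returns true only if fileLocation resolves to a path strictly under tableLocation.
    public static boolean isWithinTableLocation(String tableLocation, String fileLocation) {
        // normalize() collapses "." and ".." segments, so "t1/../t2" cannot
        // slip past the prefix comparison below.
        URI table = URI.create(tableLocation).normalize();
        URI file = URI.create(fileLocation).normalize();

        // A different scheme or authority means a different file system or bucket.
        if (!Objects.equals(table.getScheme(), file.getScheme())
                || !Objects.equals(table.getAuthority(), file.getAuthority())) {
            return false;
        }

        String tablePath = table.getPath();
        String filePath = file.getPath();
        if (tablePath == null || filePath == null) {
            // Opaque URIs have no path component; reject them.
            return false;
        }

        // Compare against "<table>/" so that /warehouse/t10 is not treated
        // as being inside /warehouse/t1.
        if (!tablePath.endsWith("/")) {
            tablePath = tablePath + "/";
        }
        return filePath.startsWith(tablePath);
    }
}
```

With a check of this shape, a metadata entry that points at another table's
directory, whether directly or via ".." segments, is rejected before any data
is read with the service user's privileges.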