jkovacs-hwx commented on code in PR #4910:
URL: https://github.com/apache/hive/pull/4910#discussion_r1411083165
##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergInputFormat.java:
##########
@@ -176,6 +179,20 @@ public RecordReader<Void, Container<Record>>
getRecordReader(InputSplit split, J
}
}
+ private static void validateFilesWithinTableDirectory(InputSplit split,
JobConf job) throws IOException {
+ boolean dataFilesWithingTableLocationOnly =
Review Comment:
Is this 'g' a typo in dataFilesWithingTableLocationOnly ?
##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergInputFormat.java:
##########
@@ -176,6 +179,20 @@ public RecordReader<Void, Container<Record>>
getRecordReader(InputSplit split, J
}
}
+ private static void validateFilesWithinTableDirectory(InputSplit split,
JobConf job) throws IOException {
+ boolean dataFilesWithingTableLocationOnly =
+
job.getBoolean(HiveConf.ConfVars.HIVE_ICEBERG_ALLOW_DATA_IN_TABLE_LOCATION_ONLY.varname,
+
HiveConf.ConfVars.HIVE_ICEBERG_ALLOW_DATA_IN_TABLE_LOCATION_ONLY.defaultBoolVal);
+ if (dataFilesWithingTableLocationOnly) {
+ Path tableLocation = new Path(job.get(InputFormatConfig.TABLE_LOCATION));
Review Comment:
> place a custom jar into the classpath
That would be adding a malicious jar to AUX path like faking it as a UDF,
right?
Would a custom/per-user jar could lead to the same class override?
- If only the AUX path placed jar is the problem, then I would not bring it
into this issue's scope as adding jar to AUX path is something the user can't
do, only admins.
- If a user can add a jar at runtime to override these Iceberg classes, then
this should be definitely a follow up fix on this issue, but only if this is
not a global problem, like overriding other classes, especially e.g. Ranger
authorization classes, or masking functions policies could rely on.
> Why don't we consider storage-based authorization
This is a quick fix until the storage-based authorization solution - or
other more robust one - is worked out.
> not expose the metadata
Unfortunately that might break the Iceberg functionality e.g. via limiting
the output of the metadata table queries like catalog.table.file or .snapshots,
etc. and if the fix would depend on assuming the data locations are not leaked
is also not a robust one.
Plus the issue this fix is trying to works around for now also affects
targeting non-Iceberg tables, where getting the data locations is easy via
INPUT__FILE__NAME.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]