mbutrovich commented on code in PR #3523:
URL: https://github.com/apache/datafusion-comet/pull/3523#discussion_r2813680113
##########
spark/src/main/scala/org/apache/comet/iceberg/IcebergReflection.scala:
##########
@@ -237,6 +237,38 @@ object IcebergReflection extends Logging {
}
}
+  /**
+   * Gets storage properties from an Iceberg table's FileIO.
+   *
+   * This extracts credentials from the FileIO implementation, which is critical for REST catalog
+   * credential vending. The REST catalog returns temporary S3 credentials per-table via the
+   * loadTable response, stored in the table's FileIO (typically ResolvingFileIO).
+   *
+   * The properties() method is not on the FileIO interface -- it exists on specific
+   * implementations like ResolvingFileIO and S3FileIO. Returns None gracefully when unavailable.
+   */
+  def getFileIOProperties(table: Any): Option[Map[String, String]] = {
+    import scala.jdk.CollectionConverters._
+    getFileIO(table).flatMap { fileIO =>
+      findMethodInHierarchy(fileIO.getClass, "properties").flatMap { propsMethod =>
+        try {
Review Comment:
I think we want to remove the `try/catch` here, because failure to invoke a
method we expect to exist should not silently cause a fallback. If
`findMethodInHierarchy` returns `Some(method)` but the invoke fails, that's
unexpected and should propagate rather than silently falling back to Hadoop
credentials. The `flatMap` already handles the "method doesn't exist" case by
returning `None`.
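Something like this rough sketch is what I have in mind (the body of the `try` isn't shown in this hunk, so the invoke/conversion details below are assumed):
```scala
def getFileIOProperties(table: Any): Option[Map[String, String]] = {
  import scala.jdk.CollectionConverters._
  getFileIO(table).flatMap { fileIO =>
    findMethodInHierarchy(fileIO.getClass, "properties").map { propsMethod =>
      // No try/catch: if the method exists but invocation fails, let the error propagate.
      // Assumes properties() returns a java.util.Map[String, String], as on S3FileIO.
      propsMethod
        .invoke(fileIO)
        .asInstanceOf[java.util.Map[String, String]]
        .asScala
        .toMap
    }
  }
}
```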
##########
spark/src/test/scala/org/apache/comet/IcebergReadFromS3Suite.scala:
##########
@@ -227,4 +229,74 @@ class IcebergReadFromS3Suite extends CometS3TestBase {
spark.sql("DROP TABLE s3_catalog.db.mor_delete_test")
}
+
+ test("REST catalog credential vending rejects wrong credentials") {
+ assume(icebergAvailable, "Iceberg not available in classpath")
+
+ val wrongCreds = Map(
+ "s3.access-key-id" -> "WRONG_ACCESS_KEY",
+ "s3.secret-access-key" -> "WRONG_SECRET_KEY",
+ "s3.endpoint" -> minioContainer.getS3URL,
+ "s3.path-style-access" -> "true")
+ val warehouse = s"s3a://$testBucketName/warehouse-bad-creds"
+
+ withRESTCatalog(vendedCredentials = wrongCreds, warehouseLocation =
Some(warehouse)) {
+ (restUri, _, _) =>
+ withSQLConf(
Review Comment:
Can we strengthen the negative test by adding correct Hadoop credentials
alongside wrong vended credentials to prove vended credentials actually take
precedence:
```scala
withSQLConf(
  // ... catalog configs ...
  // Correct Hadoop credentials that would work if vending was ignored
  "spark.hadoop.fs.s3a.access.key" -> userName,
  "spark.hadoop.fs.s3a.secret.key" -> password,
  "spark.hadoop.fs.s3a.endpoint" -> minioContainer.getS3URL,
  "spark.hadoop.fs.s3a.path.style.access" -> "true") {
  // A 403 now proves the (wrong) vended creds took precedence over the (correct) Hadoop creds
}
```
or am I misinterpreting something with the spec?
##########
spark/src/main/scala/org/apache/comet/rules/CometScanRule.scala:
##########
@@ -387,10 +387,20 @@ case class CometScanRule(session: SparkSession)
val hadoopS3Options =
  NativeConfig.extractObjectStoreOptions(hadoopConf, effectiveUri)
- val catalogProperties =
+ val hadoopDerivedProperties =
  org.apache.comet.serde.operator.CometIcebergNativeScan
    .hadoopToIcebergS3Properties(hadoopS3Options)
+ // Extract vended credentials from FileIO (REST catalog credential vending).
+ // FileIO properties take precedence over Hadoop-derived properties because
+ // they contain per-table credentials vended by the REST catalog.
+ val fileIOProperties = tableOpt
+   .flatMap(IcebergReflection.getFileIOProperties)
+   .map(org.apache.comet.serde.operator.CometIcebergNativeScan.filterStorageProperties)
Review Comment:
Can we clean up the fully qualified
`org.apache.comet.serde.operator.CometIcebergNativeScan.filterStorageProperties`
with an import?
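For example (a quick sketch, assuming nothing else in this file already imports `CometIcebergNativeScan` under another name):
```scala
import org.apache.comet.serde.operator.CometIcebergNativeScan

// ...

val fileIOProperties = tableOpt
  .flatMap(IcebergReflection.getFileIOProperties)
  .map(CometIcebergNativeScan.filterStorageProperties)
```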
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]