mbutrovich commented on code in PR #3523:
URL: https://github.com/apache/datafusion-comet/pull/3523#discussion_r2813680113


##########
spark/src/main/scala/org/apache/comet/iceberg/IcebergReflection.scala:
##########
@@ -237,6 +237,38 @@ object IcebergReflection extends Logging {
     }
   }
 
+  /**
+   * Gets storage properties from an Iceberg table's FileIO.
+   *
+   * This extracts credentials from the FileIO implementation, which is critical for REST catalog
+   * credential vending. The REST catalog returns temporary S3 credentials per-table via the
+   * loadTable response, stored in the table's FileIO (typically ResolvingFileIO).
+   *
+   * The properties() method is not on the FileIO interface -- it exists on specific
+   * implementations like ResolvingFileIO and S3FileIO. Returns None gracefully when unavailable.
+   */
+  def getFileIOProperties(table: Any): Option[Map[String, String]] = {
+    import scala.jdk.CollectionConverters._
+    getFileIO(table).flatMap { fileIO =>
+      findMethodInHierarchy(fileIO.getClass, "properties").flatMap { propsMethod =>
+        try {

Review Comment:
   I think we want to remove the `try/catch` here, because failure to invoke a method we expect to exist should not be papered over with a fallback.
   If `findMethodInHierarchy` returns `Some(method)` but the invoke fails, that's unexpected and should propagate rather than silently fall back to Hadoop credentials. The `flatMap` already handles the "method doesn't exist" case by returning `None`.
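   A minimal sketch of what I mean, assuming `getFileIO` and `findMethodInHierarchy` keep their current signatures (the inner `flatMap` becomes a `map`, since nothing inside needs to return `None` anymore):
   ```scala
   def getFileIOProperties(table: Any): Option[Map[String, String]] = {
     import scala.jdk.CollectionConverters._
     getFileIO(table).flatMap { fileIO =>
       // None covers the expected case: this FileIO implementation has no properties() method.
       findMethodInHierarchy(fileIO.getClass, "properties").map { propsMethod =>
         // No try/catch: if the method exists but invoke fails, that's a bug worth surfacing.
         propsMethod
           .invoke(fileIO)
           .asInstanceOf[java.util.Map[String, String]]
           .asScala
           .toMap
       }
     }
   }
   ```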



##########
spark/src/test/scala/org/apache/comet/IcebergReadFromS3Suite.scala:
##########
@@ -227,4 +229,74 @@ class IcebergReadFromS3Suite extends CometS3TestBase {
 
     spark.sql("DROP TABLE s3_catalog.db.mor_delete_test")
   }
+
+  test("REST catalog credential vending rejects wrong credentials") {
+    assume(icebergAvailable, "Iceberg not available in classpath")
+
+    val wrongCreds = Map(
+      "s3.access-key-id" -> "WRONG_ACCESS_KEY",
+      "s3.secret-access-key" -> "WRONG_SECRET_KEY",
+      "s3.endpoint" -> minioContainer.getS3URL,
+      "s3.path-style-access" -> "true")
+    val warehouse = s"s3a://$testBucketName/warehouse-bad-creds"
+
+    withRESTCatalog(vendedCredentials = wrongCreds, warehouseLocation = Some(warehouse)) {
+      (restUri, _, _) =>
+        withSQLConf(

Review Comment:
   Can we strengthen the negative test by adding correct Hadoop credentials alongside the wrong vended credentials, to prove that vended credentials actually take precedence:
   ```scala
   withSQLConf(
     // ... catalog configs ...
     // Correct Hadoop credentials that would work if vending was ignored
     "spark.hadoop.fs.s3a.access.key" -> userName,
     "spark.hadoop.fs.s3a.secret.key" -> password,
     "spark.hadoop.fs.s3a.endpoint" -> minioContainer.getS3URL,
     "spark.hadoop.fs.s3a.path.style.access" -> "true"
   ) {
     // 403 now proves vended creds (wrong) took precedence over Hadoop (correct)
   }
   ```
   or am I misinterpreting something with the spec?
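   Assuming I'm not, here's a hedged sketch of the assertion the block body might then make (the table name and the exact exception surface are assumptions on my part, not from this PR):
   ```scala
   // With the wrong vended creds taking precedence over the correct Hadoop creds,
   // the scan should fail with an S3 403 rather than succeed.
   val e = intercept[Exception] {
     spark.sql("SELECT * FROM s3_catalog.db.vending_test").collect()
   }
   assert(e.getMessage.contains("403") || e.getMessage.contains("Forbidden"))
   ```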



##########
spark/src/main/scala/org/apache/comet/rules/CometScanRule.scala:
##########
@@ -387,10 +387,20 @@ case class CometScanRule(session: SparkSession)
 
             val hadoopS3Options = NativeConfig.extractObjectStoreOptions(hadoopConf, effectiveUri)
 
-            val catalogProperties =
+            val hadoopDerivedProperties =
               org.apache.comet.serde.operator.CometIcebergNativeScan
                 .hadoopToIcebergS3Properties(hadoopS3Options)
 
+            // Extract vended credentials from FileIO (REST catalog credential vending).
+            // FileIO properties take precedence over Hadoop-derived properties because
+            // they contain per-table credentials vended by the REST catalog.
+            val fileIOProperties = tableOpt
+              .flatMap(IcebergReflection.getFileIOProperties)
+              .map(org.apache.comet.serde.operator.CometIcebergNativeScan.filterStorageProperties)

Review Comment:
   Can we clean up the fully qualified `org.apache.comet.serde.operator.CometIcebergNativeScan.filterStorageProperties` with an import?
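   Something like this, assuming `CometIcebergNativeScan` doesn't collide with an existing import in `CometScanRule` (it would also tidy the fully qualified `hadoopToIcebergS3Properties` call above):
   ```scala
   import org.apache.comet.serde.operator.CometIcebergNativeScan

   val fileIOProperties = tableOpt
     .flatMap(IcebergReflection.getFileIOProperties)
     .map(CometIcebergNativeScan.filterStorageProperties)
   ```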



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

