[I] Spark direct path-based table access bypasses catalog mediation [iceberg]

via GitHub Wed, 20 May 2026 14:34:45 -0700


rdblue opened a new issue, #16494:
URL: https://github.com/apache/iceberg/issues/16494


   > This issue was reported to the private Apache Iceberg security mailing
   > list. The submitter is being kept anonymous because the report was sent to
   > a private list. After review, the issue is not considered a serious
   > vulnerability that needs to be kept private, so it is being filed publicly
   > here for tracking and resolution.
   >
   > Note: this submission was generated by AI. Please review its claims and
   > source references carefully before acting on them.
   
   # Summary
   
   When Spark receives an Iceberg `path` value that contains `/`, the
   Iceberg source treats it as a direct storage path and creates a
   `PathIdentifier` instead of resolving a normal catalog identifier.
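   The routing rule can be modeled in a few lines. This is a simplified sketch, not the `IcebergSource` code. `PathRoutingSketch`, `TableRef`, `CatalogRef`, `PathRef`, and `route` are hypothetical names. The sketch models only the behavior the report describes: a value containing `/` becomes a direct path instead of a catalog lookup.

   ```java
   import java.util.Objects;

   /**
    * Hypothetical model of the routing rule described in this report.
    * None of these types exist in Iceberg; they only illustrate the decision.
    */
   public final class PathRoutingSketch {

       /** What a table reference resolves to (hypothetical). */
       public interface TableRef {}

       /** Resolved through the catalog, so catalog authorization applies. */
       public record CatalogRef(String name) implements TableRef {}

       /** Loaded directly from storage; the catalog is never consulted. */
       public record PathRef(String location) implements TableRef {}

       /** Any value containing '/' is treated as a storage path. */
       public static TableRef route(String value) {
           Objects.requireNonNull(value, "value");
           if (value.contains("/")) {
               return new PathRef(value);
           }
           return new CatalogRef(value);
       }

       public static void main(String[] args) {
           // A plain identifier goes through the catalog.
           System.out.println(route("ns.table"));
           // A storage URI bypasses it entirely.
           System.out.println(route("s3://bucket/warehouse/ns/table"));
       }
   }
   ```

   Under this rule, `ns.table` is resolved through the catalog. `s3://bucket/warehouse/ns/table` never reaches the catalog, so no catalog ACL is evaluated for it.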
   
   In practice this means a user can call the Iceberg source with a
   storage URI such as `s3://bucket/warehouse/ns/table` and have Iceberg
   load the table directly from that path. If the deployment relies on
   catalog ACLs while Spark itself has broader storage credentials, the
   user can bypass the intended catalog mediation layer simply by naming
   the table by path.
   
   # Affected Maven coordinates
   
   * Versioned Spark integration artifacts:
     * `org.apache.iceberg:iceberg-spark-3.4_*`
     * `org.apache.iceberg:iceberg-spark-3.5_*`
     * `org.apache.iceberg:iceberg-spark-4.0_2.13`
     * `org.apache.iceberg:iceberg-spark-4.1_2.13`
   
   # Attacker prerequisites
   
   * ability to submit Spark reads, writes, or SQL statements that
   reference an Iceberg table by path
   * a deployment where Spark has broader storage credentials than the
   user is meant to have through catalog authorization
   
   # Impact
   
   * A user who cannot access a table through the catalog can still
   access the same table if they know or can guess its storage location
   and Spark can reach it.
   * Depending on the entry point, this affects not only reads but any
   `SparkCatalog` operation that is serviced from a `PathIdentifier`.
   * The effective security boundary becomes storage IAM attached to the
   Spark runtime, not the catalog authorization policy operators may
   think they are enforcing.
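   The shift in the effective boundary can be shown with a toy model of the two checks. Everything here is hypothetical. `EffectiveBoundarySketch`, its ACL set, and its storage-prefix test stand in for a catalog authorization policy and for the storage IAM scope attached to the Spark runtime. None of it is an Iceberg or Spark API.

   ```java
   import java.util.Set;

   /**
    * Hypothetical model: the name route is gated by a catalog ACL, while the
    * path route is gated only by what the Spark runtime's credentials reach.
    */
   public final class EffectiveBoundarySketch {

       /** Tables the catalog authorizes this user to read (hypothetical ACL). */
       private final Set<String> catalogAcl;

       /** Storage prefix reachable with Spark's credentials (hypothetical IAM scope). */
       private final String sparkStoragePrefix;

       public EffectiveBoundarySketch(Set<String> catalogAcl, String sparkStoragePrefix) {
           this.catalogAcl = Set.copyOf(catalogAcl);
           this.sparkStoragePrefix = sparkStoragePrefix;
       }

       /** Access by catalog identifier: the catalog ACL is enforced. */
       public boolean canReadByName(String tableName) {
           return catalogAcl.contains(tableName);
       }

       /** Access by storage path: only the runtime's storage reach matters. */
       public boolean canReadByPath(String location) {
           return location.startsWith(sparkStoragePrefix);
       }

       public static void main(String[] args) {
           EffectiveBoundarySketch model = new EffectiveBoundarySketch(
                   Set.of("ns.public_table"), "s3://bucket/warehouse/");
           // Denied when named through the catalog...
           System.out.println(model.canReadByName("ns.table"));
           // ...but allowed by path, because Spark can reach the warehouse.
           System.out.println(model.canReadByPath("s3://bucket/warehouse/ns/table"));
       }
   }
   ```

   The model prints `false` and then `true`. The same table is denied by name but readable by path, because the only check on the path route is whether the runtime's credentials cover the location.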
   
   # Proof status
   
   I reproduced this locally with a targeted reproducer or exploit.
   The observed result matches the trigger and impact described above.
   
   # Key source references
   
   * `org.apache.iceberg.spark.source.IcebergSource`
   * `org.apache.iceberg.spark.SparkCatalog`
   
   Current severity assessment [2]: Important
   
   [1] https://iceberg.apache.org/security/
   [2] https://security.apache.org/blog/severityrating/


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
