tonybadguy commented on issue #137:
URL: https://github.com/apache/polaris/issues/137#issuecomment-3618327821
I’d like to propose a lightweight approach to implementing row-level access
control (RLAC) in Polaris, building on Iceberg’s metadata model. This could
serve as a practical starting point for the fine-grained access control work.
### The Challenge
Iceberg catalogs like Polaris operate primarily at the metadata level, which
makes true RLAC tricky: a single Parquet file often mixes rows that a principal
may see with rows it may not, and you don’t want the catalog to scan data files
just to enforce policies. Offloading RLAC to the query engine introduces two
issues:
- **Security risk:** If the engine is user-controlled, you can’t safely
assume it will enforce policies correctly, and it can potentially infer
information about restricted data.
- **Enforcement gap:** It’s odd to define permissions in the catalog but
rely on engines to enforce them, with no guarantee of consistent behavior
across engines.
### Proposed Solution: Partition-Level RLAC
The idea is to shift RLAC to the **partition level**, where the catalog
filters metadata (manifests / manifest entries) based on user/role permissions
before returning it to the query engine.
Concretely:
- When serving table metadata that includes manifest lists / manifests for a
given principal, Polaris evaluates the partition values against RBAC / policy
rules and drops any manifest entries whose partition values the principal is
not allowed to see.
- This is a metadata-only operation: partition information is already
present in the manifest files, so no data-file scanning is required.
- For row-based policies such as “only show rows where `region = 'EU'`” or
“`tenant_id` in {A, B}”, admins partition the table on those policy columns
(typically with identity transforms, e.g. `PARTITIONED BY (region)` or
`PARTITIONED BY (tenant_id)`) and define the policy in Polaris. The guarantee
is that a partition doesn’t mix allowed and disallowed values for that column.
In other words, “row-level” policies are implemented as **partition-level**
policies, with the partitioning scheme chosen to align with the security model.
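As a rough illustration of the filtering step described above, here is a minimal sketch. It is not Polaris code: `ManifestEntry`, `Policy`, and `filter_entries` are hypothetical names, the partition values are assumed to come from identity transforms, and deny-by-default for principals without a policy is my assumption.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass
class ManifestEntry:
    """Hypothetical stand-in for one manifest entry (one data file)."""
    file_path: str
    partition: Dict[str, str]  # partition column -> value (identity transform assumed)


# Hypothetical policy shape: principal -> {partition column -> allowed values}
Policy = Dict[str, Dict[str, FrozenSet[str]]]


def filter_entries(
    entries: List[ManifestEntry], principal: str, policy: Policy
) -> List[ManifestEntry]:
    """Keep only entries whose partition values the principal may see."""
    rules = policy.get(principal)
    if rules is None:
        # Assumption: a principal with no policy sees nothing (deny by default).
        return []
    visible = []
    for entry in entries:
        # Every policy column must carry an allowed value; an entry that
        # lacks the column entirely is also dropped.
        if all(entry.partition.get(col) in allowed for col, allowed in rules.items()):
            visible.append(entry)
    return visible
```

Only partition values from the manifest are consulted here; no data file is opened, which is the property the proposal relies on.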
### Why This Fits
- **Efficient and secure:** Enforcement happens entirely in the catalog.
Unauthorized partitions simply never appear in the metadata the engine
receives, avoiding data leakage and engine-specific behavior.
- **Simple to implement:** It reuses existing manifest/metadata processing.
The main change is to apply a per-principal partition predicate whenever
Polaris returns manifests or data-file references for planning scans.
- **Covers many real-world cases:** While it doesn’t support arbitrary
per-row predicates, it handles a large class of practical scenarios
(multi-tenant isolation, region-based access, business-unit-level slicing,
time-window access when time is part of the partition spec, etc.) without
requiring the catalog to scan any data files.
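To make the limitation concrete: a policy can only be enforced this way if every column it references is part of the table's partition spec. Below is a small sketch of such a check. The function name and the spec representation (source column mapped to transform name) are hypothetical, chosen only for illustration.

```python
from typing import Dict, Iterable, List


def columns_missing_from_spec(
    partition_spec: Dict[str, str], policy_columns: Iterable[str]
) -> List[str]:
    """Return policy columns that the table is not partitioned on.

    partition_spec maps a source column to its transform name, e.g.
    {"region": "identity", "event_ts": "day"} (hypothetical representation).
    A non-empty result means the policy is not partition-aligned and
    cannot be enforced by metadata filtering alone.
    """
    return sorted(col for col in policy_columns if col not in partition_spec)
```

This only checks that a column is present in the spec. Non-identity transforms such as bucketing can still place allowed and disallowed values in the same partition, so they would need a stricter check.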
This isn’t intended to be the final word on RLAC (complex predicates and
non-partition-aligned policies would still need a richer mechanism), but it
seems like a solid, low-complexity building block that aligns well with the
Iceberg catalog’s metadata focus and could stand on its own as a
“partition-based row security” mode.
What do you think?