tonybadguy commented on issue #137:
URL: https://github.com/apache/polaris/issues/137#issuecomment-3618327821

   I’d like to propose a lightweight approach to implementing row-level access 
control (RLAC) in Polaris, building on Iceberg’s metadata model. This could 
serve as a practical starting point for the fine-grained access control work.
   
   ### The Challenge
   
   Iceberg catalogs like Polaris operate primarily at the metadata level, which 
makes true RLAC tricky: Parquet files often contain a mix of rows that should 
be visible and hidden, and you don’t want the catalog to scan data files just 
to enforce policies. Offloading RLAC to the query engine introduces a couple of 
issues:
   
   - **Security risk:** If the engine is user-controlled, you can’t safely 
assume it will enforce policies correctly, and the engine (or its user) can 
potentially infer information about restricted data.
   - **Enforcement gap:** It’s odd to define permissions in the catalog but 
rely on engines to enforce them, with no guarantee of consistent behavior 
across engines.
   
   ### Proposed Solution: Partition-Level RLAC
   
   The idea is to shift RLAC to the **partition level**, where the catalog 
filters metadata (manifests / manifest entries) based on user/role permissions 
before returning it to the query engine.
   
   Concretely:
   
   - When serving table metadata that includes manifest lists / manifests for a 
given principal, Polaris evaluates the partition values against RBAC / policy 
rules and drops any manifest entries whose partition values the principal is 
not allowed to see.
   - This is a metadata-only operation: partition information is already 
present in the manifest files, so no data-file scanning is required.
   - For row-based policies such as “only show rows where `region = 'EU'`” or 
“`tenant_id` in {A, B}”, admins partition the table on those policy columns 
(typically with identity transforms, e.g. `PARTITIONED BY (region)` or 
`PARTITIONED BY (tenant_id)`) and define the policy in Polaris. The guarantee 
is that a partition doesn’t mix allowed and disallowed values for that column.
   
   In other words, “row-level” policies are implemented as **partition-level** 
policies, with the partitioning scheme chosen to align with the security model.
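
   To make the filtering step concrete, here is a minimal Python sketch. All 
names in it (`ManifestEntry`, `PartitionPolicy`, `filter_entries`) are 
hypothetical and are not Polaris or Iceberg APIs. Real manifests are Avro files 
with typed partition tuples, which the sketch models as plain dicts. It assumes 
identity-partitioned policy columns and fails closed when an entry has no value 
for a policy column.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass
class ManifestEntry:
    """Hypothetical stand-in for one Iceberg manifest entry (one data file)."""
    file_path: str
    partition: Dict[str, str]  # partition field name -> partition value


@dataclass
class PartitionPolicy:
    """Hypothetical per-principal policy: column -> values the principal may see."""
    allowed: Dict[str, FrozenSet[str]]


def is_visible(entry: ManifestEntry, policy: PartitionPolicy) -> bool:
    # Every policy column must be present in the partition tuple and hold an
    # allowed value. A missing column hides the entry (fail closed).
    for column, allowed_values in policy.allowed.items():
        if column not in entry.partition:
            return False
        if entry.partition[column] not in allowed_values:
            return False
    return True


def filter_entries(entries: List[ManifestEntry],
                   policy: PartitionPolicy) -> List[ManifestEntry]:
    # Metadata-only: inspects partition values, never opens a data file.
    return [entry for entry in entries if is_visible(entry, policy)]


if __name__ == "__main__":
    entries = [
        ManifestEntry("data/region=EU/a.parquet", {"region": "EU"}),
        ManifestEntry("data/region=US/b.parquet", {"region": "US"}),
    ]
    eu_only = PartitionPolicy(allowed={"region": frozenset({"EU"})})
    for entry in filter_entries(entries, eu_only):
        print(entry.file_path)
```

   Failing closed is a deliberate choice in this sketch: files written under an 
older partition spec that lacks the policy column are hidden rather than 
exposed.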
   
   ### Why This Fits
   
   - **Efficient and secure:** Enforcement happens entirely in the catalog. 
Unauthorized partitions simply never appear in the metadata the engine 
receives, avoiding data leakage and engine-specific behavior.
   - **Simple to implement:** It reuses existing manifest/metadata processing. 
The main change is to apply a per-principal partition predicate whenever 
Polaris returns manifests or data-file references for planning scans.
   - **Covers many real-world cases:** While it doesn’t support arbitrary 
per-row predicates, it handles a large class of practical scenarios 
(multi-tenant isolation, region-based access, business-unit-level slicing, 
time-window access when time is part of the partition spec, etc.) without any 
data-file scans. The added cost is limited to evaluating a partition predicate 
over manifest entries.
   
   This isn’t intended to be the final word on RLAC: complex predicates and 
non-partition-aligned policies would still need a richer mechanism. Still, it 
seems like a solid, low-complexity building block that aligns well with an 
Iceberg catalog’s metadata focus and could stand on its own as a 
“partition-based row security” mode.
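
   One way the catalog could detect non-partition-aligned policies is to check 
each policy column against the table’s partition spec when the policy is 
defined. The Python sketch below is illustrative only: `PartitionField` and 
`unaligned_policy_columns` are hypothetical names, and it covers equality-style 
policies only, treating a column as aligned when it has an `identity` 
transform. Time-window policies on `day` or `hour` transforms would need a 
separate rule.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class PartitionField:
    """Hypothetical, simplified view of one field in an Iceberg partition spec."""
    source_column: str
    transform: str  # e.g. "identity", "bucket[16]", "day"


def unaligned_policy_columns(policy_columns: Set[str],
                             spec: List[PartitionField]) -> Set[str]:
    # Only identity-partitioned columns guarantee that a partition never mixes
    # allowed and disallowed values; bucket or truncate transforms can mix them.
    identity_columns = {
        field.source_column for field in spec if field.transform == "identity"
    }
    return set(policy_columns) - identity_columns


if __name__ == "__main__":
    spec = [
        PartitionField("region", "identity"),
        PartitionField("tenant_id", "bucket[16]"),
    ]
    # Columns returned here cannot be enforced at the partition level.
    print(unaligned_policy_columns({"region", "tenant_id"}, spec))
```

   A policy with any unaligned column could then be rejected, or routed to 
whatever richer mechanism eventually handles arbitrary predicates.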
   
   What do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

