singhpk234 commented on code in PR #13879:
URL: https://github.com/apache/iceberg/pull/13879#discussion_r2760639219
##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -3347,6 +3347,105 @@ components:
additionalProperties:
type: string
+ ReadRestrictions:
+ type: object
+ description: >
+ Read restrictions for a table, including column projections and row filter expressions.
+
+ A client MUST enforce the restrictions defined in this object when reading data
+ from the table.
+
+ These restrictions apply only to the authenticated principal, user, or account
+ associated with the request. They MUST NOT be interpreted as global policy and
+ MUST NOT be applied beyond the entity identified by the Authentication header
+ (or other applicable authentication mechanism).
+ properties:
+ required-column-projections:
+ description: >
+ A list of projections that MUST be applied prior to any query-specified
+ projections.
+ If this property is absent, no mandatory projection applies,
+ and a reader MAY project any subset of columns of the table, including all columns.
+
+ 1. A reader MUST project only columns listed in the required-column-projections.
+ - If a listed column has a transform, the reader MUST apply it and replace
+ all references to the underlying column with the transformed value
+ (for example, truncate[4](cc) MUST be projected as truncate[4](cc) AS cc,
+ and all references to cc during query evaluation post applying required-row-filter MUST resolve to this alias).
+ - Columns not listed in the required-column-projections MUST NOT be read.
+
+ 2. A column MUST appear at most once in the required-column-projections.
+
+ 3. If a projected column's corresponding entry includes an action that the reader cannot evaluate,
+ the reader MUST fail rather than ignore the transform.
+
+ 4. An identity transform is equivalent to projecting the column directly.
+
+ 5. The data type of the projected column MUST match the data type defined for the transform result.
+
+ type: array
+ items:
+ $ref: '#/components/schemas/Projection'
+ required-row-filter:
+ description: >
+ An expression that filters rows in the table that the authenticated principal does not have access to.
+
+ 1. A reader MUST discard any row for which the filter evaluates to false or null, and
+ no information derived from discarded rows MAY be included in the query result.
+
+ 2. Row filters MUST be evaluated against the original, untransformed column values.
+ Required projections MUST be applied only after row filters are applied.
+
+ 3. If a client cannot interpret or evaluate a provided filter expression, it MUST fail.
+
+ 4. If this property is absent, null, or always true then no mandatory filtering is required.
+ $ref: '#/components/schemas/Expression'
+
+ Projection:
+ type: object
+ description: >
+ Defines a projection for a column.
+ If action is not specified, the column is projected as-is.
+ properties:
+ field-id:
+ type: integer
+ description: field id of the column being projected.
+ action:
+ $ref: '#/components/schemas/Action'
+ required:
+ - field-id
+
+ Action:
+ description: Defines the specific action to be executed for computing the projection.
+ oneOf:
+ - $ref: '#/components/schemas/MaskHashSha256'
+ - $ref: '#/components/schemas/ReplaceWithNull'
+ - $ref: '#/components/schemas/MaskAlphanumeric'
Review Comment:
> I don't think we want to enumerate specific actions in the REST spec
This is mostly inspired by Ranger-like semantics, as described here:
https://docs.cloudera.com/runtime/7.3.1/security-ranger-authorization/topics/security-ranger-resource-based-column-masking-in-hive-with-ranger-policies.html
Do you mean the IRC should not get into the business of defining such constructs,
and that it's entirely up to the catalog to define a `UDF` for each dialect (Spark |
Trino) to enforce them consistently?
My understanding is that it doesn't hurt to clearly define the action we
want the client to take and leave it to the implementation to just perform that action.
For example, in a case like pyiceberg, this could be done entirely in a Python
program.
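For a pure-Python client, the ordering the diff prescribes (evaluate `required-row-filter` on the original values, discard rows where it is false or null, then apply `required-column-projections` and drop unlisted columns) could look roughly like the sketch below. It assumes rows are plain dicts keyed by field id and that the filter and each action have already been compiled into Python callables; `apply_read_restrictions` is a hypothetical helper, not a pyiceberg API.

```python
from typing import Any, Callable, Iterable, Optional

Row = dict[int, Any]  # hypothetical: one row keyed by Iceberg field id
RowFilter = Callable[[Row], Optional[bool]]  # stand-in for a bound Expression
Action = Callable[[Any], Any]  # stand-in for a compiled projection action


def apply_read_restrictions(
    rows: Iterable[Row],
    projections: Optional[list[tuple[int, Optional[Action]]]],
    row_filter: Optional[RowFilter],
) -> list[Row]:
    result: list[Row] = []
    for row in rows:
        # The filter sees original, untransformed values; only an exact True
        # keeps the row, so both False and None (null) discard it.
        if row_filter is not None and row_filter(row) is not True:
            continue
        if projections is None:
            # No mandatory projection: the reader may see every column.
            result.append(dict(row))
            continue
        out: Row = {}
        for field_id, action in projections:
            value = row[field_id]
            # No action means the column is projected as-is.
            out[field_id] = value if action is None else action(value)
        # Columns not listed in the projections are never copied into `out`.
        result.append(out)
    return result


rows = [
    {1: "alice", 2: "4111222233334444", 3: "us"},
    {1: "bob", 2: "5500111122223333", 3: "eu"},
    {1: "carol", 2: None, 3: None},
]
restricted = apply_read_restrictions(
    rows,
    # field 1 as-is, field 2 truncated to 4 characters (like truncate[4]); field 3 dropped
    projections=[(1, None), (2, lambda v: None if v is None else v[:4])],
    # null-aware filter: region = 'us', evaluating to None when region is null
    row_filter=lambda r: None if r[3] is None else r[3] == "us",
)
print(restricted)  # [{1: 'alice', 2: '4111'}]
```

Because the filter runs before any action, a predicate on field 2 would still see the full value, which matches rule 2 of `required-row-filter`.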
Catalog implementers definitely have the choice not to use them at all, in the
sense that these policies are defined in the catalog / policy store in the first
place and the catalog just supports them. But if some catalog wants a simple policy,
it can express the policy definition with these actions.
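A catalog expressing a simple policy purely with these actions might send a payload like the one below. This is a sketch: `field-id` and `action` come from the `Projection` schema in the diff, but the action encodings (`{"type": "mask-hash-sha256"}` and so on) are hypothetical, since the action schemas are not part of this hunk. The hypothetical `validate_projections` helper checks the structural rules a client can verify up front: each entry has a `field-id`, and no column is projected more than once.

```python
from typing import Any, Optional

# Hypothetical ReadRestrictions payload; the "type" strings are made up.
policy: dict[str, Any] = {
    "required-column-projections": [
        {"field-id": 1},  # no action: projected as-is
        {"field-id": 2, "action": {"type": "mask-hash-sha256"}},
        {"field-id": 4, "action": {"type": "replace-with-null"}},
    ],
}


def validate_projections(restrictions: dict[str, Any]) -> Optional[list[int]]:
    """Return projected field ids, or None when no projection is mandated."""
    projections = restrictions.get("required-column-projections")
    if projections is None:
        return None  # absent: the reader may project any subset of columns
    seen: list[int] = []
    for entry in projections:
        if "field-id" not in entry:
            raise ValueError("projection is missing the required field-id")
        field_id = entry["field-id"]
        if field_id in seen:
            # Rule 2: a column must appear at most once.
            raise ValueError(f"field {field_id} is projected more than once")
        seen.append(field_id)
    return seen


print(validate_projections(policy))  # [1, 2, 4]
```

Returning `None` rather than an empty list keeps "no mandatory projection" distinct from "project nothing".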
> Inconsistencies and may force engines that do not support these exact transforms to be non-compliant
I totally agree with this. My understanding is that we need to be very specific
/ clear about what each action means and when it applies from a data-type point of view
(which I think we already do). And if an engine doesn't support an action, it should
bail out (fail closed) rather than compromise correctness. The same is true for
our expressions: say an engine is built on an expression SDK that doesn't
support these new expressions, it should fail at that point. We can trust the
engine to fail rather than bypass the restriction, because someone (i.e. an admin)
designated this engine as trusted after vetting it.
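Fail-closed handling of actions can be a simple lookup that raises instead of falling back to the raw value. In this sketch the action type strings, the `SUPPORTED_ACTIONS` registry, and `UnsupportedActionError` are all hypothetical, and hashing the UTF-8 bytes of the value's string form is an assumption; the spec would need to pin down the exact byte encoding per data type.

```python
import hashlib
from typing import Any, Callable, Optional


def _sha256(value: Any) -> Optional[str]:
    # Assumption: nulls stay null; other values are hashed via str() + UTF-8.
    if value is None:
        return None
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


# Hypothetical registry of the actions this particular reader can evaluate.
SUPPORTED_ACTIONS: dict[str, Callable[[Any], Any]] = {
    "mask-hash-sha256": _sha256,
    "replace-with-null": lambda _value: None,
}


class UnsupportedActionError(Exception):
    """Raised instead of silently returning unmasked data."""


def resolve_action(action: Optional[dict[str, Any]]) -> Callable[[Any], Any]:
    if action is None:
        return lambda value: value  # no action: identity projection
    fn = SUPPORTED_ACTIONS.get(action.get("type"))
    if fn is None:
        # Fail closed: never fall back to reading the raw column.
        raise UnsupportedActionError(f"cannot evaluate action {action!r}")
    return fn


# A reader without MaskAlphanumeric support refuses the read instead of bypassing it.
try:
    resolve_action({"type": "mask-alphanumeric"})
except UnsupportedActionError as err:
    print("refusing to read:", err)
```

Resolving every action before scanning any data means an unsupported policy fails the query up front rather than partway through a result.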
Please let me know what you think, considering the above!
--
This is an automated message from the Apache Git Service.