stevenzwu commented on code in PR #13879:
URL: https://github.com/apache/iceberg/pull/13879#discussion_r3211673037
##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -3480,6 +3480,309 @@ components:
additionalProperties:
type: string
+ ReadRestrictions:
+ type: object
+ description: >
+ Read restrictions for a table, including column projections and row
filter expressions.
+
+ A client MUST enforce the restrictions defined in this object when
reading data
Review Comment:
Normative actor terminology drifts across this section: "client" here on
line 3488, "reader" on lines 3520/3523/3528/3548, "engine" on line 3613
(`MaskToFixedValue`) and line 3754 (`Sha256QueryLocal`). For a normative spec,
picking one term and using it everywhere — "reader" is probably the cleanest,
since this attaches to read-side behavior — would make the MUST/SHOULD
requirements easier to track for implementers.
##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -3480,6 +3480,309 @@ components:
additionalProperties:
type: string
+ ReadRestrictions:
+ type: object
+ description: >
+ Read restrictions for a table, including column projections and row
filter expressions.
+
+ A client MUST enforce the restrictions defined in this object when
reading data
+ from the table.
+
+ These restrictions apply only to the authenticated principal, user,
or account
+ associated with the request. They MUST NOT be interpreted as global
policy and
+ MUST NOT be applied beyond the entity identified by the
Authentication header
+ (or other applicable authentication mechanism).
+
+ If both properties are absent or empty, the ReadRestrictions object
imposes no
+ restrictions and is equivalent to the field being absent from the
response.
+ A server MUST NOT return an action for a column whose type is not
listed in
+ that action's "Applicable to" set.
+ For all actions, if the input column value is NULL, the output MUST
be NULL.
Review Comment:
This global rule conflicts with two of the actions defined below:
- `mask-to-fixed-value` (line 3610): the action's purpose is to replace
values with a constant, but NULL would pass through unmasked, which both
contradicts the action's intent and can leak the existence of NULL — sometimes
itself sensitive information.
- `apply-expression` (line 3768): an arbitrary expression like
`coalesce(col, 'unknown')` will produce non-NULL output from NULL input; the
rule above forbids that.
Suggest dropping the global rule and stating NULL-input behavior per-action
instead. `replace-with-null` and the SHA-256 / truncate / mask-alphanum /
show-first-4 / show-last-4 actions can keep the "NULL in ⇒ NULL out" guarantee
in their individual descriptions; `mask-to-fixed-value` and `apply-expression`
need to define their own behavior.
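To make the leak concrete, here is a minimal Python sketch. The function names are hypothetical and not part of the spec or any Iceberg library, and the "per-action" variant is only one possible definition, assumed here for illustration. It contrasts `mask-to-fixed-value` on a string column under the global rule with a rule that replaces every input:

```python
from typing import Optional

FIXED_STRING = "XXXXXXXX"  # fixed value the spec lists for string columns


def mask_fixed_with_global_null_rule(value: Optional[str]) -> Optional[str]:
    """mask-to-fixed-value under the global 'NULL in => NULL out' rule."""
    if value is None:
        return None  # NULL passes through, so its presence stays observable
    return FIXED_STRING


def mask_fixed_per_action_rule(value: Optional[str]) -> Optional[str]:
    """Assumed per-action rule: every input, NULL included, is replaced."""
    return FIXED_STRING


rows = ["alice", None, "bob"]
with_global_rule = [mask_fixed_with_global_null_rule(v) for v in rows]
with_per_action_rule = [mask_fixed_per_action_rule(v) for v in rows]
```

Under the global rule the second row is still distinguishable from the others after masking; under the assumed per-action rule all three rows look identical.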
##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -3480,6 +3480,309 @@ components:
additionalProperties:
type: string
+ ReadRestrictions:
+ type: object
+ description: >
+ Read restrictions for a table, including column projections and row
filter expressions.
+
+ A client MUST enforce the restrictions defined in this object when
reading data
+ from the table.
+
+ These restrictions apply only to the authenticated principal, user,
or account
+ associated with the request. They MUST NOT be interpreted as global
policy and
+ MUST NOT be applied beyond the entity identified by the
Authentication header
+ (or other applicable authentication mechanism).
+
+ If both properties are absent or empty, the ReadRestrictions object
imposes no
+ restrictions and is equivalent to the field being absent from the
response.
+ A server MUST NOT return an action for a column whose type is not
listed in
+ that action's "Applicable to" set.
+ For all actions, if the input column value is NULL, the output MUST
be NULL.
+
+ If a column projection targets a struct-typed field, other column
projections
+ in the same ReadRestrictions MUST NOT target any of that struct's
subfields
+ (at any depth). This avoids ambiguity about which action governs a
given
+ leaf value.
+ properties:
+ required-column-projections:
+ description: >
+ A list of columns that require specific actions to be applied when
reading.
+
+ If this property is absent, a reader MAY access all columns of the
table as-is
+ without any mandatory transformations.
+
+ If this property is present, each listed column MUST have its
specified
+ action applied. Columns not listed in required-column-projections
+ are not subject to any read restrictions.
+
+ When this list is present:
+
+ 1. For each column listed in required-column-projections, the
reader MUST apply
+ the specified action before returning values for that column.
+
+ 2. The reader MUST replace all output references to the column
with the result
+ of the action, presenting the result under the original column
name. For
+ example, if the action for column cc is mask-alphanum, the
reader MUST
+ return the masked value as cc in the query output.
+
+ 3. Columns not listed in required-column-projections MAY be
projected normally
+ by the reader without any mandatory transformations.
+
+ 4. A column MUST appear at most once in
required-column-projections.
+
+ 5. If a projected column's action cannot be evaluated by the reader
+ (including unrecognized action types), the reader MUST fail
rather than
+ ignore or skip the action.
+
+ 6. Each action defines the output type for its column. For all
predefined
+ actions except apply-expression, the output type matches the
input column
+ type. For apply-expression, the output type is determined by the
expression.
Review Comment:
A worked JSON example for `ReadRestrictions` in the spec body would help
readers a lot — this is a non-trivial structure with a discriminated union and
there is no example here. The one posted as a PR comment (`{
"required-column-projections": [ { "field-id": 4, "action": "show-last-4" },
... ], "required-row-filter": ... }`) reads well; consider lifting it into the
description.
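As a rough illustration of the shape such an example could take, here is a small Python sketch that builds a payload and serializes it. Apart from the `show-last-4` entry quoted above, everything in it is hypothetical: the second projection, the field ids, and the row filter are made up, and the filter assumes the `type` / `term` / `value` layout of the existing `Expression` schema. The helper `duplicate_field_ids` is also an assumption, added only to show how rule 4 might be checked.

```python
import json
from collections import Counter

# Hypothetical payload: the second projection and the filter are
# illustrative only and are not taken from the PR.
read_restrictions = {
    "required-column-projections": [
        {"field-id": 4, "action": "show-last-4"},
        {"field-id": 7, "action": "mask-alphanum"},
    ],
    # Assumed to follow the existing Expression schema (type / term / value).
    "required-row-filter": {"type": "eq", "term": "region", "value": "us"},
}


def duplicate_field_ids(restrictions: dict) -> list:
    """Return field ids listed more than once (rule 4 forbids duplicates)."""
    projections = restrictions.get("required-column-projections", [])
    counts = Counter(p["field-id"] for p in projections)
    return sorted(fid for fid, count in counts.items() if count > 1)


# JSON text that could be lifted into the description as the worked example.
payload = json.dumps(read_restrictions, indent=2)
```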
##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -3480,6 +3480,309 @@ components:
additionalProperties:
type: string
+ ReadRestrictions:
+ type: object
+ description: >
+ Read restrictions for a table, including column projections and row
filter expressions.
+
+ A client MUST enforce the restrictions defined in this object when
reading data
+ from the table.
+
+ These restrictions apply only to the authenticated principal, user,
or account
+ associated with the request. They MUST NOT be interpreted as global
policy and
+ MUST NOT be applied beyond the entity identified by the
Authentication header
+ (or other applicable authentication mechanism).
+
+ If both properties are absent or empty, the ReadRestrictions object
imposes no
+ restrictions and is equivalent to the field being absent from the
response.
+ A server MUST NOT return an action for a column whose type is not
listed in
+ that action's "Applicable to" set.
+ For all actions, if the input column value is NULL, the output MUST
be NULL.
+
+ If a column projection targets a struct-typed field, other column
projections
+ in the same ReadRestrictions MUST NOT target any of that struct's
subfields
+ (at any depth). This avoids ambiguity about which action governs a
given
+ leaf value.
+ properties:
+ required-column-projections:
+ description: >
+ A list of columns that require specific actions to be applied when
reading.
+
+ If this property is absent, a reader MAY access all columns of the
table as-is
+ without any mandatory transformations.
+
+ If this property is present, each listed column MUST have its
specified
+ action applied. Columns not listed in required-column-projections
+ are not subject to any read restrictions.
+
+ When this list is present:
+
+ 1. For each column listed in required-column-projections, the
reader MUST apply
+ the specified action before returning values for that column.
+
+ 2. The reader MUST replace all output references to the column
with the result
+ of the action, presenting the result under the original column
name. For
+ example, if the action for column cc is mask-alphanum, the
reader MUST
+ return the masked value as cc in the query output.
+
+ 3. Columns not listed in required-column-projections MAY be
projected normally
+ by the reader without any mandatory transformations.
+
+ 4. A column MUST appear at most once in
required-column-projections.
+
+ 5. If a projected column's action cannot be evaluated by the reader
+ (including unrecognized action types), the reader MUST fail
rather than
+ ignore or skip the action.
+
+ 6. Each action defines the output type for its column. For all
predefined
+ actions except apply-expression, the output type matches the
input column
+ type. For apply-expression, the output type is determined by the
expression.
+
+ type: array
+ items:
+ $ref: '#/components/schemas/Action'
+ required-row-filter:
+ description: >
+ An expression that filters rows in the table that the
authenticated principal does not have access to.
+
+ 1. The expression MUST evaluate to a boolean. A reader MUST
discard any row for which
+ the filter evaluates to FALSE, and no information derived from
discarded rows
+ MAY be included in the query result.
+
+ 2. Row filters MUST be evaluated against the original,
untransformed column values.
+ Required projections MUST be applied only after row filters are
applied.
+
+ 3. If a client cannot interpret or evaluate a provided filter
expression, it MUST fail.
+
+ 4. If this property is absent, null, or always true then no
mandatory filtering is required.
+ $ref: '#/components/schemas/Expression'
+
+ Action:
+ discriminator:
+ propertyName: action
+ mapping:
+ mask-alphanum: '#/components/schemas/MaskAlphanum'
+ mask-to-fixed-value: '#/components/schemas/MaskToFixedValue'
+ replace-with-null: '#/components/schemas/ReplaceWithNull'
+ show-first-4: '#/components/schemas/ShowFirst4'
+ show-last-4: '#/components/schemas/ShowLast4'
+ truncate-to-year: '#/components/schemas/TruncateToYear'
+ truncate-to-month: '#/components/schemas/TruncateToMonth'
+ sha-256-global: '#/components/schemas/Sha256Global'
+ sha-256-query-local: '#/components/schemas/Sha256QueryLocal'
+ apply-expression: '#/components/schemas/ApplyExpression'
+ type: object
+ required:
+ - action
+ - field-id
+ properties:
+ action:
+ type: string
+ field-id:
+ type: integer
+ description: field id of the column being projected.
+
+ MaskAlphanum:
+ description: >
+ Redacts the column value Unicode code point by code point using the
following rules:
+
+ - Digits (U+0030–U+0039, 0-9) are replaced with 'n'
+ - The following punctuation characters are kept as-is:
+ U+0028 '(' LEFT PARENTHESIS
+ U+0029 ')' RIGHT PARENTHESIS
+ U+002C ',' COMMA
+ U+002E '.' FULL STOP
+ U+002D '-' HYPHEN-MINUS
+ U+0040 '@' COMMERCIAL AT
+ - All other Unicode characters (including letters, whitespace, and any
punctuation
+ not listed above) are replaced with 'x'
+
+ For example: "[email protected]" → "[email protected]"
+
+ Applicable to: string
+ allOf:
+ - $ref: '#/components/schemas/Action'
+ properties:
+ action:
+ type: string
+ const: "mask-alphanum"
+
+ MaskToFixedValue:
+ description: >
+ Replaces the column value with a predefined type-specific fixed value.
+ Engines MUST use exactly the values listed below to ensure consistency
+ across implementations.
+
+ Fixed values by type:
+ - boolean: false
+ - int: 0
+ - long: 0
+ - float: 0.0
+ - double: 0.0
+ - decimal(p, s): 0 (zero with s digits after the decimal point, e.g.
0.00 for decimal(p,2))
+ - string: "XXXXXXXX"
+ - date: 1970-01-01
+ - time: 00:00:00
+ - timestamp: 1970-01-01T00:00:00
+ - timestamptz: 1970-01-01T00:00:00+00:00
+ - timestamp_ns: 1970-01-01T00:00:00.000000000
+ - timestamptz_ns: 1970-01-01T00:00:00.000000000+00:00
+ - uuid: 00000000-0000-0000-0000-000000000000
+ - fixed(n): n zero bytes
+ - binary: empty byte sequence
+ - variant: {}
+ - geometry: POINT EMPTY
+ - geography: POINT EMPTY
+ - list: empty list []
+ - map: empty map {}
+ - struct: struct with each field set to its type-specific default
(applied recursively)
+
+ Applicable to: all data types
+ allOf:
+ - $ref: '#/components/schemas/Action'
+ properties:
+ action:
+ type: string
+ const: "mask-to-fixed-value"
+
+ ReplaceWithNull:
+ description: >
+ Replaces the entire column value with NULL.
+
+ Applicable to: all nullable types
+ allOf:
+ - $ref: '#/components/schemas/Action'
+ properties:
+ action:
+ type: string
+ const: "replace-with-null"
+
+ ShowFirst4:
+ description: >
+ Preserves the first 4 Unicode code points of the column value and
redacts the remainder
+ using mask-alphanum rules (see MaskAlphanum for the exact character
rules).
+ Values with 4 or fewer Unicode code points are returned unchanged.
+
+ For example: "[email protected]" → "[email protected]"
+
+ Applicable to: string
+ allOf:
+ - $ref: '#/components/schemas/Action'
+ properties:
+ action:
+ type: string
+ const: "show-first-4"
+
+ ShowLast4:
+ description: >
+ Redacts all Unicode code points except the last 4 using mask-alphanum
rules
+ (see MaskAlphanum for the exact character rules).
+ Values with 4 or fewer Unicode code points are returned unchanged.
+
+ For example: "4111-1111-1111-4444" → "nnnn-nnnn-nnnn-4444"
+
+ Applicable to: string
+ allOf:
+ - $ref: '#/components/schemas/Action'
+ properties:
+ action:
+ type: string
+ const: "show-last-4"
+
+ TruncateToYear:
+ description: >
+ Truncates the column value to year precision, setting month, day, and
time components
+ to their minimum values. The output type matches the input type.
+
+ For example: 2024-07-15 → 2024-01-01
+ For timestamptz and timestamptz_ns, truncation is performed in UTC.
+
+ Applicable to: date, timestamp, timestamptz, timestamp_ns,
timestamptz_ns
+ allOf:
+ - $ref: '#/components/schemas/Action'
+ properties:
+ action:
+ type: string
+ const: "truncate-to-year"
+
+ TruncateToMonth:
+ description: >
+ Truncates the column value to year and month precision, setting day
and time components
+ to their minimum values. The output type matches the input type.
+
+ For example: 2024-07-15 → 2024-07-01
+ For timestamptz and timestamptz_ns, truncation is performed in UTC.
+
+ Applicable to: date, timestamp, timestamptz, timestamp_ns,
timestamptz_ns
+ allOf:
+ - $ref: '#/components/schemas/Action'
+ properties:
+ action:
+ type: string
+ const: "truncate-to-month"
+
+ Sha256Global:
+ description: |
+ Applies SHA-256 as specified in NIST FIPS 180-4. Deterministic across
all queries
+ and engines — the same input always produces the same output.
+
+ Input-to-bytes encoding by type:
+ - string: UTF-8 encoded bytes
+ - int: 4 bytes, little-endian
+ - long: 8 bytes, little-endian
+ - binary: raw bytes as-is
+
+ Output encoding by type:
+ - string: 64-character lowercase hexadecimal string
+ - int: first 4 bytes of the digest, read as a signed two's complement
little-endian int
+ - long: first 8 bytes of the digest, read as a signed two's complement
little-endian long
+ - binary: the full 32-byte raw SHA-256 digest
+
+ Applicable to: string, int, long, binary
+ allOf:
+ - $ref: '#/components/schemas/Action'
+ properties:
+ action:
+ type: string
+ const: "sha-256-global"
+
+ Sha256QueryLocal:
+ description: |
+ Applies SHA-256 with a per-query random salt, making the output
non-deterministic
+ across queries while remaining consistent within a single query.
+
+ The engine MUST generate a cryptographically random salt of at least
16 bytes for each query and apply it as:
+ SHA-256(salt_bytes || canonical_bytes)
+ where canonical_bytes follows the same encoding rules as
sha-256-global.
+
+ Output encoding follows the same rules as sha-256-global.
+
+ Applicable to: string, int, long, binary
+ allOf:
+ - $ref: '#/components/schemas/Action'
+ properties:
+ action:
+ type: string
+ const: "sha-256-query-local"
+
+ ApplyExpression:
+ description: >
+ Replace the field with the result of an expression. Produce the
original field name
+ with the expression result.
+
+ Applicable to: all data types
+ allOf:
+ - $ref: '#/components/schemas/Action'
+ required:
+ - action
+ - expression
+ properties:
+ action:
+ type: string
+ const: "apply-expression"
+ expression:
+ $ref: '#/components/schemas/Expression'
Review Comment:
Implementability gap — `apply-expression` referencing other restricted
columns.
If `col_a`'s action is `apply-expression` over an expression that references
`col_b`, and `col_b` itself has its own action in the same
`required-column-projections`, the spec doesn't say whether the expression sees
raw `col_b` or the transformed `col_b`. Two implementations could diverge here
without violating any rule, which is exactly the kind of ambiguity that
produces interop bugs.
Concrete cases that need a defined answer:
- `col_a` action: `apply-expression` of `length(col_b)`; `col_b` action:
`mask-to-fixed-value`. If the expression sees raw `col_b`, `length(col_b)`
returns the actual string length per row; if it sees masked `col_b`, it always
returns `8` (the length of `"XXXXXXXX"`). Two engines reading the same response
can produce different output.
- `col_a` action: `apply-expression` of `col_b` (i.e., aliasing). With
masking on `col_b`, `col_a` could leak the unmasked value if the expression
sees raw input.
Suggest one of:
- Expressions in `apply-expression` MUST evaluate against raw column values
(consistent with row-filter rule 2).
- Expressions in `apply-expression` MUST NOT reference any column that has
its own entry in `required-column-projections`.
The first is more flexible; the second is simpler to validate.
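A minimal Python sketch of the first case, to show how the two readings diverge. The row contents and the helper `length_of_col_b` are hypothetical stand-ins for `length(col_b)`, not anything defined by the spec:

```python
FIXED_STRING = "XXXXXXXX"  # mask-to-fixed-value result for string columns

# Hypothetical row; col_a's own stored value is irrelevant to the expression.
row = {"col_a": "ignored", "col_b": "secret-value"}


def length_of_col_b(values: dict) -> int:
    """Stand-in for the expression length(col_b)."""
    return len(values["col_b"])


# Reading A: the expression sees raw column values.
raw_result = length_of_col_b(row)

# Reading B: the expression sees col_b after its own action has been applied.
masked_row = {**row, "col_b": FIXED_STRING}
masked_result = length_of_col_b(masked_row)
```

Both readers follow every rule in the current text, yet one returns the true per-row length and the other always returns 8.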
##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -3480,6 +3480,309 @@ components:
additionalProperties:
type: string
+ ReadRestrictions:
+ type: object
+ description: >
+ Read restrictions for a table, including column projections and row
filter expressions.
+
+ A client MUST enforce the restrictions defined in this object when
reading data
+ from the table.
+
+ These restrictions apply only to the authenticated principal, user,
or account
+ associated with the request. They MUST NOT be interpreted as global
policy and
+ MUST NOT be applied beyond the entity identified by the
Authentication header
+ (or other applicable authentication mechanism).
+
+ If both properties are absent or empty, the ReadRestrictions object
imposes no
+ restrictions and is equivalent to the field being absent from the
response.
+ A server MUST NOT return an action for a column whose type is not
listed in
+ that action's "Applicable to" set.
+ For all actions, if the input column value is NULL, the output MUST
be NULL.
+
+ If a column projection targets a struct-typed field, other column
projections
+ in the same ReadRestrictions MUST NOT target any of that struct's
subfields
+ (at any depth). This avoids ambiguity about which action governs a
given
+ leaf value.
+ properties:
+ required-column-projections:
+ description: >
+ A list of columns that require specific actions to be applied when
reading.
+
+ If this property is absent, a reader MAY access all columns of the
table as-is
+ without any mandatory transformations.
+
+ If this property is present, each listed column MUST have its
specified
+ action applied. Columns not listed in required-column-projections
+ are not subject to any read restrictions.
+
+ When this list is present:
+
+ 1. For each column listed in required-column-projections, the
reader MUST apply
+ the specified action before returning values for that column.
+
+ 2. The reader MUST replace all output references to the column
with the result
+ of the action, presenting the result under the original column
name. For
+ example, if the action for column cc is mask-alphanum, the
reader MUST
+ return the masked value as cc in the query output.
+
+ 3. Columns not listed in required-column-projections MAY be
projected normally
+ by the reader without any mandatory transformations.
+
+ 4. A column MUST appear at most once in
required-column-projections.
+
+ 5. If a projected column's action cannot be evaluated by the reader
+ (including unrecognized action types), the reader MUST fail
rather than
+ ignore or skip the action.
+
+ 6. Each action defines the output type for its column. For all
predefined
+ actions except apply-expression, the output type matches the
input column
+ type. For apply-expression, the output type is determined by the
expression.
+
+ type: array
+ items:
+ $ref: '#/components/schemas/Action'
+ required-row-filter:
+ description: >
+ An expression that filters rows in the table that the
authenticated principal does not have access to.
+
+ 1. The expression MUST evaluate to a boolean. A reader MUST
discard any row for which
+ the filter evaluates to FALSE, and no information derived from
discarded rows
+ MAY be included in the query result.
Review Comment:
Three-valued logic still isn't resolved here.
```suggestion
1. The expression MUST evaluate to a boolean. A reader MUST keep
only rows for which
the filter evaluates to TRUE; rows that evaluate to FALSE or
NULL MUST be discarded,
and no information derived from discarded rows MAY be included
in the query result.
```
This matches SQL `WHERE` semantics and removes ambiguity about NULL/UNKNOWN.
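A small Python sketch of the difference, with `None` standing in for SQL NULL/UNKNOWN. The rows and the predicate `region_is_us` are hypothetical and only model a filter such as `region = 'us'`:

```python
from typing import Optional

rows = [
    {"id": 1, "region": "us"},
    {"id": 2, "region": "eu"},
    {"id": 3, "region": None},
]


def region_is_us(row: dict) -> Optional[bool]:
    """Stand-in for region = 'us'; None models SQL NULL/UNKNOWN."""
    if row["region"] is None:
        return None
    return row["region"] == "us"


# Suggested wording: keep only rows where the filter is TRUE.
kept_true_only = [r["id"] for r in rows if region_is_us(r) is True]

# Literal reading of the current text: discard only rows that are FALSE.
kept_not_false = [r["id"] for r in rows if region_is_us(r) is not False]
```

With the suggested wording only row 1 survives; a literal reading of the current text also lets row 3 through, which is the case a restrictive filter most needs to exclude.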
##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -3480,6 +3480,309 @@ components:
additionalProperties:
type: string
+ ReadRestrictions:
+ type: object
+ description: >
+ Read restrictions for a table, including column projections and row
filter expressions.
+
+ A client MUST enforce the restrictions defined in this object when
reading data
+ from the table.
+
+ These restrictions apply only to the authenticated principal, user,
or account
+ associated with the request. They MUST NOT be interpreted as global
policy and
+ MUST NOT be applied beyond the entity identified by the
Authentication header
+ (or other applicable authentication mechanism).
+
+ If both properties are absent or empty, the ReadRestrictions object
imposes no
+ restrictions and is equivalent to the field being absent from the
response.
+ A server MUST NOT return an action for a column whose type is not
listed in
+ that action's "Applicable to" set.
+ For all actions, if the input column value is NULL, the output MUST
be NULL.
+
+ If a column projection targets a struct-typed field, other column
projections
+ in the same ReadRestrictions MUST NOT target any of that struct's
subfields
+ (at any depth). This avoids ambiguity about which action governs a
given
+ leaf value.
+ properties:
+ required-column-projections:
+ description: >
+ A list of columns that require specific actions to be applied when
reading.
+
+ If this property is absent, a reader MAY access all columns of the
table as-is
+ without any mandatory transformations.
+
+ If this property is present, each listed column MUST have its
specified
+ action applied. Columns not listed in required-column-projections
+ are not subject to any read restrictions.
+
+ When this list is present:
+
+ 1. For each column listed in required-column-projections, the
reader MUST apply
+ the specified action before returning values for that column.
+
+ 2. The reader MUST replace all output references to the column
with the result
+ of the action, presenting the result under the original column
name. For
+ example, if the action for column cc is mask-alphanum, the
reader MUST
+ return the masked value as cc in the query output.
+
+ 3. Columns not listed in required-column-projections MAY be
projected normally
+ by the reader without any mandatory transformations.
+
+ 4. A column MUST appear at most once in
required-column-projections.
+
+ 5. If a projected column's action cannot be evaluated by the reader
+ (including unrecognized action types), the reader MUST fail
rather than
+ ignore or skip the action.
+
+ 6. Each action defines the output type for its column. For all
predefined
+ actions except apply-expression, the output type matches the
input column
+ type. For apply-expression, the output type is determined by the
expression.
+
+ type: array
+ items:
+ $ref: '#/components/schemas/Action'
+ required-row-filter:
+ description: >
+ An expression that filters rows in the table that the
authenticated principal does not have access to.
+
+ 1. The expression MUST evaluate to a boolean. A reader MUST
discard any row for which
+ the filter evaluates to FALSE, and no information derived from
discarded rows
+ MAY be included in the query result.
+
+ 2. Row filters MUST be evaluated against the original,
untransformed column values.
+ Required projections MUST be applied only after row filters are
applied.
+
+ 3. If a client cannot interpret or evaluate a provided filter
expression, it MUST fail.
+
+ 4. If this property is absent, null, or always true then no
mandatory filtering is required.
+ $ref: '#/components/schemas/Expression'
+
+ Action:
+ discriminator:
+ propertyName: action
+ mapping:
+ mask-alphanum: '#/components/schemas/MaskAlphanum'
+ mask-to-fixed-value: '#/components/schemas/MaskToFixedValue'
+ replace-with-null: '#/components/schemas/ReplaceWithNull'
+ show-first-4: '#/components/schemas/ShowFirst4'
+ show-last-4: '#/components/schemas/ShowLast4'
+ truncate-to-year: '#/components/schemas/TruncateToYear'
+ truncate-to-month: '#/components/schemas/TruncateToMonth'
+ sha-256-global: '#/components/schemas/Sha256Global'
+ sha-256-query-local: '#/components/schemas/Sha256QueryLocal'
+ apply-expression: '#/components/schemas/ApplyExpression'
+ type: object
+ required:
+ - action
+ - field-id
+ properties:
+ action:
+ type: string
+ field-id:
+ type: integer
+ description: field id of the column being projected.
+
+ MaskAlphanum:
+ description: >
+ Redacts the column value Unicode code point by code point using the
following rules:
+
+ - Digits (U+0030–U+0039, 0-9) are replaced with 'n'
+ - The following punctuation characters are kept as-is:
+ U+0028 '(' LEFT PARENTHESIS
+ U+0029 ')' RIGHT PARENTHESIS
+ U+002C ',' COMMA
+ U+002E '.' FULL STOP
+ U+002D '-' HYPHEN-MINUS
+ U+0040 '@' COMMERCIAL AT
+ - All other Unicode characters (including letters, whitespace, and any
punctuation
+ not listed above) are replaced with 'x'
+
+ For example: "[email protected]" → "[email protected]"
+
+ Applicable to: string
+ allOf:
+ - $ref: '#/components/schemas/Action'
+ properties:
+ action:
+ type: string
+ const: "mask-alphanum"
+
+ MaskToFixedValue:
+ description: >
+ Replaces the column value with a predefined type-specific fixed value.
+ Engines MUST use exactly the values listed below to ensure consistency
+ across implementations.
+
+ Fixed values by type:
+ - boolean: false
+ - int: 0
+ - long: 0
+ - float: 0.0
+ - double: 0.0
+ - decimal(p, s): 0 (zero with s digits after the decimal point, e.g.
0.00 for decimal(p,2))
+ - string: "XXXXXXXX"
+ - date: 1970-01-01
+ - time: 00:00:00
+ - timestamp: 1970-01-01T00:00:00
+ - timestamptz: 1970-01-01T00:00:00+00:00
+ - timestamp_ns: 1970-01-01T00:00:00.000000000
+ - timestamptz_ns: 1970-01-01T00:00:00.000000000+00:00
+ - uuid: 00000000-0000-0000-0000-000000000000
+ - fixed(n): n zero bytes
+ - binary: empty byte sequence
+ - variant: {}
+ - geometry: POINT EMPTY
+ - geography: POINT EMPTY
+ - list: empty list []
+ - map: empty map {}
+ - struct: struct with each field set to its type-specific default
(applied recursively)
+
+ Applicable to: all data types
+ allOf:
+ - $ref: '#/components/schemas/Action'
+ properties:
+ action:
+ type: string
+ const: "mask-to-fixed-value"
+
+ ReplaceWithNull:
+ description: >
+ Replaces the entire column value with NULL.
+
+ Applicable to: all nullable types
+ allOf:
+ - $ref: '#/components/schemas/Action'
+ properties:
+ action:
+ type: string
+ const: "replace-with-null"
+
+ ShowFirst4:
+ description: >
+ Preserves the first 4 Unicode code points of the column value and
redacts the remainder
+ using mask-alphanum rules (see MaskAlphanum for the exact character
rules).
+ Values with 4 or fewer Unicode code points are returned unchanged.
+
+ For example: "[email protected]" → "[email protected]"
+
+ Applicable to: string
+ allOf:
+ - $ref: '#/components/schemas/Action'
+ properties:
+ action:
+ type: string
+ const: "show-first-4"
+
+ ShowLast4:
+ description: >
+ Redacts all Unicode code points except the last 4 using mask-alphanum
rules
+ (see MaskAlphanum for the exact character rules).
+ Values with 4 or fewer Unicode code points are returned unchanged.
+
+ For example: "4111-1111-1111-4444" → "nnnn-nnnn-nnnn-4444"
+
+ Applicable to: string
+ allOf:
+ - $ref: '#/components/schemas/Action'
+ properties:
+ action:
+ type: string
+ const: "show-last-4"
+
+ TruncateToYear:
+ description: >
+ Truncates the column value to year precision, setting month, day, and
time components
+ to their minimum values. The output type matches the input type.
+
+ For example: 2024-07-15 → 2024-01-01
+ For timestamptz and timestamptz_ns, truncation is performed in UTC.
+
+ Applicable to: date, timestamp, timestamptz, timestamp_ns,
timestamptz_ns
+ allOf:
+ - $ref: '#/components/schemas/Action'
+ properties:
+ action:
+ type: string
+ const: "truncate-to-year"
+
+ TruncateToMonth:
+ description: >
+ Truncates the column value to year and month precision, setting day
and time components
+ to their minimum values. The output type matches the input type.
+
+ For example: 2024-07-15 → 2024-07-01
+ For timestamptz and timestamptz_ns, truncation is performed in UTC.
+
+ Applicable to: date, timestamp, timestamptz, timestamp_ns,
timestamptz_ns
+ allOf:
+ - $ref: '#/components/schemas/Action'
+ properties:
+ action:
+ type: string
+ const: "truncate-to-month"
+
+ Sha256Global:
+ description: |
+ Applies SHA-256 as specified in NIST FIPS 180-4. Deterministic across
all queries
+ and engines — the same input always produces the same output.
+
+ Input-to-bytes encoding by type:
+ - string: UTF-8 encoded bytes
+ - int: 4 bytes, little-endian
+ - long: 8 bytes, little-endian
+ - binary: raw bytes as-is
+
+ Output encoding by type:
+ - string: 64-character lowercase hexadecimal string
+ - int: first 4 bytes of the digest, read as a signed two's complement
little-endian int
+ - long: first 8 bytes of the digest, read as a signed two's complement
little-endian long
+ - binary: the full 32-byte raw SHA-256 digest
+
+ Applicable to: string, int, long, binary
+ allOf:
+ - $ref: '#/components/schemas/Action'
+ properties:
+ action:
+ type: string
+ const: "sha-256-global"
+
+ Sha256QueryLocal:
+ description: |
+ Applies SHA-256 with a per-query random salt, making the output
non-deterministic
+ across queries while remaining consistent within a single query.
+
+ The engine MUST generate a cryptographically random salt of at least
16 bytes for each query and apply it as:
+ SHA-256(salt_bytes || canonical_bytes)
+ where canonical_bytes follows the same encoding rules as
sha-256-global.
+
+ Output encoding follows the same rules as sha-256-global.
+
+ Applicable to: string, int, long, binary
+ allOf:
+ - $ref: '#/components/schemas/Action'
+ properties:
+ action:
+ type: string
+ const: "sha-256-query-local"
+
+ ApplyExpression:
Review Comment:
Should we defer this action until Ryan's expression extension work is done?
Right now, expressions are only boolean.
Compared to the other actions, this description is very thin for what is the
most general-purpose mechanism. A few things worth specifying explicitly:
1. **Output type derivation**: rule 6 of `required-column-projections`
mentions this in passing ("for apply-expression, the output type is determined
by the expression") but the constraint that the output type must be assignable
to the original column's declared type belongs here.
2. **Reference scope**: may the expression reference other columns of the
same row? (Almost certainly yes, but the spec is silent.) May it also reference
other tables or session state, or only same-row columns and constants? Worth
pinning down.
3. **NULL-input behavior**: see the comment above on the global NULL rule —
`apply-expression` cannot honor it in general, so this action's description
should say what happens.
4. **Determinism**: is the catalog allowed to return non-deterministic
expressions? `Sha256QueryLocal` makes this explicit; this action does not.
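To make the contrast in point 4 concrete, here is a minimal Python sketch of
the determinism contract that `Sha256QueryLocal` spells out and
`ApplyExpression` currently does not. It is an illustration, not the PR's
implementation. Assumptions: `new_query_salt` and `sha256_query_local` are
hypothetical names, the caller supplies `canonical_bytes` already encoded per
the sha-256-global rules (not reproduced here), and NULL handling is left out.

```python
import hashlib
import secrets


def new_query_salt(num_bytes: int = 16) -> bytes:
    """Generate a fresh random salt; the spec text asks for at least 16 bytes per query."""
    if num_bytes < 16:
        raise ValueError("salt must be at least 16 bytes")
    return secrets.token_bytes(num_bytes)


def sha256_query_local(salt: bytes, canonical_bytes: bytes, output_type: str):
    """Compute SHA-256(salt || canonical_bytes) and encode it for the column type."""
    digest = hashlib.sha256(salt + canonical_bytes).digest()
    if output_type == "string":
        return digest.hex()  # 64 lowercase hexadecimal characters
    if output_type == "int":
        # First 4 bytes, signed two's complement, little-endian
        return int.from_bytes(digest[:4], "little", signed=True)
    if output_type == "long":
        # First 8 bytes, signed two's complement, little-endian
        return int.from_bytes(digest[:8], "little", signed=True)
    if output_type == "binary":
        return digest  # full 32-byte raw digest
    raise ValueError(f"sha-256-query-local is not applicable to {output_type}")
```

Reusing one salt for the whole query keeps equal inputs equal within that
query, while a new salt per query breaks linkage across queries. An equivalent
statement for `apply-expression` would tell implementers whether results may
be cached or compared across queries.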
##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -3480,6 +3480,309 @@ components:
additionalProperties:
type: string
+ ReadRestrictions:
+ type: object
+ description: >
+ Read restrictions for a table, including column projections and row
filter expressions.
+
+ A client MUST enforce the restrictions defined in this object when
reading data
+ from the table.
+
+ These restrictions apply only to the authenticated principal, user,
or account
+ associated with the request. They MUST NOT be interpreted as global
policy and
+ MUST NOT be applied beyond the entity identified by the
Authentication header
+ (or other applicable authentication mechanism).
+
+ If both properties are absent or empty, the ReadRestrictions object
imposes no
+ restrictions and is equivalent to the field being absent from the
response.
+ A server MUST NOT return an action for a column whose type is not
listed in
+ that action's "Applicable to" set.
+ For all actions, if the input column value is NULL, the output MUST
be NULL.
+
+ If a column projection targets a struct-typed field, other column
projections
+ in the same ReadRestrictions MUST NOT target any of that struct's
subfields
+ (at any depth). This avoids ambiguity about which action governs a
given
+ leaf value.
Review Comment:
This rule covers struct subfields but not list elements or map keys/values.
A list of strings (`list<string>`) where the catalog wants to mask the
elements, or a `map<string, int>` where keys vs. values may be sensitive, falls
outside this. A short note clarifying intent — either "actions only target
named struct fields, never list elements / map entries" or pointing to a
follow-up — would close the gap.
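As a rough illustration of how a server or reader might check the struct rule
quoted above, here is a small Python sketch. The helper name
`find_nested_conflicts` and the `parent_of` map (field ID to the ID of its
containing field, `None` for top-level columns) are assumptions made for this
example; nothing in the PR defines them.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def find_nested_conflicts(
    projected_ids: Iterable[int],
    parent_of: Dict[int, Optional[int]],
) -> List[Tuple[int, int]]:
    """Return (ancestor_id, descendant_id) pairs where both field IDs are projected."""
    projected = set(projected_ids)
    conflicts = []
    for field_id in sorted(projected):
        # Walk up the containment chain, at any depth
        ancestor = parent_of.get(field_id)
        while ancestor is not None:
            if ancestor in projected:
                conflicts.append((ancestor, field_id))
            ancestor = parent_of.get(ancestor)
    return conflicts
```

If list elements and map keys/values were also recorded in `parent_of`, the
same walk would cover them, which is one reason it would help for the spec to
say whether those positions are valid targets at all.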
##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -3480,6 +3480,309 @@ components:
additionalProperties:
type: string
+ ReadRestrictions:
+ type: object
+ description: >
+ Read restrictions for a table, including column projections and row
filter expressions.
+
+ A client MUST enforce the restrictions defined in this object when
reading data
+ from the table.
+
+ These restrictions apply only to the authenticated principal, user,
or account
+ associated with the request. They MUST NOT be interpreted as global
policy and
+ MUST NOT be applied beyond the entity identified by the
Authentication header
+ (or other applicable authentication mechanism).
+
+ If both properties are absent or empty, the ReadRestrictions object
imposes no
Review Comment:
`both properties` is unclear here until I read the later part of the schema.
Maybe something like
```
Empty ReadRestrictions object imposes no restrictions and is equivalent to
...
```
##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -3480,6 +3480,309 @@ components:
additionalProperties:
type: string
+ ReadRestrictions:
+ type: object
+ description: >
+ Read restrictions for a table, including column projections and row
filter expressions.
+
+ A client MUST enforce the restrictions defined in this object when
reading data
+ from the table.
+
+ These restrictions apply only to the authenticated principal, user,
or account
+ associated with the request. They MUST NOT be interpreted as global
policy and
+ MUST NOT be applied beyond the entity identified by the
Authentication header
+ (or other applicable authentication mechanism).
+
+ If both properties are absent or empty, the ReadRestrictions object
imposes no
+ restrictions and is equivalent to the field being absent from the
response.
+ A server MUST NOT return an action for a column whose type is not
listed in
+ that action's "Applicable to" set.
+ For all actions, if the input column value is NULL, the output MUST
be NULL.
+
+ If a column projection targets a struct-typed field, other column
projections
+ in the same ReadRestrictions MUST NOT target any of that struct's
subfields
+ (at any depth). This avoids ambiguity about which action governs a
given
+ leaf value.
+ properties:
+ required-column-projections:
+ description: >
+ A list of columns that require specific actions to be applied when
reading.
+
+ If this property is absent, a reader MAY access all columns of the
table as-is
+ without any mandatory transformations.
+
+ If this property is present, each listed column MUST have its
specified
+ action applied. Columns not listed in required-column-projections
+ are not subject to any read restrictions.
+
+ When this list is present:
+
+ 1. For each column listed in required-column-projections, the
reader MUST apply
+ the specified action before returning values for that column.
+
+ 2. The reader MUST replace all output references to the column
with the result
+ of the action, presenting the result under the original column
name. For
+ example, if the action for column cc is mask-alphanum, the
reader MUST
+ return the masked value as cc in the query output.
+
+ 3. Columns not listed in required-column-projections MAY be
projected normally
+ by the reader without any mandatory transformations.
+
+ 4. A column MUST appear at most once in
required-column-projections.
+
+ 5. If a projected column's action cannot be evaluated by the reader
+ (including unrecognized action types), the reader MUST fail
rather than
+ ignore or skip the action.
+
+ 6. Each action defines the output type for its column. For all
predefined
+ actions except apply-expression, the output type matches the
input column
+ type. For apply-expression, the output type is determined by the
expression.
+
+ type: array
+ items:
+ $ref: '#/components/schemas/Action'
+ required-row-filter:
+ description: >
+ An expression that filters rows in the table that the
authenticated principal does not have access to.
+
+ 1. The expression MUST evaluate to a boolean. A reader MUST
discard any row for which
+ the filter evaluates to FALSE, and no information derived from
discarded rows
+ MAY be included in the query result.
+
+ 2. Row filters MUST be evaluated against the original,
untransformed column values.
+ Required projections MUST be applied only after row filters are
applied.
+
+ 3. If a client cannot interpret or evaluate a provided filter
expression, it MUST fail.
+
+ 4. If this property is absent, null, or always true then no
mandatory filtering is required.
+ $ref: '#/components/schemas/Expression'
+
+ Action:
+ discriminator:
+ propertyName: action
+ mapping:
+ mask-alphanum: '#/components/schemas/MaskAlphanum'
+ mask-to-fixed-value: '#/components/schemas/MaskToFixedValue'
+ replace-with-null: '#/components/schemas/ReplaceWithNull'
+ show-first-4: '#/components/schemas/ShowFirst4'
+ show-last-4: '#/components/schemas/ShowLast4'
+ truncate-to-year: '#/components/schemas/TruncateToYear'
+ truncate-to-month: '#/components/schemas/TruncateToMonth'
+ sha-256-global: '#/components/schemas/Sha256Global'
+ sha-256-query-local: '#/components/schemas/Sha256QueryLocal'
+ apply-expression: '#/components/schemas/ApplyExpression'
+ type: object
+ required:
+ - action
+ - field-id
+ properties:
+ action:
+ type: string
+ field-id:
+ type: integer
+ description: field id of the column being projected.
Review Comment:
Style nits: capitalize the first word and use the canonical "ID" rendering —
Iceberg generally writes "ID" rather than "id" in descriptive text:
```suggestion
description: Field ID of the column being projected.
```
##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -3480,6 +3480,309 @@ components:
additionalProperties:
type: string
+ ReadRestrictions:
+ type: object
+ description: >
+ Read restrictions for a table, including column projections and row
filter expressions.
+
+ A client MUST enforce the restrictions defined in this object when
reading data
+ from the table.
+
+ These restrictions apply only to the authenticated principal, user,
or account
+ associated with the request. They MUST NOT be interpreted as global
policy and
+ MUST NOT be applied beyond the entity identified by the
Authentication header
+ (or other applicable authentication mechanism).
+
+ If both properties are absent or empty, the ReadRestrictions object
imposes no
+ restrictions and is equivalent to the field being absent from the
response.
+ A server MUST NOT return an action for a column whose type is not
listed in
+ that action's "Applicable to" set.
+ For all actions, if the input column value is NULL, the output MUST
be NULL.
+
+ If a column projection targets a struct-typed field, other column
projections
+ in the same ReadRestrictions MUST NOT target any of that struct's
subfields
+ (at any depth). This avoids ambiguity about which action governs a
given
+ leaf value.
+ properties:
+ required-column-projections:
+ description: >
+ A list of columns that require specific actions to be applied when
reading.
+
+ If this property is absent, a reader MAY access all columns of the
table as-is
+ without any mandatory transformations.
+
+ If this property is present, each listed column MUST have its
specified
+ action applied. Columns not listed in required-column-projections
+ are not subject to any read restrictions.
+
+ When this list is present:
+
+ 1. For each column listed in required-column-projections, the
reader MUST apply
+ the specified action before returning values for that column.
+
+ 2. The reader MUST replace all output references to the column
with the result
+ of the action, presenting the result under the original column
name. For
+ example, if the action for column cc is mask-alphanum, the
reader MUST
+ return the masked value as cc in the query output.
+
+ 3. Columns not listed in required-column-projections MAY be
projected normally
+ by the reader without any mandatory transformations.
+
+ 4. A column MUST appear at most once in
required-column-projections.
+
+ 5. If a projected column's action cannot be evaluated by the reader
+ (including unrecognized action types), the reader MUST fail
rather than
+ ignore or skip the action.
+
+ 6. Each action defines the output type for its column. For all
predefined
+ actions except apply-expression, the output type matches the
input column
+ type. For apply-expression, the output type is determined by the
expression.
+
+ type: array
+ items:
+ $ref: '#/components/schemas/Action'
+ required-row-filter:
+ description: >
+ An expression that filters rows in the table that the
authenticated principal does not have access to.
+
+ 1. The expression MUST evaluate to a boolean. A reader MUST
discard any row for which
+ the filter evaluates to FALSE, and no information derived from
discarded rows
+ MAY be included in the query result.
+
+ 2. Row filters MUST be evaluated against the original,
untransformed column values.
+ Required projections MUST be applied only after row filters are
applied.
+
+ 3. If a client cannot interpret or evaluate a provided filter
expression, it MUST fail.
Review Comment:
Implementability: clarify what "MUST fail" means.
Both projection rule 5 (line 3535 above) and row-filter rule 3 here use
"MUST fail" without defining the failure mode. For a security feature this
needs to be fail-closed (refuse the query and return an error to the caller),
never fail-open or fail-silent (return empty results, or skip the action and
return raw values). Worth stating explicitly:
```suggestion
3. If a client cannot interpret or evaluate a provided filter
expression, it MUST fail
the query and return an error to the caller. The client MUST
NOT return any rows
of the table when this happens.
```
A matching clarification on projection rule 5 would close the analogous gap
there.
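For projection rule 5, fail-closed behavior could look roughly like the Python
sketch below. Everything named here (`ReadRestrictionError`,
`SUPPORTED_ACTIONS`, `apply_projections`, rows modeled as dicts keyed by field
ID) is hypothetical and only meant to show the shape of the requirement.

```python
from typing import Any, Callable, Dict, List


class ReadRestrictionError(Exception):
    """Raised when a restriction cannot be enforced; the query must be refused."""


# Hypothetical registry of the actions this reader knows how to evaluate.
SUPPORTED_ACTIONS: Dict[str, Callable[[Any], Any]] = {
    "replace-with-null": lambda value: None,
}


def apply_projections(
    rows: List[Dict[int, Any]],
    projections: List[Dict[str, Any]],
) -> List[Dict[int, Any]]:
    """Apply required projections, refusing the whole read on any unknown action."""
    # Resolve every action before touching data, so no rows are produced
    # if even one action is unrecognized.
    resolved = []
    for projection in projections:
        fn = SUPPORTED_ACTIONS.get(projection["action"])
        if fn is None:
            raise ReadRestrictionError(f"unsupported action: {projection['action']!r}")
        resolved.append((projection["field-id"], fn))

    out = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        for field_id, fn in resolved:
            if field_id in new_row:
                new_row[field_id] = fn(new_row[field_id])
        out.append(new_row)
    return out
```

The point of resolving actions up front is that the error surfaces before any
row is returned, rather than after a partial result has already leaked raw
values.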
##########
open-api/rest-catalog-open-api.yaml:
##########
@@ -3265,6 +3265,133 @@ components:
additionalProperties:
type: string
+ ReadRestrictions:
+ type: object
+ description: >
+ Read restrictions for a table, including column projections and row
filter expressions, according to the current schema.
+
+ A client MUST enforce the restrictions defined in this object when
reading data
+ from the table.
Review Comment:
+1 — the spec currently says `A client MUST enforce` without any
qualification, which is unenforceable from the server side. Two paths that
would tighten this:
1. State explicitly that this is a normative requirement on **trusted**
clients, with a sentence noting that trust establishment (mTLS, on-behalf-of
OAuth, etc.) is out of scope for this spec and is a catalog-implementation
concern. The PR description already takes this position; lifting it into the
spec text would make the assumption visible to readers.
2. Frame the field as advisory data accompanied by a separate normative
requirement that catalogs MUST NOT include `ReadRestrictions` in responses to
clients they do not trust to enforce them. This could also address the forward
compatibility concern.
An Iceberg client built before this spec change receives a `LoadTableResult`
with an unknown `read-restrictions` field and silently ignores it (the standard
"unknown fields are ignored" REST convention used elsewhere in this spec). On a
security-critical feature, that's fail-open: the catalog has produced
restrictions, the client returns the raw values, and the catalog has no way to
detect this happened. This is a stronger problem — even a fully trusted client
running an older version cannot enforce a contract it doesn't know about.
Two options worth picking between in the spec:
1. **Capability negotiation.** Define a request-side signal — a header
(e.g., `Iceberg-Supported-Capabilities: read-restrictions`) or a query
parameter — that the client uses to advertise support. Catalogs MUST NOT
include `read-restrictions` in responses to clients that haven't advertised
support, and instead MUST return 403 if the principal would have had
restrictions. This puts enforcement on the side that can verify it.
2. **Out-of-band trust.** State explicitly that catalogs MUST NOT return
`read-restrictions` to clients whose enforcement they cannot verify
out-of-band, and that establishing that verification is out of scope. Closer to
the current PR-description framing, but makes the constraint normative.
Without either, every catalog that adopts this is open to fail-open on stale
clients.
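A rough server-side sketch of option 1, in Python. This is only an
illustration under stated assumptions: the header name is the example from
option 1 and is not part of any spec, the comma-separated value format and the
simplified error body are invented for this sketch, and
`build_load_table_response` is a hypothetical helper.

```python
from typing import Any, Dict, Optional, Tuple

CAPABILITY_HEADER = "Iceberg-Supported-Capabilities"  # hypothetical header name
READ_RESTRICTIONS = "read-restrictions"


def build_load_table_response(
    headers: Dict[str, str],
    table_result: Dict[str, Any],
    restrictions: Optional[Dict[str, Any]],
) -> Tuple[int, Dict[str, Any]]:
    """Return (status, body), refusing clients that cannot enforce restrictions."""
    # Assumed format: comma-separated capability names
    raw = headers.get(CAPABILITY_HEADER, "")
    advertised = {name.strip() for name in raw.split(",") if name.strip()}

    if not restrictions:
        # No restrictions for this principal: respond as before
        return 200, dict(table_result)

    if READ_RESTRICTIONS not in advertised:
        # Fail closed: the principal has restrictions but the client did not
        # advertise support, so no table data is handed out.
        return 403, {"error": "client does not support read-restrictions"}

    body = dict(table_result)
    body["read-restrictions"] = restrictions
    return 200, body
```

With this shape, an older client that never sends the header gets a 403
instead of an unrestricted table, which is the stale-client case described
above.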