This is an automated email from the ASF dual-hosted git repository.

amoghj pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/iceberg.git

The following commit(s) were added to refs/heads/main by this push:
     new cd707394dd Spec: Document support for binary in truncate transform (#10079)
cd707394dd is described below

commit cd707394ddc8cd41ba12bde83ead059716cf5623
Author: Brian Hulette <hulet...@gmail.com>
AuthorDate: Mon Apr 8 14:12:57 2024 -0700

    Spec: Document support for binary in truncate transform (#10079)
---
 format/spec.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/format/spec.md b/format/spec.md
index ab6f349483..aa905e7032 100644
--- a/format/spec.md
+++ b/format/spec.md
@@ -314,7 +314,7 @@ Partition field IDs must be reused if an existing partition spec contains an equ
 |-------------------|--------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|-------------|
 | **`identity`** | Source value, unmodified | Any | Source type |
 | **`bucket[N]`** | Hash of value, mod `N` (see below) | `int`, `long`, `decimal`, `date`, `time`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns`, `string`, `uuid`, `fixed`, `binary` | `int` |
-| **`truncate[W]`** | Value truncated to width `W` (see below) | `int`, `long`, `decimal`, `string` | Source type |
+| **`truncate[W]`** | Value truncated to width `W` (see below) | `int`, `long`, `decimal`, `string`, `binary` | Source type |
 | **`year`** | Extract a date or timestamp year, as years from 1970 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` |
 | **`month`** | Extract a date or timestamp month, as months from 1970-01-01 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` |
 | **`day`** | Extract a date or timestamp day, as days from 1970-01-01 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` |
@@ -351,12 +351,14 @@ For hash function details by type, see Appendix B.
 | **`long`** | `W`, width | `v - (v % W)` remainders must be positive [1] | `W=10`: `1` → `0`, `-1` → `-10` |
 | **`decimal`** | `W`, width (no scale) | `scaled_W = decimal(W, scale(v))` `v - (v % scaled_W)` [1, 2] | `W=50`, `s=2`: `10.65` → `10.50` |
 | **`string`** | `L`, length | Substring of length `L`: `v.substring(0, L)` [3] | `L=3`: `iceberg` → `ice` |
+| **`binary`** | `L`, length | Sub array of length `L`: `v.subarray(0, L)` [4] | `L=3`: `\x01\x02\x03\x04\x05` → `\x01\x02\x03` |

 Notes:

 1. The remainder, `v % W`, must be positive. For languages where `%` can produce negative values, the correct truncate function is: `v - (((v % W) + W) % W)`
 2. The width, `W`, used to truncate decimal values is applied using the scale of the decimal column to avoid additional (and potentially conflicting) parameters.
 3. Strings are truncated to a valid UTF-8 string with no more than `L` code points.
+4. In contrast to strings, binary values do not have an assumed encoding and are truncated to `L` bytes.


 #### Partition Evolution
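
The truncate rules in the table and notes of this change can be illustrated with a minimal Python sketch. The function names (`truncate_int`, `truncate_decimal`, `truncate_string`, `truncate_binary`) are hypothetical and are not part of Iceberg or the spec. The sketch assumes positive widths and finite decimals with a non-negative scale.

```python
from decimal import Decimal


def truncate_int(v: int, w: int) -> int:
    """Hypothetical helper for int and long: v - (v % W)."""
    # Python's % returns a non-negative remainder when w > 0, so the
    # workaround from note 1 is not needed in this language.
    return v - (v % w)


def truncate_decimal(v: Decimal, w: int) -> Decimal:
    """Hypothetical helper for decimal: W is applied at the scale of v (note 2)."""
    # Assumes v is finite, so the exponent is an integer.
    sign, digits, exponent = v.as_tuple()
    unscaled = int("".join(map(str, digits))) * (-1 if sign else 1)
    # Truncate the unscaled integer, then restore the original scale.
    return Decimal(unscaled - (unscaled % w)).scaleb(exponent)


def truncate_string(v: str, length: int) -> str:
    """Hypothetical helper for string: keep at most `length` code points (note 3)."""
    # Python str slicing counts code points, not bytes.
    return v[:length]


def truncate_binary(v: bytes, length: int) -> bytes:
    """Hypothetical helper for binary: keep at most `length` bytes (note 4)."""
    # No encoding is assumed, so a multi-byte character may be split.
    return v[:length]
```

The difference between notes 3 and 4 shows up with non-ASCII input: `truncate_string("éa", 1)` keeps the whole character `é`, while `truncate_binary("éa".encode("utf-8"), 1)` keeps only the first byte of its two-byte UTF-8 encoding.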