This is an automated email from the ASF dual-hosted git repository.
amoghj pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/iceberg.git
The following commit(s) were added to refs/heads/main by this push:
new cd707394dd Spec: Document support for binary in truncate transform (#10079)
cd707394dd is described below
commit cd707394ddc8cd41ba12bde83ead059716cf5623
Author: Brian Hulette <[email protected]>
AuthorDate: Mon Apr 8 14:12:57 2024 -0700
Spec: Document support for binary in truncate transform (#10079)
---
format/spec.md | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/format/spec.md b/format/spec.md
index ab6f349483..aa905e7032 100644
--- a/format/spec.md
+++ b/format/spec.md
@@ -314,7 +314,7 @@ Partition field IDs must be reused if an existing partition spec contains an equ
|-------------------|--------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|-------------|
| **`identity`** | Source value, unmodified | Any | Source type |
| **`bucket[N]`** | Hash of value, mod `N` (see below) | `int`, `long`, `decimal`, `date`, `time`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns`, `string`, `uuid`, `fixed`, `binary` | `int` |
-| **`truncate[W]`** | Value truncated to width `W` (see below) | `int`, `long`, `decimal`, `string` | Source type |
+| **`truncate[W]`** | Value truncated to width `W` (see below) | `int`, `long`, `decimal`, `string`, `binary` | Source type |
| **`year`** | Extract a date or timestamp year, as years from 1970 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` |
| **`month`** | Extract a date or timestamp month, as months from 1970-01-01 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` |
| **`day`** | Extract a date or timestamp day, as days from 1970-01-01 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns` | `int` |
@@ -351,12 +351,14 @@ For hash function details by type, see Appendix B.
| **`long`** | `W`, width | `v - (v % W)` remainders must be positive [1] | `W=10`: `1` → `0`, `-1` → `-10` |
| **`decimal`** | `W`, width (no scale) | `scaled_W = decimal(W, scale(v))` `v - (v % scaled_W)` [1, 2] | `W=50`, `s=2`: `10.65` → `10.50` |
| **`string`** | `L`, length | Substring of length `L`: `v.substring(0, L)` [3] | `L=3`: `iceberg` → `ice` |
+| **`binary`** | `L`, length | Sub array of length `L`: `v.subarray(0, L)` [4] | `L=3`: `\x01\x02\x03\x04\x05` → `\x01\x02\x03` |
Notes:
1. The remainder, `v % W`, must be positive. For languages where `%` can produce negative values, the correct truncate function is: `v - (((v % W) + W) % W)`
2. The width, `W`, used to truncate decimal values is applied using the scale of the decimal column to avoid additional (and potentially conflicting) parameters.
3. Strings are truncated to a valid UTF-8 string with no more than `L` code points.
+4. In contrast to strings, binary values do not have an assumed encoding and are truncated to `L` bytes.
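
The truncate rules in the table and notes above can be sketched in Python as follows. This is an illustrative sketch only, not the Iceberg implementation: the function names (`truncate_int`, `truncate_decimal`, `truncate_string`, `truncate_binary`) are hypothetical, and the decimal helper assumes a finite `Decimal` whose exponent encodes the column scale.

```python
from decimal import Decimal


def truncate_int(v: int, w: int) -> int:
    # Note 1: the remainder must be non-negative. Python's % already
    # guarantees this for positive w, but the portable form is kept here.
    return v - (((v % w) + w) % w)


def truncate_decimal(v: Decimal, w: int) -> Decimal:
    # Note 2: W is applied at the scale of the value, so the work is
    # done on the unscaled integer and the scale is restored afterwards.
    scale = -v.as_tuple().exponent          # assumes a finite Decimal
    unscaled = int(v.scaleb(scale))         # e.g. 10.65 -> 1065
    return Decimal(truncate_int(unscaled, w)).scaleb(-scale)


def truncate_string(v: str, length: int) -> str:
    # Note 3: Python str slicing counts code points, not bytes.
    return v[:length]


def truncate_binary(v: bytes, length: int) -> bytes:
    # Note 4: binary has no assumed encoding; keep the first L bytes.
    return v[:length]
```

For the examples in the table, `truncate_int(-1, 10)` gives `-10`, `truncate_decimal(Decimal("10.65"), 50)` gives `Decimal("10.50")`, and `truncate_binary(b"\x01\x02\x03\x04\x05", 3)` gives `b"\x01\x02\x03"`.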
#### Partition Evolution