This is an automated email from the ASF dual-hosted git repository.

amoghj pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/iceberg.git


The following commit(s) were added to refs/heads/main by this push:
     new cd707394dd Spec: Document support for binary in truncate transform (#10079)
cd707394dd is described below

commit cd707394ddc8cd41ba12bde83ead059716cf5623
Author: Brian Hulette <hulet...@gmail.com>
AuthorDate: Mon Apr 8 14:12:57 2024 -0700

    Spec: Document support for binary in truncate transform (#10079)
---
 format/spec.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/format/spec.md b/format/spec.md
index ab6f349483..aa905e7032 100644
--- a/format/spec.md
+++ b/format/spec.md
@@ -314,7 +314,7 @@ Partition field IDs must be reused if an existing partition spec contains an equ
 |-------------------|--------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|-------------|
 | **`identity`**    | Source value, unmodified                                     | Any                                                                                                       | Source type |
 | **`bucket[N]`**   | Hash of value, mod `N` (see below)                           | `int`, `long`, `decimal`, `date`, `time`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns`, `string`, `uuid`, `fixed`, `binary` | `int`       |
-| **`truncate[W]`** | Value truncated to width `W` (see below)                     | `int`, `long`, `decimal`, `string`                                                                        | Source type |
+| **`truncate[W]`** | Value truncated to width `W` (see below)                     | `int`, `long`, `decimal`, `string`, `binary`                                                              | Source type |
 | **`year`**        | Extract a date or timestamp year, as years from 1970         | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns`                                      | `int`       |
 | **`month`**       | Extract a date or timestamp month, as months from 1970-01-01 | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns`                                      | `int`       |
 | **`day`**         | Extract a date or timestamp day, as days from 1970-01-01     | `date`, `timestamp`, `timestamptz`, `timestamp_ns`, `timestamptz_ns`                                      | `int`       |
@@ -351,12 +351,14 @@ For hash function details by type, see Appendix B.
 | **`long`**    | `W`, width            | `v - (v % W)`        remainders must be positive     [1]                    | `W=10`: `1` → `0`, `-1` → `-10`  |
 | **`decimal`** | `W`, width (no scale) | `scaled_W = decimal(W, scale(v))` `v - (v % scaled_W)`               [1, 2] | `W=50`, `s=2`: `10.65` → `10.50` |
 | **`string`**  | `L`, length           | Substring of length `L`: `v.substring(0, L)` [3]                    | `L=3`: `iceberg` → `ice`         |
+| **`binary`**  | `L`, length           | Sub array of length `L`: `v.subarray(0, L)`  [4]                    | `L=3`: `\x01\x02\x03\x04\x05` → `\x01\x02\x03` |
 
 Notes:
 
 1. The remainder, `v % W`, must be positive. For languages where `%` can produce negative values, the correct truncate function is: `v - (((v % W) + W) % W)`
 2. The width, `W`, used to truncate decimal values is applied using the scale of the decimal column to avoid additional (and potentially conflicting) parameters.
 3. Strings are truncated to a valid UTF-8 string with no more than `L` code points.
+4. In contrast to strings, binary values do not have an assumed encoding and are truncated to `L` bytes.
 
 
 #### Partition Evolution

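For readers of this commit, the truncation rules in the table and notes above can be sketched in Python. This is an illustration under stated assumptions, not Iceberg's implementation: the function names are hypothetical, decimals are assumed to have a non-negative scale, and Python `str` slicing is used because it already counts code points rather than bytes.

```python
from decimal import Decimal


def truncate_int(v: int, w: int) -> int:
    # Portable form from note 1: forces a non-negative remainder even in
    # languages where % can return a negative value.
    return v - (((v % w) + w) % w)


def truncate_decimal(v: Decimal, w: int) -> Decimal:
    # Note 2: W is applied at the scale of the value, so truncate the
    # unscaled integer and then restore the original exponent.
    sign, digits, exponent = v.as_tuple()
    unscaled = int("".join(str(d) for d in digits))
    if sign:
        unscaled = -unscaled
    return Decimal(truncate_int(unscaled, w)).scaleb(exponent)


def truncate_string(v: str, length: int) -> str:
    # Note 3: at most `length` code points; slicing a Python str never
    # splits a code point.
    return v[:length]


def truncate_binary(v: bytes, length: int) -> bytes:
    # Note 4: no encoding is assumed, so this is a plain byte slice.
    return v[:length]
```

With these definitions, `truncate_binary(b"\x01\x02\x03\x04\x05", 3)` yields `b"\x01\x02\x03"`, matching the new table row, while `truncate_string("iceberg", 3)` yields `"ice"`. The two rules diverge on multi-byte text: truncating the UTF-8 bytes of a string to `L` bytes may cut a character in half, which is why note 4 treats binary separately.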