kbendick commented on code in PR #4937:
URL: https://github.com/apache/iceberg/pull/4937#discussion_r887423432


##########
format/spec.md:
##########
@@ -343,12 +343,13 @@ For hash function details by type, see Appendix B.
 | **`int`**     | `W`, width            | `v - (v % W)`        remainders must be positive     [1]                    | `W=10`: `1` → `0`, `-1` → `-10`  |
 | **`long`**    | `W`, width            | `v - (v % W)`        remainders must be positive     [1]                    | `W=10`: `1` → `0`, `-1` → `-10`  |
 | **`decimal`** | `W`, width (no scale) | `scaled_W = decimal(W, scale(v))` `v - (v % scaled_W)`               [1, 2] | `W=50`, `s=2`: `10.65` → `10.50` |
-| **`string`**  | `L`, length           | Substring of length `L`: `v.substring(0, L)`                     | `L=3`: `iceberg` → `ice`         |
+| **`string`**  | `L`, length           | Substring of length `L`: `v.substring(0, L)` [3]                    | `L=3`: `iceberg` → `ice`         |
 
 Notes:
 
 1. The remainder, `v % W`, must be positive. For languages where `%` can produce negative values, the correct truncate function is: `v - (((v % W) + W) % W)`
 2. The width, `W`, used to truncate decimal values is applied using the scale of the decimal column to avoid additional (and potentially conflicting) parameters.
+3. For strings, truncation is based on Unicode code point length (not byte length).

Review Comment:
   Or possibly reusing the language from here?
   
   
https://github.com/apache/iceberg/blob/5009949ba4377ac5a8572ff7ae70e886c9e33bec/api/src/main/java/org/apache/iceberg/util/UnicodeUtil.java#L38-L42
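
For context, here is a minimal Java sketch of what notes 1 and 3 in the hunk above describe. The class and method names (`TruncateSketch`, `truncateInt`, `truncateString`) are hypothetical and are not taken from Iceberg's `UnicodeUtil`; the sketch relies only on the standard `java.lang.String` code point methods and is meant as an illustration, not as the reference implementation.

```java
// Hypothetical illustration; names are assumptions, not Iceberg APIs.
final class TruncateSketch {

  // Note 1: force the remainder to be non-negative so that negative
  // values truncate toward negative infinity (e.g. -1 -> -10 for W=10).
  static int truncateInt(int width, int v) {
    return v - (((v % width) + width) % width);
  }

  // Note 3: the length L counts Unicode code points, not UTF-16 chars
  // and not UTF-8 bytes, so a surrogate pair is never split.
  static String truncateString(int length, String v) {
    int codePoints = v.codePointCount(0, v.length());
    if (codePoints <= length) {
      return v; // already short enough
    }
    // Convert the code point count into a char index before slicing.
    return v.substring(0, v.offsetByCodePoints(0, length));
  }

  public static void main(String[] args) {
    System.out.println(truncateInt(10, 1));           // 0
    System.out.println(truncateInt(10, -1));          // -10
    System.out.println(truncateString(3, "iceberg")); // ice

    // 'a' followed by U+1F600 (a surrogate pair in UTF-16), then "bc".
    String s = "a\uD83D\uDE00bc";
    // Two code points are kept, which occupy three UTF-16 chars.
    System.out.println(truncateString(2, s).length()); // 3
  }
}
```

A plain `v.substring(0, L)` in Java indexes UTF-16 chars, so it could cut a supplementary character in half; going through `offsetByCodePoints` avoids that.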



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

