lriggs opened a new pull request, #50137:
URL: https://github.com/apache/arrow/pull/50137

   ### Rationale for this change
   `CHR(n)` only worked for ASCII (0–127). Values ≥ 128 emitted a single raw byte
   (invalid UTF-8), causing "Error during planning". The goal is to emit the
   proper multi-byte **UTF-8 encoding** of the Unicode code point, consistent with
   PostgreSQL/Snowflake.
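For illustration, here is a minimal C++ sketch of why a lone byte ≥ 0x80 can never be valid UTF-8. Bytes 0x80–0xBF are continuation bytes, 0xC2–0xF4 are lead bytes that must be followed by continuation bytes, and the remaining values never appear in well-formed UTF-8, so any one-byte output for a value ≥ 128 fails a well-formedness check. `IsValidUtf8` is a hypothetical helper written for this example, not Arrow's or Gandiva's own validator.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical, minimal UTF-8 well-formedness check (not Arrow's validator).
// Rejects stray continuation bytes, truncated sequences, overlong forms,
// surrogates (U+D800..U+DFFF), and code points above U+10FFFF.
bool IsValidUtf8(const std::string& s) {
  std::size_t i = 0;
  while (i < s.size()) {
    const uint8_t b = static_cast<uint8_t>(s[i]);
    int len;
    uint32_t cp;
    if (b < 0x80) {                    // 0xxxxxxx: ASCII, one byte
      ++i;
      continue;
    } else if ((b & 0xE0) == 0xC0) {  // 110xxxxx: two-byte lead
      len = 2;
      cp = b & 0x1F;
    } else if ((b & 0xF0) == 0xE0) {  // 1110xxxx: three-byte lead
      len = 3;
      cp = b & 0x0F;
    } else if ((b & 0xF8) == 0xF0) {  // 11110xxx: four-byte lead
      len = 4;
      cp = b & 0x07;
    } else {
      return false;  // 10xxxxxx continuation byte or 0xF8..0xFF cannot start a sequence
    }
    if (i + static_cast<std::size_t>(len) > s.size()) return false;  // truncated
    for (int k = 1; k < len; ++k) {
      const uint8_t c = static_cast<uint8_t>(s[i + k]);
      if ((c & 0xC0) != 0x80) return false;  // expected a 10xxxxxx continuation byte
      cp = (cp << 6) | (c & 0x3F);
    }
    static const uint32_t kMinForLen[] = {0, 0, 0x80, 0x800, 0x10000};
    if (cp < kMinForLen[len]) return false;         // overlong encoding
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
    if (cp > 0x10FFFF) return false;                 // beyond Unicode range
    i += static_cast<std::size_t>(len);
  }
  return true;
}
```

For example, a lone `0xE9` byte (what a single raw byte for 233 looks like) is rejected, while the two-byte sequence `C3 A9` for U+00E9 is accepted.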
   
   ### What changes are included in this PR?
   #### Arrow (C++ / Gandiva)
   
   | File | Change |
   |------|--------|
   | `cpp/src/gandiva/precompiled/string_ops.cc` | `chr_int64` rewritten to UTF-8-encode the code point (1–4 bytes) and to raise an error on invalid input (negative, > 0x10FFFF, or in the surrogate range 0xD800–0xDFFF). `chr_int32` now delegates to it. |
   | `cpp/src/gandiva/precompiled/string_ops_test.cc` | `TestChrBigInt` rewritten for UTF-8 semantics: covers the low and high boundary of every byte length (1/2/3/4 bytes), the characters í/€/日/😀, and the three invalid-input error cases. |
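A minimal sketch of the encoding rule described above, assuming a standalone helper rather than Gandiva's precompiled function signature. `EncodeUtf8CodePoint` and `EncodeUtf8CodePoint32` are hypothetical names, and they return `false` where the PR's `chr_int64` raises an error. The byte length is chosen by the boundaries 0x80, 0x800, and 0x10000, which are the 1/2/3/4-byte boundaries the updated tests exercise.

```cpp
#include <cstdint>
#include <string>

// Hypothetical standalone helper mirroring the behavior described for
// chr_int64: UTF-8-encode a code point into 1-4 bytes. Returns false (where
// the PR's chr_int64 raises an error) for negative values, values above
// 0x10FFFF, and UTF-16 surrogates 0xD800-0xDFFF.
bool EncodeUtf8CodePoint(int64_t cp, std::string* out) {
  out->clear();
  if (cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
    return false;
  }
  const auto u = static_cast<uint32_t>(cp);
  if (u < 0x80) {          // 1 byte: 0xxxxxxx
    out->push_back(static_cast<char>(u));
  } else if (u < 0x800) {  // 2 bytes: 110xxxxx 10xxxxxx
    out->push_back(static_cast<char>(0xC0 | (u >> 6)));
    out->push_back(static_cast<char>(0x80 | (u & 0x3F)));
  } else if (u < 0x10000) {  // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    out->push_back(static_cast<char>(0xE0 | (u >> 12)));
    out->push_back(static_cast<char>(0x80 | ((u >> 6) & 0x3F)));
    out->push_back(static_cast<char>(0x80 | (u & 0x3F)));
  } else {  // 4 bytes: 11110xxx followed by three 10xxxxxx bytes
    out->push_back(static_cast<char>(0xF0 | (u >> 18)));
    out->push_back(static_cast<char>(0x80 | ((u >> 12) & 0x3F)));
    out->push_back(static_cast<char>(0x80 | ((u >> 6) & 0x3F)));
    out->push_back(static_cast<char>(0x80 | (u & 0x3F)));
  }
  return true;
}

// Mirrors the described chr_int32 -> chr_int64 delegation (hypothetical name).
bool EncodeUtf8CodePoint32(int32_t cp, std::string* out) {
  return EncodeUtf8CodePoint(static_cast<int64_t>(cp), out);
}
```

For example, U+20AC (€) encodes to `E2 82 AC` and U+1F600 (😀) to `F0 9F 98 80`. Having the 32-bit variant widen and delegate keeps the validation and encoding logic in one place.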
   
   ### Are these changes tested?
   Yes, by the updated unit tests in `string_ops_test.cc`.
   
   ### Are there any user-facing changes?
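   Yes. The Gandiva `CHR` function now returns the UTF-8 encoding of any valid Unicode code point, not just ASCII, and raises an error for negative values, values above 0x10FFFF, and surrogates.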
   
   

