Gustavo de Morais created FLINK-39601:
-----------------------------------------
Summary: Add UTF-8 validation utilities and
StringData.fromUtf8Bytes connector API
Key: FLINK-39601
URL: https://issues.apache.org/jira/browse/FLINK-39601
Project: Flink
Issue Type: Sub-task
Components: Table SQL / API
Reporter: Gustavo de Morais
Assignee: Gustavo de Morais
Foundation for FLIP-568. Adds the validator and the {{@PublicEvolving}}
connector API; there is no SQL surface yet.
* {{StringUtf8Utils}} - new {{firstInvalidUtf8ByteIndex(byte[], int, int)}}
validator (Flink-style branches reusing the existing {{decodeUTF8Strict}}
algorithm).
* {{EncodingUtils.isValidUtf8(byte[])}} / {{(byte[], int, int)}} - thin
null-tolerant predicate delegating to the validator.
* {{BinaryStringData.fromUtf8Bytes(byte[])}} / {{(byte[], int, int)}} -
validating factory; returns {{null}} on null input (matches Spark's
{{UTF8String.fromBytes}}), throws {{IllegalArgumentException}} with the
byte index on invalid UTF-8.
* {{StringData.fromUtf8Bytes}} - public-evolving interface methods that
delegate.
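
For illustration only, the validator/factory contract described above could look roughly like the following standalone sketch. It is not the Flink implementation: the class name {{Utf8Sketch}} is hypothetical, the factory returns a plain {{String}} instead of {{StringData}} to stay self-contained, and reporting the index of the lead byte of the malformed sequence (rather than the exact offending byte) is an assumption, as is the exception message text.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, NOT Flink code: mirrors only the contract described in the issue.
final class Utf8Sketch {
    private Utf8Sketch() {}

    /**
     * Returns -1 if bytes[offset, offset + length) is well-formed UTF-8.
     * Otherwise returns the index of the lead byte of the first malformed
     * sequence (assumption: the real validator may report a different byte).
     */
    static int firstInvalidUtf8ByteIndex(byte[] bytes, int offset, int length) {
        final int end = offset + length;
        int i = offset;
        while (i < end) {
            int b0 = bytes[i] & 0xFF;
            if (b0 < 0x80) { // single-byte (ASCII) code point
                i++;
                continue;
            }
            int continuation;          // number of continuation bytes expected
            int lo = 0x80, hi = 0xBF;  // allowed range for the second byte
            if (b0 >= 0xC2 && b0 <= 0xDF) {
                continuation = 1;      // 0xC0 and 0xC1 would be overlong encodings
            } else if (b0 >= 0xE0 && b0 <= 0xEF) {
                continuation = 2;
                if (b0 == 0xE0) lo = 0xA0; // reject overlong 3-byte forms
                if (b0 == 0xED) hi = 0x9F; // reject UTF-16 surrogates
            } else if (b0 >= 0xF0 && b0 <= 0xF4) {
                continuation = 3;
                if (b0 == 0xF0) lo = 0x90; // reject overlong 4-byte forms
                if (b0 == 0xF4) hi = 0x8F; // reject code points above U+10FFFF
            } else {
                return i; // stray continuation byte or invalid lead byte
            }
            if (i + continuation >= end) {
                return i; // sequence is truncated by the end of the range
            }
            int b1 = bytes[i + 1] & 0xFF;
            if (b1 < lo || b1 > hi) {
                return i;
            }
            for (int k = 2; k <= continuation; k++) {
                if ((bytes[i + k] & 0xC0) != 0x80) {
                    return i; // remaining bytes must look like 10xxxxxx
                }
            }
            i += continuation + 1;
        }
        return -1;
    }

    /** Validating factory: null in, null out; throws with the byte index on invalid input. */
    static String fromUtf8Bytes(byte[] bytes) {
        if (bytes == null) {
            return null;
        }
        int bad = firstInvalidUtf8ByteIndex(bytes, 0, bytes.length);
        if (bad >= 0) {
            throw new IllegalArgumentException("Invalid UTF-8 at byte index " + bad);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Under these assumptions, a connector would call the factory once per incoming payload and either receive a decoded string or get an exception that names the offending position, instead of silently producing replacement characters.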
--
This message was sent by Atlassian Jira
(v8.20.10#820010)