[ 
https://issues.apache.org/jira/browse/FLINK-39600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timo Walther closed FLINK-39600.
--------------------------------
    Fix Version/s: 2.4.0
       Resolution: Fixed

> FLIP-568: Strict BYTES-to-STRING CAST with UTF-8 Validation Utilities
> ---------------------------------------------------------------------
>
>                 Key: FLINK-39600
>                 URL: https://issues.apache.org/jira/browse/FLINK-39600
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table SQL / API
>            Reporter: Gustavo de Morais
>            Assignee: Gustavo de Morais
>            Priority: Critical
>             Fix For: 2.4.0
>
>
> {{CAST(bytes AS STRING)}} today silently replaces invalid UTF-8 with the 
> Unicode replacement character {{U+FFFD}}. The substitution is 
> irreversible and produces no warning - pipelines keep running while data is 
> permanently corrupted downstream. This also blocks engine optimizations that 
> need injective guarantees (e.g. upsert key propagation through 
> {{BINARY -> STRING}}).
> [FLIP-568|https://cwiki.apache.org/confluence/display/FLINK/FLIP-568%3A+Strict+BYTES-to-STRING+CAST+with+UTF-8+Validation+Utilities]
>  addresses this by:
>  # Making {{CAST(bytes AS STRING)}} throw on invalid UTF-8. {{TRY_CAST}} 
> returns {{NULL}}. A migration flag restores the legacy behavior.
>  # Adding two SQL functions:
>  ** {{IS_VALID_UTF8(bytes) -> BOOLEAN}} for routing invalid records to a 
> dead-letter sink
>  ** {{MAKE_VALID_UTF8(bytes) -> STRING}} as the explicit, opt-in substitution 
> recipe
>  # Adding a {{StringData.fromUtf8Bytes(byte[])}} connector API that validates 
> at ingestion and throws on invalid input.
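
The difference between the legacy and the strict behavior can be sketched with the JDK alone. The block below is a minimal illustration and not Flink's implementation: the class {{Utf8CastSketch}} and its method names are hypothetical, chosen only to mirror the semantics described above ({{CAST}} throws, {{TRY_CAST}} returns {{NULL}}, {{IS_VALID_UTF8}} reports validity, and the legacy path substitutes {{U+FFFD}}). Only {{java.nio.charset}} calls from the standard library are used.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are illustrative, not Flink API.
final class Utf8CastSketch {

    // Strict decode, analogous to the new CAST: fails on invalid UTF-8.
    static String strictDecode(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    // Analogous to TRY_CAST: null instead of an exception.
    static String tryDecode(byte[] bytes) {
        try {
            return strictDecode(bytes);
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    // Analogous to IS_VALID_UTF8: true only if strict decoding succeeds.
    static boolean isValidUtf8(byte[] bytes) {
        return tryDecode(bytes) != null;
    }

    // Analogous to the legacy CAST and to MAKE_VALID_UTF8:
    // new String(bytes, charset) replaces malformed input with U+FFFD.
    static String lenientDecode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] valid = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] invalid = {0x68, (byte) 0xFF, 0x69}; // 0xFF never occurs in UTF-8

        System.out.println(isValidUtf8(valid));
        System.out.println(isValidUtf8(invalid));
        System.out.println(tryDecode(invalid));
        System.out.println(lenientDecode(invalid).contains("\uFFFD"));
    }
}
```

In this sketch, a record for which {{isValidUtf8}} returns false is the kind of record that would be routed to a dead-letter sink, while {{lenientDecode}} shows why the legacy substitution is not reversible: the original invalid byte cannot be recovered from the resulting string.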



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
