[ 
https://issues.apache.org/jira/browse/FLINK-38822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18046478#comment-18046478
 ] 

featzhang commented on FLINK-38822:
-----------------------------------

I would like to work on this issue. 
Could someone please assign it to me? Thanks.

> Extend URL_DECODE with a recursive parameter for multi-level decoding
> ---------------------------------------------------------------------
>
>                 Key: FLINK-38822
>                 URL: https://issues.apache.org/jira/browse/FLINK-38822
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / API
>    Affects Versions: 2.0-preview
>            Reporter: featzhang
>            Priority: Major
>              Labels: pull-request-available
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> h3. Background
> Apache Flink SQL currently provides the built-in function 
> {{URL_DECODE(str)}}, which decodes a string in 
> {{application/x-www-form-urlencoded}} format. This function was introduced in 
> FLINK-34108 and is aligned with the corresponding functionality in Apache 
> Calcite.
> However, in real-world data processing scenarios, especially in log 
> processing, tracking data, and external system integrations, it is common to 
> encounter *multi-level URL-encoded strings*, for example:
>  * Values that are URL-encoded multiple times by upstream systems
>  * Nested encoding caused by redirects, proxies, or intermediate 
> transformations
> In such cases, calling {{URL_DECODE}} only once is insufficient.
> h3. Problem
> The current {{URL_DECODE(str)}} function only performs *a single decoding 
> pass*. Users who need to fully decode multi-level encoded values must 
> repeatedly apply the function manually, which:
>  * Reduces readability of SQL queries
>  * Makes the decoding intent less explicit
>  * Is inconvenient and error-prone in complex SQL pipelines
> h3. Proposal
> Extend the existing {{URL_DECODE}} function with an *optional boolean 
> parameter* {{recursive}}, which controls whether decoding is applied 
> repeatedly until a further pass no longer changes the value.
> Proposed function signatures:
>  
> {{URL_DECODE(str)
> URL_DECODE(str, recursive)}}
> Where:
>  * {{recursive = false}} (default): preserves the current behavior 
> (single-pass decoding)
>  * {{recursive = true}}: repeatedly applies URL decoding until the result 
> no longer changes
> h3. Examples
>  
> {{-- Single-pass decoding (current behavior)
> SELECT URL_DECODE('%252Fpath%252Fto%252Fresource');
> -- Result: '%2Fpath%2Fto%2Fresource'
> -- Recursive decoding
> SELECT URL_DECODE('%252Fpath%252Fto%252Fresource', true);
> -- Result: '/path/to/resource'}}
> h3. Compatibility
>  * This change is *fully backward-compatible*
>  * Existing queries using {{URL_DECODE(str)}} will continue to work without 
> any behavior changes
>  * The new parameter is optional and only extends functionality
> h3. Additional Notes
>  * The recursive decoding should stop when the decoded result is identical to 
> the previous value
>  * This proposal builds directly on the existing implementation introduced in 
> FLINK-34108
>  * Similar behavior is commonly required in data cleansing and normalization 
> use cases
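
The stopping rule described in the issue (decode until the result is identical to the previous value) could look roughly like the Java sketch below. It is only an illustration, not the implementation from FLINK-34108 or from any pull request. The class and method names are hypothetical, it relies on the JDK's java.net.URLDecoder with UTF-8, and the handling of malformed input (null when the first pass fails, last successfully decoded value when a later pass fails) is an assumption that the issue does not specify.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Flink.
final class RecursiveUrlDecodeSketch {

    // Decodes once when recursive is false; otherwise repeats until a pass
    // leaves the value unchanged.
    static String urlDecode(String value, boolean recursive) {
        if (value == null) {
            return null;
        }
        String current;
        try {
            current = URLDecoder.decode(value, StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            // Assumption: malformed input on the first pass yields null.
            return null;
        }
        if (!recursive) {
            return current;
        }
        while (true) {
            String next;
            try {
                next = URLDecoder.decode(current, StandardCharsets.UTF_8);
            } catch (IllegalArgumentException e) {
                // Assumption: if a later pass hits a malformed escape,
                // keep the last value that decoded cleanly.
                return current;
            }
            if (next.equals(current)) {
                // Fixed point reached: another pass changes nothing.
                return current;
            }
            current = next;
        }
    }

    public static void main(String[] args) {
        String input = "%252Fpath%252Fto%252Fresource";
        System.out.println(urlDecode(input, false)); // %2Fpath%2Fto%2Fresource
        System.out.println(urlDecode(input, true));  // /path/to/resource
    }
}
```

One case worth deciding explicitly: '%2525' decodes to '%25' and then to '%', and a further pass over a lone '%' is rejected by URLDecoder as an incomplete escape. The sketch returns '%' in that situation; returning NULL instead would be an equally defensible choice.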



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
