featzhang created FLINK-38822:
---------------------------------
Summary: Extend URL_DECODE with a recursive parameter for
multi-level decoding
Key: FLINK-38822
URL: https://issues.apache.org/jira/browse/FLINK-38822
Project: Flink
Issue Type: Improvement
Components: Table SQL / API
Affects Versions: 2.0-preview
Reporter: featzhang
h3. Background
Apache Flink SQL currently provides the built-in function
{{URL_DECODE(str)}}, which decodes a string in
{{application/x-www-form-urlencoded}} format. This function was introduced in
FLINK-34108 and is aligned with the corresponding functionality in Apache
Calcite.
However, in real-world data processing scenarios, especially in log processing,
tracking data, and external system integrations, it is common to encounter
*multi-level URL-encoded strings*, for example:
* Values that are URL-encoded multiple times by upstream systems
* Nested encoding caused by redirects, proxies, or intermediate transformations
In such cases, calling {{URL_DECODE}} only once is insufficient.
h3. Problem
The current {{URL_DECODE(str)}} function only performs *a single decoding
pass*. Users who need to fully decode multi-level encoded values must
repeatedly apply the function manually, which:
* Reduces readability of SQL queries
* Makes the decoding intent less explicit
* Is inconvenient and error-prone in complex SQL pipelines
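To illustrate why manual nesting is fragile, here is a minimal Java sketch. It assumes that a single {{URL_DECODE}} call behaves like {{java.net.URLDecoder.decode}} with UTF-8; the class and method names are hypothetical and are not part of Flink. A hard-coded depth of two fully decodes a double-encoded value but leaves a triple-encoded one partially encoded.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Flink.
final class FixedDepthDecodeDemo {

    // Stand-in for one URL_DECODE(str) call
    // (assumption: UTF-8, application/x-www-form-urlencoded rules).
    static String decodeOnce(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    // Equivalent of URL_DECODE(URL_DECODE(str)): the depth is hard-coded.
    static String decodeTwice(String s) {
        return decodeOnce(decodeOnce(s));
    }

    public static void main(String[] args) {
        // Double-encoded input is fully decoded.
        System.out.println(decodeTwice("%252Fpath"));   // prints /path
        // Triple-encoded input still carries one level of encoding.
        System.out.println(decodeTwice("%25252Fpath")); // prints %2Fpath
    }
}
```

When the encoding depth varies from record to record, no fixed number of nested calls is correct for every row.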
h3. Proposal
Extend the existing {{URL_DECODE}} function with an *optional boolean
parameter* {{recursive}}, which controls whether decoding is applied
repeatedly until a decoding pass leaves the value unchanged.
Proposed function signatures:
{code:sql}
URL_DECODE(str)
URL_DECODE(str, recursive)
{code}
Where:
* {{recursive = false}} (default): preserves the current behavior (single-pass
decoding)
* {{recursive = true}}: repeatedly applies URL decoding until the result
no longer changes
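The proposed semantics can be sketched in Java as a fixed-point loop. This is only an illustration under stated assumptions, not the actual Flink implementation: the class and method names are hypothetical, {{java.net.URLDecoder}} with UTF-8 stands in for a single decoding pass, a failure on the first pass yields {{null}}, and a failure on a later pass returns the last successfully decoded value. The ticket does not specify the last two points.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not Flink's implementation.
final class RecursiveUrlDecodeSketch {

    static String urlDecode(String value, boolean recursive) {
        if (value == null) {
            return null;
        }
        String current;
        try {
            // First pass: identical to the single-argument behavior.
            current = URLDecoder.decode(value, StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            return null; // assumption: an illegal escape on the first pass yields NULL
        }
        if (!recursive) {
            return current;
        }
        // Keep decoding until a pass leaves the value unchanged. Every pass that
        // changes the value either shortens it or replaces a '+' with a space,
        // so the loop terminates.
        while (true) {
            String next;
            try {
                next = URLDecoder.decode(current, StandardCharsets.UTF_8);
            } catch (IllegalArgumentException e) {
                return current; // assumption: keep the last successfully decoded value
            }
            if (next.equals(current)) {
                return current;
            }
            current = next;
        }
    }

    public static void main(String[] args) {
        String input = "%252Fpath%252Fto%252Fresource";
        System.out.println(urlDecode(input, false)); // prints %2Fpath%2Fto%2Fresource
        System.out.println(urlDecode(input, true));  // prints /path/to/resource
    }
}
```

Comparing each result with the previous value avoids having to know the encoding depth in advance, which is the point of the {{recursive}} flag.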
h3. Examples
{code:sql}
-- Single-pass decoding (current behavior)
SELECT URL_DECODE('%252Fpath%252Fto%252Fresource');
-- Result: '%2Fpath%2Fto%2Fresource'

-- Recursive decoding
SELECT URL_DECODE('%252Fpath%252Fto%252Fresource', true);
-- Result: '/path/to/resource'
{code}
h3. Compatibility
* This change is *fully backward-compatible*
* Existing queries using {{URL_DECODE(str)}} will continue to work without any
behavior changes
* The new parameter is optional and only extends functionality
h3. Additional Notes
* The recursive decoding should stop when the decoded result is identical to
the previous value
* This proposal builds directly on the existing implementation introduced in
FLINK-34108
* Similar behavior is commonly required in data cleansing and normalization
use cases