[ 
https://issues.apache.org/jira/browse/SPARK-57436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junlong Li updated SPARK-57436:
-------------------------------
    Description: 
Currently, Spark's {{instr}} function only supports two arguments and can only 
find the first occurrence of a substring. Users who need to locate a specific 
occurrence or start searching from a certain position must write complex 
workarounds.

In contrast, Oracle, Impala, and Db2 all support a four-argument version of 
{{{}INSTR{}}}, allowing users to specify a starting position and which 
occurrence to locate, with support for both forward and backward search.

Adding the same capability to Spark SQL would:
 * Simplify user queries that currently require complex expressions.

 * Align Spark's behavior with Oracle, Impala, and Db2, reducing migration 
friction.

*Proposed syntax:*
instr(str, substr [, start [, occurrence]])
 
*Proposed behavior:*
 * {{start}} (INT, default 1): 1‑based starting position.

 * 
 ** Positive: search forward from that position.

 * 
 ** Negative: search backward from that position (distance from end of string).

 * 
 ** If {{{}start = 0{}}}, returns 0.

 * {{occurrence}} (INT, default 1): which occurrence to find, in the direction 
determined by the sign of {{{}start{}}}.

 * 
 ** If {{{}occurrence <= 0{}}}, an error is raised.

 * Returns the 1‑based index of the specified occurrence, or 0 if not found.
 * Any argument NULL ⇒ result NULL.

 * Collation awareness is fully preserved.

*Examples:*
{code:java}
instr('abcabc','b', 1, 2) → 5
instr('abcabc','b', -1, 1) → 5
instr('abcabc','b', -1, 2) → 2
instr('abc','b', 0, 1) → 0 
instr('abc','b', 1, 0) → error{code}
*Compatibility:*
The existing two-argument syntax remains unchanged.{*}{{*}}
 

  was:
Currently, Spark's {{instr}} function only supports two arguments and can only 
find the first occurrence of a substring. Users who need to locate a specific 
occurrence or start searching from a certain position must write complex 
workarounds.

In contrast, Oracle, Impala, and Db2 all support a four-argument version of 
{{{}INSTR{}}}, allowing users to specify a starting position and which 
occurrence to locate, with support for both forward and backward search.

Adding the same capability to Spark SQL would:
 * Simplify user queries that currently require complex expressions.

 * Align Spark's behavior with Oracle, Impala, and Db2, reducing migration 
friction.

*Proposed syntax:*
instr(str, substr [, start [, occurrence]])
 
*Proposed behavior:*
 * {{start}} (INT, default 1): 1‑based starting position.

 ** Positive: search forward from that position.

 ** Negative: search backward from that position (distance from end of string).

 ** If {{{}start = 0{}}}, returns 0.

 * {{occurrence}} (INT, default 1): which occurrence to find, in the direction 
determined by the sign of {{{}start{}}}.

 ** If {{{}occurrence <= 0{}}}, an error is raised.

 * Returns the 1‑based index of the specified occurrence, or 0 if not found.
 * Any argument NULL ⇒ result NULL.

 * Collation awareness is fully preserved.

*Examples:*

 
{code:java}

{code}
*SELECT instr('abcabc', 'b', 1, 2);   -- 5
SELECT instr('abcabc', 'b', -1, 1);  -- 5
SELECT instr('abcabc', 'b', -1, 2);  -- 2
SELECT instr('abc', 'b', 0, 1);      -- 0
SELECT instr('abc', 'b', 1, 0);      -- error* *Compatibility:*
The existing two-argument syntax remains unchanged.{*}{*}
 


> Support start and occurrence parameters in instr function
> ---------------------------------------------------------
>
>                 Key: SPARK-57436
>                 URL: https://issues.apache.org/jira/browse/SPARK-57436
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.3.0
>            Reporter: Junlong Li
>            Priority: Minor
>
> Currently, Spark's {{instr}} function only supports two arguments and can 
> only find the first occurrence of a substring. Users who need to locate a 
> specific occurrence or start searching from a certain position must write 
> complex workarounds.
> In contrast, Oracle, Impala, and Db2 all support a four-argument version of 
> {{{}INSTR{}}}, allowing users to specify a starting position and which 
> occurrence to locate, with support for both forward and backward search.
> Adding the same capability to Spark SQL would:
>  * Simplify user queries that currently require complex expressions.
>  * Align Spark's behavior with Oracle, Impala, and Db2, reducing migration 
> friction.
> *Proposed syntax:*
> instr(str, substr [, start [, occurrence]])
>  
> *Proposed behavior:*
>  * {{start}} (INT, default 1): 1‑based starting position.
>  * 
>  ** Positive: search forward from that position.
>  * 
>  ** Negative: search backward from that position (distance from end of 
> string).
>  * 
>  ** If {{{}start = 0{}}}, returns 0.
>  * {{occurrence}} (INT, default 1): which occurrence to find, in the 
> direction determined by the sign of {{{}start{}}}.
>  * 
>  ** If {{{}occurrence <= 0{}}}, an error is raised.
>  * Returns the 1‑based index of the specified occurrence, or 0 if not found.
>  * Any argument NULL ⇒ result NULL.
>  * Collation awareness is fully preserved.
> *Examples:*
> {code:java}
> instr('abcabc','b', 1, 2) → 5
> instr('abcabc','b', -1, 1) → 5
> instr('abcabc','b', -1, 2) → 2
> instr('abc','b', 0, 1) → 0 
> instr('abc','b', 1, 0) → error{code}
> *Compatibility:*
> The existing two-argument syntax remains unchanged.{*}{{*}}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to