[ https://issues.apache.org/jira/browse/SPARK-49968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot reassigned SPARK-49968:
--------------------------------------

    Assignee: Apache Spark

> The split function produces incorrect results with an empty regex and a limit
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-49968
>                 URL: https://issues.apache.org/jira/browse/SPARK-49968
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.5.1
>            Reporter: Dejiu Lu
>            Assignee: Apache Spark
>            Priority: Major
>              Labels: pull-request-available
>
> The current behavior of the split function is as follows:
> {code:sql}
> select split('hello', 'h', 1)    -- result is ["hello"]
> select split('hello', '-', 1)    -- result is ["hello"]
> select split('hello', '', 1)     -- result is ["h"]
> select split('1A2A3A4', 'A', 3)  -- result is ["1","2","3A4"]
> select split('1A2A3A4', '', 3)   -- result is ["1","A","2"]{code}
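The empty-regex outputs above are consistent with the limit being used to emit the first limit characters one at a time, with the rest of the input silently dropped. The Java sketch below models only that observed behavior so it can be compared against the documented contract. ObservedEmptySplit and observedSplit are hypothetical names, not Spark's implementation, and the model works on UTF-16 chars, which is only adequate for ASCII input like these examples.

```java
import java.util.Arrays;

class ObservedEmptySplit {
    // Hypothetical model of the reported behavior (NOT Spark's source):
    // with an empty regex, emit the first `limit` characters individually
    // and drop whatever input remains after them.
    static String[] observedSplit(String s, int limit) {
        int n = (limit <= 0 || limit > s.length()) ? s.length() : limit;
        String[] out = new String[n];
        for (int i = 0; i < n; i++) {
            out[i] = String.valueOf(s.charAt(i));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(observedSplit("hello", 1)));   // [h]
        System.out.println(Arrays.toString(observedSplit("1A2A3A4", 3))); // [1, A, 2]
    }
}
```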
> However, according to the function's documentation, when the limit is greater 
> than zero, the resulting array should have no more than limit elements, and 
> its last element should contain all of the input beyond the last match.
> {code:java}
> Arguments:
>       * str - a string expression to split.
>       * regex - a string representing a regular expression. The regex string 
> should be a Java regular expression.
>       * limit - an integer expression which controls the number of times the 
> regex is applied.
>           * limit > 0: The resulting array's length will not be more than 
> `limit`, and the resulting array's last entry will contain all input beyond 
> the last matched regex.
>           * limit <= 0: `regex` will be applied as many times as possible, 
> and the resulting array can be of any size. {code}
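Since the documentation defines regex as a Java regular expression, java.lang.String.split(String, int) is a convenient reference for the limit contract. This minimal demo (the class name LimitSemanticsDemo is illustrative) shows the non-empty-regex cases behaving as documented, including the split('1A2A3A4', 'A', 3) result reported above. It uses -1 for the unlimited case because Java treats a limit of exactly 0 differently: it also drops trailing empty strings.

```java
import java.util.Arrays;

class LimitSemanticsDemo {
    public static void main(String[] args) {
        // limit > 0: at most `limit` elements; the last one keeps the unsplit rest.
        System.out.println(Arrays.toString("1A2A3A4".split("A", 3)));  // [1, 2, 3A4]
        System.out.println(Arrays.toString("hello".split("h", 1)));    // [hello]
        // limit < 0: the regex is applied as many times as possible.
        System.out.println(Arrays.toString("1A2A3A4".split("A", -1))); // [1, 2, 3, 4]
    }
}
```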
> So the split function produces incorrect results when it is given an empty 
> regex together with a positive limit. The correct results should be: 
> {code:sql}
> select split('hello', '', 1)    -- result is ["hello"]
> select split('1A2A3A4', '', 3)  -- result is ["1","A","2A3A4"]{code}
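One possible shape for a corrected empty-regex path, sketched in Java under the assumption that an empty regex splits between code points: the first limit - 1 code points become single elements and the last element keeps the remaining input. EmptyRegexSplitSketch.splitEmptyRegex is a hypothetical helper, not the patch attached to this issue. Its handling of limit <= 0 (one element per code point, no trailing empty string) and of empty input is an assumption. On the two inputs above it agrees with java.lang.String.split.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EmptyRegexSplitSketch {
    // Hypothetical helper (not Spark's actual fix): split `s` on an empty
    // regex while honoring the documented `limit` contract.
    static String[] splitEmptyRegex(String s, int limit) {
        if (s.isEmpty()) {
            return new String[] {""};  // assumed: empty input yields one empty element
        }
        int numCodePoints = s.codePointCount(0, s.length());
        // A non-positive or non-binding limit yields one element per code point.
        int pieces = (limit <= 0 || limit > numCodePoints) ? numCodePoints : limit;
        List<String> out = new ArrayList<>(pieces);
        int offset = 0;
        // Emit the first (pieces - 1) code points as single elements.
        for (int i = 0; i < pieces - 1; i++) {
            int next = s.offsetByCodePoints(offset, 1);
            out.add(s.substring(offset, next));
            offset = next;
        }
        // The last element carries all of the remaining input.
        out.add(s.substring(offset));
        return out.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitEmptyRegex("hello", 1)));   // [hello]
        System.out.println(Arrays.toString(splitEmptyRegex("1A2A3A4", 3))); // [1, A, 2A3A4]
        // java.lang.String.split agrees on these two inputs.
        System.out.println(Arrays.toString("hello".split("", 1)));          // [hello]
        System.out.println(Arrays.toString("1A2A3A4".split("", 3)));        // [1, A, 2A3A4]
    }
}
```

Plain String.split is not a drop-in replacement for every limit, though: with a negative limit, or a positive limit larger than the input, it appends a trailing empty string (for example, "hello".split("", -1) ends with ""), which is why the sketch handles those cases separately.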



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
