[ https://issues.apache.org/jira/browse/SPARK-49968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot reassigned SPARK-49968:
--------------------------------------
Assignee: (was: Apache Spark)
> The split function produces incorrect results with an empty regex and a limit
> -----------------------------------------------------------------------------
>
> Key: SPARK-49968
> URL: https://issues.apache.org/jira/browse/SPARK-49968
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.5.1
> Reporter: Dejiu Lu
> Priority: Major
> Labels: pull-request-available
>
> The current behavior of the split function is as follows:
> {code:java}
> select split('hello', 'h', 1);   -- result is ["hello"]
> select split('hello', '-', 1);   -- result is ["hello"]
> select split('hello', '', 1);    -- result is ["h"]
> select split('1A2A3A4', 'A', 3); -- result is ["1","2","3A4"]
> select split('1A2A3A4', '', 3);  -- result is ["1","A","2"]{code}
> However, according to the function's description, when the limit is greater
> than zero, the last element of the split result should contain the remaining
> part of the input string.
> {code:java}
> Arguments:
>   * str - a string expression to split.
>   * regex - a string representing a regular expression. The regex string
>     should be a Java regular expression.
>   * limit - an integer expression which controls the number of times the
>     regex is applied.
>       * limit > 0: The resulting array's length will not be more than
>         `limit`, and the resulting array's last entry will contain all input
>         beyond the last matched regex.
>       * limit <= 0: `regex` will be applied as many times as possible,
>         and the resulting array can be of any size. {code}
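The limit contract quoted above can be illustrated with plain Java's String.split(regex, limit), which has a similar positive/negative limit convention. This is only a sketch for comparison, not Spark's implementation; the class name LimitSemanticsDemo and the helper splitToString are hypothetical names introduced for this example.

```java
import java.util.Arrays;

public class LimitSemanticsDemo {
    // Hypothetical helper: split with java.lang.String.split and render the result.
    static String splitToString(String input, String regex, int limit) {
        return Arrays.toString(input.split(regex, limit));
    }

    public static void main(String[] args) {
        // Positive limit: at most `limit` entries, the last one keeps the remainder.
        System.out.println(splitToString("1A2A3A4", "A", 2));
        // Positive limit larger than the number of pieces: no padding is added.
        System.out.println(splitToString("1A2A3A4", "A", 10));
        // Negative limit: the regex is applied as many times as possible.
        System.out.println(splitToString("1A2A3A4", "A", -1));
    }
}
```

With a limit of 2 the second entry keeps the unsplit remainder "2A3A4", which is the behavior the description promises for any limit greater than zero.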
> So, the split function produces incorrect results with an empty regex and a
> limit. The correct result should be:
> {code:java}
> select split('hello', '', 1) // result is ["hello"]
> select split('1A2A3A4', '', 3) // result is ["1","A","2A3A4"]{code}
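For reference, plain Java's String.split(regex, limit) already yields the results requested above when the regex is empty. The sketch below checks this on Java 8 or later; it says nothing about how Spark implements split internally, and the class name SplitEmptyRegexDemo and the helper splitToString are hypothetical names introduced for this example.

```java
import java.util.Arrays;

public class SplitEmptyRegexDemo {
    // Hypothetical helper: split with java.lang.String.split and render the result.
    static String splitToString(String input, String regex, int limit) {
        return Arrays.toString(input.split(regex, limit));
    }

    public static void main(String[] args) {
        // limit = 1: the regex is never applied, so the whole input is returned.
        System.out.println(splitToString("hello", "", 1));   // [hello]
        // limit = 3: two single-character pieces, then the remaining input.
        System.out.println(splitToString("1A2A3A4", "", 3)); // [1, A, 2A3A4]
        // A non-empty regex with the same limit, for comparison.
        System.out.println(splitToString("1A2A3A4", "A", 3)); // [1, 2, 3A4]
    }
}
```

In each case the last entry carries all input beyond the last match, so a fix that matches these outputs would be consistent with the documented limit > 0 contract.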