MartijnVisser commented on PR #24365:
URL: https://github.com/apache/flink/pull/24365#issuecomment-2115023158

   > @MartijnVisser What do you think should be the behaviour for an empty delimiter?
   
   I've looked at how various databases handle this:
   * Postgres doesn't have a SPLIT function, but leverages `regexp_split_to_table` or `regexp_split_to_array`. When an empty delimiter is provided, it throws an error. https://www.postgresql.org/docs/current/functions-string.html
   * MySQL doesn't have a SPLIT function either, but has `SUBSTRING_INDEX`. 
   * MSSQL Server has `STRING_SPLIT`, which also doesn't accept empty delimiters. https://learn.microsoft.com/en-us/sql/t-sql/functions/string-split-transact-sql?view=sql-server-ver16
   * Spark has a `SPLIT` function and accepts empty delimiters, with the same behavior as ksqlDB. https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.split.html
   * Presto has a `SPLIT` function and accepts empty delimiters, with the same behavior as ksqlDB. https://prestodb.io/docs/current/functions/string.html
   * ClickHouse has a wide variety of functions, including `splitByChar`, `splitByString`, and `splitByRegexp`. https://clickhouse.com/docs/en/sql-reference/functions/splitting-merging-functions `splitByString` appears to have the same behavior as Spark and Presto.
   
   All in all, I could find three implementations of `SPLIT`: ksqlDB, Spark, and Presto. All three accept empty delimiters. I think we should follow the same behavior, since the other systems use different function names for comparable features.
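
   To make the proposal concrete, here is a minimal sketch of the empty-delimiter semantics under discussion. It assumes that "accepting an empty delimiter" means returning one element per character, which is how the ksqlDB documentation describes its `SPLIT`. The `split` helper below is hypothetical and is not the implementation in this PR.

```python
from typing import List


def split(text: str, delimiter: str) -> List[str]:
    """Hypothetical SPLIT sketch; not the code from this PR."""
    if delimiter == "":
        # Assumed behavior: an empty delimiter yields one element per character.
        return list(text)
    # Non-empty delimiter: ordinary literal (non-regex) split.
    return text.split(delimiter)


print(split("a,b,c", ","))
print(split("abc", ""))
```

   The explicit branch is needed in this sketch because Python's own `str.split` rejects an empty separator with a `ValueError`, which is the "throw an error" alternative seen in some of the databases above.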


