MartijnVisser commented on PR #24365: URL: https://github.com/apache/flink/pull/24365#issuecomment-2115023158
> @MartijnVisser What do you think should be the behaviour for an empty delimiter?

I've taken a tour along the various databases:

* Postgres doesn't have a SPLIT function, but leverages `regexp_split_to_table` or `regexp_split_to_array`. When an empty delimiter is provided, it throws an error. https://www.postgresql.org/docs/current/functions-string.html
* MySQL doesn't have a SPLIT function either, but has `SUBSTRING_INDEX`.
* MSSQL Server has `STRING_SPLIT`, which also doesn't accept empty delimiters. https://learn.microsoft.com/en-us/sql/t-sql/functions/string-split-transact-sql?view=sql-server-ver16
* Spark has a `SPLIT` function and accepts empty delimiters, with the same behavior as ksqlDB. https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.split.html
* Presto has a `SPLIT` function and accepts empty delimiters, same behavior as ksqlDB. https://prestodb.io/docs/current/functions/string.html
* ClickHouse actually has a wide variety with `splitByChar`, `splitByString`, `splitByRegexp`, etc. https://clickhouse.com/docs/en/sql-reference/functions/splitting-merging-functions. `splitByString` appears to have the same behavior as Spark and Presto.

All in all, I could find 3 implementations of `SPLIT`: in ksqlDB, Spark, and Presto. All 3 accept empty delimiters. I would then think we should follow the same behavior, since all the other systems use different function names for comparable/similar features.
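To make the proposal concrete, here is a minimal sketch of what that behavior would look like, assuming the function under discussion is exposed as `SPLIT(str, delimiter)` and that we mirror ksqlDB's documented handling of an empty delimiter (splitting the string into its individual characters); the exact signature and result formatting in this PR may differ:

```sql
-- Regular delimiter: split on each occurrence.
SELECT SPLIT('a;b;c', ';');
-- expected: ['a', 'b', 'c']

-- Empty delimiter: following ksqlDB/Spark/Presto, split into individual characters
-- instead of raising an error (as Postgres's regexp_split_* functions do).
SELECT SPLIT('hello', '');
-- expected: ['h', 'e', 'l', 'l', 'o']
```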
