MartijnVisser commented on code in PR #22951:
URL: https://github.com/apache/flink/pull/22951#discussion_r1489314997
##########
flink-table/flink-table-common/src/main/java/org/apache/flink/table/functions/BuiltInFunctionDefinitions.java:
##########
@@ -231,6 +232,21 @@ ANY, and(logical(LogicalTypeRoot.BOOLEAN), LITERAL)
"org.apache.flink.table.runtime.functions.scalar.ArrayContainsFunction")
.build();
+ public static final BuiltInFunctionDefinition ARRAY_SORT =
+ BuiltInFunctionDefinition.newBuilder()
+ .name("ARRAY_SORT")
+ .kind(SCALAR)
+ .inputTypeStrategy(
+ or(
+ sequence(new
ArrayComparableElementArgumentTypeStrategy()),
+ sequence(
+ new
ArrayComparableElementArgumentTypeStrategy(),
+ logical(LogicalTypeRoot.BOOLEAN))))
Review Comment:
This is quite a rabbit hole 😆
1. Databricks has two functions:
* sort_array
https://docs.databricks.com/en/sql/language-manual/functions/sort_array.html -
Sorts ascending, puts NULL at beginning.
* array_sort
https://docs.databricks.com/en/sql/language-manual/functions/array_sort.html -
Sorts ascending, puts NULL at the end.
2. Spark has
https://spark.apache.org/docs/latest/api/sql/index.html#sort_array which is
indeed not different from the Databricks one. Sorts ascending, puts NULL at
beginning
3. Snowflake indeed has
https://docs.snowflake.com/en/sql-reference/functions/array_sort - Sorts
ascending, puts NULL at ~~the end~~ beginning.
4. Trino, that has `array_sort(x)` - Sorts ascending, puts NULL at the end.
5. StarRocks also has this function
https://docs.starrocks.io/docs/2.2/sql-reference/sql-functions/array-functions/array_sort/
- Sorts ascending, puts NULL at beginning
6. Doris has it
https://doris.apache.org/docs/1.2/sql-manual/sql-functions/array-functions/array_sort/
- Sorts ascending, puts NULL at beginning
7. Hive uses NULLS FIRST as the default behavior in ascending ordering mode,
per https://issues.apache.org/jira/browse/HIVE-12994 - Sorts ascending, puts
NULL at beginning.
I do think that the Snowflake has the cleanest function signature: there's
just one function, and it has two optional arguments to give all the
flexibility.
I think the default sorting mode should be ascending, and then it boils down
to decide on the default location on where to put NULL. Out of the 7
implementations, we have 6 who put NULL at the beginning, and only Databricks
puts them at the end. Naturally, I think we should put NULL at the beginning
but I'm +0 on the default location of NULL.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]