[
https://issues.apache.org/jira/browse/SPARK-47353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wenchen Fan resolved SPARK-47353.
---------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 46597
[https://github.com/apache/spark/pull/46597]
> Mode expression for strings (all collations)
> --------------------------------------------
>
> Key: SPARK-47353
> URL: https://issues.apache.org/jira/browse/SPARK-47353
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Uroš Bojanić
> Assignee: Gideon P
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Enable collation support for the *Mode* expression in Spark. First confirm
> the expected behaviour of this expression when given collated strings, then
> move on to an implementation that handles strings of all collation types.
> Implement the corresponding unit tests and
> E2E SQL tests to reflect how this function should be used with collation in
> SparkSQL, and feel free to use your chosen Spark SQL Editor to experiment
> with the existing functions to learn more about how they work. In addition,
> look into the possible use-cases and implementation of similar functions
> within other open-source DBMSs, such as
> [PostgreSQL|https://www.postgresql.org/docs/].
>
> The goal for this Jira ticket is to implement the *Mode* expression so it
> supports all collation types currently supported in Spark. To understand what
> changes were introduced in order to enable full collation support for other
> existing functions in Spark, take a look at the Spark PRs and Jira tickets
> for completed tasks under this parent ticket (for example: Contains,
> StartsWith, EndsWith).
> Examples:
> With UTF8_BINARY collation, the query
> SELECT mode(col) FROM VALUES ('a'), ('a'), ('a'), ('B'), ('B'), ('b'), ('b')
> AS tab(col);
> should return 'a'.
> With UTF8_BINARY_LCASE collation, the query
> SELECT mode(col) FROM VALUES ('a'), ('a'), ('a'), ('B'), ('B'), ('b'), ('b')
> AS tab(col);
> should return either 'B' or 'b'.
>
> Read more about ICU [Collation Concepts|http://example.com/] and
> [Collator|http://example.com/] class. Also, refer to the Unicode Technical
> Standard for
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
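
The two examples in the description can be illustrated with a small standalone sketch. This is not the code from the pull request and does not use Spark at all; it only shows the idea of counting values per collation group. The helper name mode_with_collation and its collation_key parameter are hypothetical, and str.lower is used as a rough stand-in for UTF8_BINARY_LCASE (real collations, especially ICU ones, compare strings differently). Returning the first value seen in the winning group is an arbitrary choice, which matches the ticket's "either 'B' or 'b'" wording.

```python
from collections import Counter


def mode_with_collation(values, collation_key=lambda s: s):
    """Return one most-frequent value, grouping values by collation_key.

    Hypothetical helper for illustration only; not Spark's implementation.
    """
    counts = Counter()
    representative = {}
    for v in values:
        k = collation_key(v)
        counts[k] += 1
        # Remember the first original value seen for each collation group.
        representative.setdefault(k, v)
    if not counts:
        return None  # assumption: empty input yields no mode
    best_key, _ = max(counts.items(), key=lambda kv: kv[1])
    return representative[best_key]


col = ["a", "a", "a", "B", "B", "b", "b"]

# Binary comparison: 'a' (3), 'B' (2) and 'b' (2) are three distinct groups.
binary_mode = mode_with_collation(col)

# Case-insensitive comparison (approximated with str.lower):
# 'B' and 'b' fall into one group of size 4, which beats 'a' (3).
lcase_mode = mode_with_collation(col, collation_key=str.lower)

print(binary_mode, lcase_mode)
```

With the binary key the result is 'a'; with the lowercasing key the result comes from the merged 'B'/'b' group, and which of the two spellings is returned depends only on the representative-selection rule chosen above.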